The Dorabella Cipher (Part 5) - Is it a hoax?

Random vs Structure
There is an argument that the Dorabella Cipher may not be a real cipher at all. The main reason given is that it contains long stretches in which adjacent symbols never have the same number of semicircles. Using theory and practice, I will test whether this pattern is really as suspicious as it sounds, or whether it may indeed be a hint that something is fishy.

The Dorabella Cipher has remained unsolved for about 130 years. While looking for recent discussion and solution attempts, I went well beyond the first page of Google results and found several excellent deep-dive blogs. One post on aplaceofbrightness.blogspot.com discusses this semicircle-count pattern and motivated this analysis.

Dorabella Cipher
The Dorabella Cipher [87 Symbols]

The findings from that blog are presented below. They address a strange distribution of the chosen cipher symbols. Look at the first line of the Dorabella Cipher. It starts with the following symbols:

      Line 1 - First 13 symbols
      #2cE##3cW##2cSE##3cE##1cE##2cS##1cN##3cE##1cSW##2cNE##3cSE##2cNW##1cNW#

Do you notice anything? Two consecutive symbols never have the same number of semicircles! We have [2,3,2,3,1,2,1,3,1,2,3,2,1]. These are thirteen symbols in a row!

      Line 2 - First 10 symbols
      #1cN##2cNW##1cN##2cS##1cN##3cE##1cSW##2cSW##3cE##2cSE#

Here there are 10 symbols in which no two consecutive ones have the same number of semicircles.

      Line 3 - 4th to 12th symbols
      #2cN##3cNW##2cSE##1cSE##2cN##3cN##1cS##3cNW##2cSE#

Here we get nine.
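The three run lengths above can be checked mechanically. The following is a minimal sketch, assuming the `#<count>c<direction>#` transcription used in this post; the names `SYMBOL`, `semicircle_counts` and `longest_alternating_run` are made up for illustration and are not from the original analysis.

```python
import re

# One symbol looks like "#2cSE#": semicircle count, the letter "c", compass direction.
SYMBOL = re.compile(r"#([123])c([NSEW]{1,2})#")

def semicircle_counts(transcription):
    """Return the semicircle count of every symbol in the transcription."""
    return [int(count) for count, _direction in SYMBOL.findall(transcription)]

def longest_alternating_run(counts):
    """Length of the longest stretch in which neighbouring counts always differ."""
    if not counts:
        return 0
    best = cur = 1
    for prev, nxt in zip(counts, counts[1:]):
        cur = cur + 1 if prev != nxt else 1
        best = max(best, cur)
    return best

line1 = "#2cE##3cW##2cSE##3cE##1cE##2cS##1cN##3cE##1cSW##2cNE##3cSE##2cNW##1cNW#"
line2 = "#1cN##2cNW##1cN##2cS##1cN##3cE##1cSW##2cSW##3cE##2cSE#"
line3 = "#2cN##3cNW##2cSE##1cSE##2cN##3cN##1cS##3cNW##2cSE#"

for name, line in (("line 1", line1), ("line 2", line2), ("line 3", line3)):
    counts = semicircle_counts(line)
    print(name, counts, longest_alternating_run(counts))
```

Each of the three excerpts is one single run, so the reported run length equals the number of symbols in the excerpt.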

The author of the blog argues that this is a strong clue that the cipher does not resemble normal English.

But is this really exceptional, or is such a pattern actually more likely than it seems?

First, since we use this terminology over and over again, let us define exactly what we mean by "such a pattern".

Definition Property-C
Given a string $S$ of length 87 over $\{1,2,3\}$, we say $S$ has Property-C if it contains a substring of length at least 13 in which adjacent symbols are never equal.

The argument is that Dorabella’s first line has an unusually long run with no equal consecutive semicircle counts. To find out whether Property-C is really that unlikely, I ran three tests:

  1. [AI Theoretical Answer/Random Model] I asked the AI about the probability that a random string $S$ of length 87 has Property-C.

  2. [Monte-Carlo/Random Model] I generated random strings of length 87 with symbols from three categories and tested how many have Property-C.

  3. [Monte-Carlo/Real-text driven] I encrypted actual language with random cipher-symbol/alphabet assignments and tested how many 87-letter ciphertexts have Property-C.

Theory/Random Model

I simply asked the AI (OpenAI v5.2) the prompt:

> Suppose you have string of length 87, that consist of integers from {1,2,3} randomly selected. What is the probability that there is a substring of length 13 that has no two equal consecutive integers?

The answer was an argument using Bernoulli trials and dynamic programming, with the result \[ \textsf{Pr}(\text{String S has Property-C}) \approx 0.18450333017 \] I read the answer. It was very lengthy (as usual :)) but looks OK.

In other words, for a string of length 87 consisting of random symbols from a set of size 24, where each symbol falls into one of three categories of size 8, the probability that there is a substring of length 13 such that each symbol's category differs from that of its neighbours is ~18.45%.
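The quoted value can be cross-checked with a few lines of dynamic programming. This is only a sketch under the random model described above (independent, uniformly chosen categories), not the AI's actual derivation; the function name `prob_property_c` and its parameters are chosen here for illustration. The state is the length of the current run of pairwise different neighbours: with three equally likely categories the next symbol differs with probability 2/3, otherwise the run restarts at length 1.

```python
def prob_property_c(n=87, run=13, k=3):
    """Probability that a uniform random string of length n over k categories
    contains `run` consecutive symbols in which no two neighbours are equal."""
    if run <= 1:
        return 1.0 if n >= 1 else 0.0
    p_diff = (k - 1) / k  # chance that the next symbol differs from the current one
    # alive[r] = P(current run has length r and no run of length `run` occurred yet)
    alive = [0.0] * run
    alive[1] = 1.0  # the first symbol alone is a run of length 1
    hit = 0.0       # probability mass that has already reached the target length
    for _ in range(n - 1):
        nxt = [0.0] * run
        for r in range(1, run):
            mass = alive[r]
            nxt[1] += mass * (1.0 - p_diff)   # equal neighbour: run restarts
            if r + 1 == run:
                hit += mass * p_diff          # run reaches the target length
            else:
                nxt[r + 1] += mass * p_diff   # run grows by one
        alive = nxt
    return hit

print(f"{100 * prob_property_c():.2f}%")
```

Under these assumptions the printed value should land close to the 18.45% quoted above. The same function can be called with other lengths, for example `prob_property_c(74, 10)`.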

Monte-Carlo/Random Model

To perform some practical tests, I generated a text of length 87 from a set $S$ consisting of 24 symbols. [I could have used only three different integers and saved the step of mapping each integer to its category, but it doesn't matter for the result.] For simplicity (and without loss of generality) I chose the numbers $S = [0,23]$. At each position of the text I picked a random symbol from $S$. I divided $S$ into three categories, analogous to the three possible numbers of semicircles in the Dorabella Cipher: $C_1 = [0,7], C_2 = [8,15], C_3 = [16,23]$. Then I checked whether the generated integer string has Property-C. I repeated this 20,000 times. Below you can find the code.

  
    from random import randint

    def getCat(i):
        # map symbol 0..23 to its category 0, 1 or 2 (8 symbols each)
        return i // 8

    total_ctr = 0
    B = 20000  # use more trials for stability

    for k in range(B):
        TXT = [randint(0, 23) for _ in range(87)]

        ctr = 1      # length of the current run of pairwise different categories
        max_ctr = 1  # longest such run seen so far

        for j in range(len(TXT) - 1):
            if getCat(TXT[j]) != getCat(TXT[j+1]):
                ctr += 1
            else:
                if ctr > max_ctr:
                    max_ctr = ctr
                ctr = 1

        if ctr > max_ctr:
            max_ctr = ctr

        if max_ctr >= 13:
            total_ctr += 1

    print(f"> {100 * total_ctr / B:.3f}%")
  
  

The results vary a little (increase B for better stability), but they are very close to the theoretical value of 18.45%, which is thereby well supported.

Monte-Carlo/Real-text driven

So far, in the random model, a run of thirteen such symbols is not very uncommon. However, the English language is far from random, as a look at the letter distribution shows. So the two previous results may be off when compared with a test on real language. What I did next was to assign the cipher symbols to the letters by a random monoalphabetic substitution. Then I encrypted 87-letter passages from Shakespeare's Sonnets and checked for the given property.

  
    from cryptanalysis import etsch_helper_functions as helper  # helper module, not part of the standard library

    def getCat(i):
        return i // 8

    A = list("ABCDEFGHIKLMNOPQRSTUWXYZ")  # 24 letters, no J, no V
    rp = helper.get_random_permutation(list(range(24)))
    Ar = helper.apply_permutation(A, rp)   # permuted alphabet

    # fast lookup: letter -> symbol index 0..23
    pos = {Ar[idx]: idx for idx in range(24)}

    with open("Shakespeare-Sonnets.txt", "r", encoding="utf-8") as f:
        text = f.read()

    letters = [c.upper() for c in text if c.isalpha()]
    letters = [('I' if c == 'J' else 'U' if c == 'V' else c) for c in letters]

    N = len(letters)
    num_windows = N - 87 + 1
    total_ctr = 0

    for i in range(num_windows):
        TXT = [pos[letters[i+j]] for j in range(87)]

        ctr = 1
        max_ctr = 1
        for j in range(86):
            if getCat(TXT[j]) != getCat(TXT[j+1]):
                ctr += 1
            else:
                if ctr > max_ctr:
                    max_ctr = ctr
                ctr = 1
        if ctr > max_ctr:
            max_ctr = ctr

        if max_ctr >= 13:
            total_ctr += 1

    print(f"{total_ctr} / {num_windows} = {float(100*total_ctr/num_windows):.5f}%")
  

The result of this test depends heavily on the random permutation that assigns the alphabet letters A, B, C, ... to the symbols. Since Elgar's cipher has only 24 symbols, we replaced each 'J' with an 'I' and each 'V' with a 'U', which is not uncommon for ciphers of this kind. The text we encrypted is Shakespeare's Sonnets. We took every substring of length 87, encrypted it and tested whether the ciphertext has Property-C. The result was that, depending on the permutation, somewhere between $$ \approx \left[25\%, 45\%\right] $$ of the encrypted 87-letter substrings of Shakespeare's Sonnets have Property-C.
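To see how strongly the outcome depends on the permutation, the test can be repeated for many random letter-to-symbol assignments. The sketch below is a self-contained variant that uses only the standard library instead of the helper module; `normalize`, `random_category`, `property_c_rate` and the short sample text are assumptions made for illustration. With the full Sonnets, read the file into `sample` instead.

```python
import random

ALPHABET = "ABCDEFGHIKLMNOPQRSTUWXYZ"  # 24 letters, J folded into I and V into U

def normalize(text):
    """Upper-case the text, fold J->I and V->U, and keep alphabet letters only."""
    folded = (("I" if ch == "J" else "U" if ch == "V" else ch) for ch in text.upper())
    return [ch for ch in folded if ch in ALPHABET]

def random_category(rng):
    """Randomly assign the 24 letters to three categories of 8 symbols each."""
    letters = list(ALPHABET)
    rng.shuffle(letters)
    return {ch: idx // 8 for idx, ch in enumerate(letters)}

def property_c_rate(letters, category, window=87, run=13):
    """Fraction of length-`window` substrings that contain `run` consecutive
    letters whose categories never repeat between neighbours."""
    cats = [category[ch] for ch in letters]
    n = len(cats)
    if n < window:
        return 0.0
    # prefix[i] = number of positions j < i at which a qualifying run ends
    prefix = [0] * (n + 1)
    cur = 0
    for i, c in enumerate(cats):
        cur = cur + 1 if i > 0 and c != cats[i - 1] else 1
        prefix[i + 1] = prefix[i] + (1 if cur >= run else 0)
    hits = 0
    for s in range(n - window + 1):
        # a qualifying run inside the window must end in [s + run - 1, s + window - 1]
        if prefix[s + window] - prefix[s + run - 1] > 0:
            hits += 1
    return hits / (n - window + 1)

# Tiny stand-in text (opening of Sonnet 18); far too short for reliable numbers.
sample = ("Shall I compare thee to a summer's day? Thou art more lovely and more "
          "temperate: Rough winds do shake the darling buds of May, "
          "And summer's lease hath all too short a date")
rng = random.Random(1)
rates = [property_c_rate(normalize(sample), random_category(rng)) for _ in range(200)]
print(f"min {100 * min(rates):.1f}%  max {100 * max(rates):.1f}%")
```

The prefix sums only avoid rescanning every window from scratch; the counting rule is the same as in the loop above.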

Conclusion

What about the argument that there are three of these substrings? One of length 13 with a probability of 18.45%, the next of length 10 with a probability of 45.43% [here I assumed a total length of 87-13], and a third of length 9 with a probability of 57.14% [here I assumed a total length of 87-13-10]. They do not overlap. If we assume independence, we can simply multiply the probabilities, $0.1845 \cdot 0.4543 \cdot 0.5714 \approx 0.048$, and get a very rough estimate of $$ 4.8\% $$

We did not even take the positions of these substrings into account. To roughly account for the fact that one run ‘uses up’ positions, I recomputed the probabilities on shortened strings (87−13, 87−13−10). This still ignores dependencies and boundary effects, so treat the result as a coarse estimate.
Is this too low? Maybe, maybe not.

So what is this all about? First, I want to mention that the author of aplaceofbrightness also points out another oddity. After the long substrings with Property-C, in the second half of each of the three lines, more or less the opposite happens: there are far too many symbol mirror pairs. A symbol mirror pair is, for example,

    	#3cE##3cW#  or  #1cSE##1cNW#
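Mirror pairs can be counted in the same transcription. The sketch below assumes that a "mirror pair" means two adjacent symbols with the same number of semicircles and opposite compass directions, which is what the two examples suggest; the blog's exact definition may differ, and the names `OPPOSITE` and `mirror_pairs` are made up here.

```python
import re

SYMBOL = re.compile(r"#([123])c([NSEW]{1,2})#")

# Assumed notion of "mirrored": the opposite compass direction.
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E",
            "NE": "SW", "SW": "NE", "NW": "SE", "SE": "NW"}

def mirror_pairs(transcription):
    """Return the index of the first symbol of every adjacent mirror pair."""
    symbols = [(int(count), direction)
               for count, direction in SYMBOL.findall(transcription)]
    return [i for i, ((n1, d1), (n2, d2)) in enumerate(zip(symbols, symbols[1:]))
            if n1 == n2 and OPPOSITE.get(d1) == d2]

print(mirror_pairs("#3cE##3cW#"))
print(mirror_pairs("#1cSE##1cNW#"))
```

Since both symbols of a mirror pair have the same semicircle count, a mirror pair can never occur inside a Property-C run.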
    

This may be a hint, but perhaps it is simply a consequence of the Property-C substring ending. By definition, there are now more consecutive pairs that share the same number of semicircles, and mirror pairs fall into that category. The reasoning that Elgar tried to be "random" in the first symbols of each line and overdid it, then fell back into more repetitive sequences and overdid that too by inserting too many mirror symbols, is also quite reasonable. Therefore I am torn between "yes, it might be a hoax" and "no, such occurrences are not too improbable". What do you think?

[1] https://scienceblogs.de/klausis-krypto-kolumne/2021/02/26/ist-das-dorabella-kryptogramm-eine-faelschung/
[2] https://aplaceofbrightness.blogspot.com/p/introduction-in-1897-british-composer.html
