De Bruijn Sequences and the Discrete Logarithm Problem

This blog post is about De Bruijn sequences and its relationship to cryptography. We focus here on De Bruijn sequences over the alphabet $A=\{0,1\}$.

Definition [De Bruijn sequence (binary)]. A binary string of length $2^n$ is called a De Bruijn sequence of order $n$ if each possible $n$-bit string occurs exactly once (incl. wrap-around) as a substring.

Picture 1 - DeBruijn graph for n=3

De Bruijn sequences are the shortest possible bitstring that contains all possible $n$-bit substrings. For each $n$ there are $$\frac{2^{2^{n-1}}}{2^n}$$ possible De Bruijn sequences. For example, a De Bruijn sequence of order $n=3$ is $$00011101$$ and of order $n=4$ is $$0000101100111101$$ There are several ways to create De Bruijn sequences. One way is walk along an Hamiltonian Cycle in the corresponding De Bruijn graph (see Picture 1). A De Bruijn graph is a directed graph of degree $2$ which has $2^n$ vertices. There is an edge from vertex $i$ to the two vertices $$(2i\;\text{mod}\:2^{n-1})\;\text{and}\;(2i\;\text{mod}\;2^{n-1})+1$$ The example sequence above for $n=3$ can be obtained by visiting the vertices in the graph from Picture 1 in the following order: $0,1,3,7,6,5,2,4,(0)$. Another way to create De Bruijn sequences is to use a Linear Feedback Shift Register (LFSR), loaded with a primitive polynomial of degree $n$ to create a $m$-sequence. $m$-sequences are De Bruijn sequences that miss the substring $0_n$, i.e., the substring that consists of $n$ zeros (sometimes also called a punctured De Bruijn sequence). There is also a relationship between discrete logarithms in GF$(2^n)$ and $m$-sequences which however is not the relationship i want to cover here. A good point to start reading about this topic is the book Algebraic Shift Registers Sequences from Goresky and Klapper [1].

Based on the definition of De Bruijn sequences it follows that each $n$-bit string is unique. Hence associated with De Bruijn sequences there is the following problem:

Definition [Decoding Problem]. Given a De Bruijn sequence of order $n$, called $S_n$, and a unique $n$-bit substring $s$, find the position of $s$ in $S_n$.

For some special structured De Bruijn sequences the Decoding Problem is efficiently solvable [2]. As mentioned above, $m$-sequences are De Bruijn sequence, that miss the all zero substring $0_n$ and are associated with LFSR which again are associated with discrete logarithms over GF($2^n$). Hence, one can ask the following question:

Q: Take the De Bruijn graph, e.g., from Picture 1, and do not walk along a Hamiltonian Cycle, but a shorter cycle. Then by definition of the De Bruijn graph one gets a sequence that also contains only unique $n$-bit substrings but not all of them. Is there something special about those sequences?

For example, one can walk along the graph from Picture 1 by forming the cycle $1,3,6,4,(1)$ which corresponds to the sequence $$0011$$ and has the corresponding unique substrings $$001,011,110,100$$ There are of course many such cycles in a given graph. And indeed, some of them are special and are called $l$-sequences. They have a special relationship to the (partial) Fermat quotient $q_p(2)$.

Definition [(Partial) Fermat Quotient]. Given a prime number $p$ and let $h$ be the multiplicative order of $2$ in $\mathbb{F}^*_p$, then $$q_p(2) = \frac{2^h-1}{p}$$

If $h=p-1$, i.e.,$2$ is a primitive root in $\mathbb{F}^*_p$, $q_p(2)$ is the usual Fermat Quotient otherwise it is the partial Fermat Quotient. The relationship is, that the binary representation of $q_p(2)$ is an $l$-sequence. The proof is actually not that complicated and the uniqueness of each $n$-bit substring comes from the fact, that each group in $\mathbb{F}^*_p$ contains only unique group elements and i used this relationship also in the paper [3] to link group elements to $p$-adic digit sums. Because some $l$-sequences are equal to a (partial) Fermat quotient gives those sequences a small linear complexity, i.e., you can write a very short "shortcut" for the $l$-sequence (namely $q_p(2)$) rather than writing all its digits one by one.

If $\mathbb{G}_2$ is the subgroup of $\mathbb{F}^*_p$ generated by $2$ and let $[n]$ be the set of all $n$-bit substring of the $l$-sequence $q_p(2)$, then we have an effiently computable function $$M_p: \mathbb{G}_2 \rightarrow [n] $$ with $$M_p: r \in \mathbb{G}_2 \rightarrow \text{Binary}(-rp^{-1}\;\text{mod}\;2^n)$$ for which also the inverse function is efficiently computable.

Theorem. The decoding problem of $l$-sequences of order $n$ is equal to solve the discrete logarithm in $\mathbb{F}^*_p$ to base $g=2$ for an $n$-bit prime number $p$.

Proof. The proof of the theorem is very short. Given a discrete logarithm instance $(p,r,g)$ (i.e. $g^x \equiv r\pmod{p}$) and given the $l$-sequence $L$ obtained from the binary representation of $(2^h-1)/p$, then $L$ always starts with $0_{n-1}1$. Then compute $t \leftarrow M_p(r)$ and solve the Decoding Problem by finding the position of $t$ in $L$ and the solution is $t+n$.■

Example: $p=29$ then it is $$q_{29}(2) = \frac{2^{28}-1}{29}= 9.256.395$$ Then the binary representation of $q_{29}(2)$ is (prepended with zeros to reach a length of $h$) :$$ (*)\;\;\;0000100011010011110111001011$$ It is an order $5$ $l$-sequence of length $28$. It only misses the four substrings $$00000, 01010, 10101, 11111$$ Suppose the given discrete logarithm instance is $(29,21,2)$, i.e., $2^x \equiv 21\pmod{29}$ (The secret is $x=17$). Then $$M_{29}(21) \equiv -21\cdot{29}^{-1} \pmod{2^5} \equiv 7 = 00111_2$$ For $(29,19,2)$ (solution $x=9$) it is $$M_{29}(19) \equiv -19\cdot{29}^{-1} \pmod{2^5} \equiv 17 = 10001_2$$ As you can see from $(*)$, in the first case, the Decoding algorithm returns $12$ (starting from position $0$) and $12+5=17$ and in the second case, the Decoding algorithm returns $4$ and $4+5=9$ which are both the correct solutions.

Lempel's Homomorphism

There is a way to step from a cycle in a De Bruijn graph of order $n$ to a cycle in a De Bruin graph of order $n-1$. It is called Lempel's Homomorphism, see [3]. It is not restricted to De Bruijn sequences and hence can be applied even to short cycles, e.g., $l$-sequences.

Definition [Lempel's Homomorphism]. Let $S_n = b_0b_1b_2\ldots b_{n-1}$ and $S_{n-1} = \beta_0\beta_1\beta_2\ldots \beta_{n-2}$ two binary string of length $n$ and $n-1$ respectively. Then $$\text{D}: \{0,1\}^n \rightarrow \{0,1\}^{n-1}$$ with $$\text{D}(S_n) = S_{n-1}\;\text{with}\;\beta_i = b_i \oplus b_{i+1},\;\;i = 1,\ldots, n-1$$

We need some definitions before we can apply Lempel's Homomorphism. For a bit $b_i$ let $\overline{b}_i$ the inverse, i.e., $b_i = 0$, then $\overline{b}_i = 1$ and vice versa. If $S_n$ is a bitstring then $\overline{S}_n$ means that each bit is inverted. $S_n$ and $\overline{S}_n$ are said to be dual to each other.

Primitive. A bit string $S_n$ (the subscript $n$ refers to the bit length of unique substrings) is primitive if $S_n$ contains the $n$-bit substring $s$ but not the substring $\overline{s}$.
Self dual. A bit string $S_n$ is self dual if $\overline{S}_n$ is a cyclic shift $S_n$. If $S_n$ is self dual its length is always even.

Note that self dual also implies, that for each $n$-bit substring $s_i$ of $S_n$, $\overline{s}_i$ is also a substring of $S_n$.

Then Lempel stated the following Theorems:

Theorem A. Given a bitstring $S_n$ that represents a cycle of length $k$ in a De Bruijn graph of order $n$, then $\text{D}(S_n)$ is a bitstring $S_{n-1}$ that represents a cycle of length $k$ in a De Bruijn graph of order $n-1$.

Theorem A states that if you have a cycle in a De Bruijn graph of order $n$ you can find a cycle of the same length in a De Bruijn graph of order $n-1$. Note that this implies that the cycle length must be less or equal to $2^{n-1}$ since this is the longest possible cycle in a De Bruijn graph of order $n-1$.

Theorem B. Given a bitstring $S_n$ that is self dual and represents a cycle of length $2m$ in a De Bruijn graph of order $n$, then $\text{D}(S_n)$ is a cycle of length $m$ in the De Bruijn graph of order $n-1$.

Theorem B states a way to step down to a De Bruijn graph of smaller order and concurrently halve the length of the cycle. The relationship between self dual sequences and $q_p(2)$ is that self dual implies that the group element $-1$ is element of the group generated by $2$ in $\mathbb{F}^*_p$.

If one applies this two theorems to $l$-sequences that come from the binary representation of $q_p(2)$, can they be used to gain some knowledge about a given discrete logarithm problem in $\mathbb{F}^*_p$? It seems that the linear complexity of a given sequence increases exponentially when $\text{D}$ is applied, but are there some exceptions?

[1] Goresky and Klapper, Algebraic Shift Register Sequences, Cambridge University Press, 498 pages
[2] Chris J. Mitchell, Tuvi Etzion and Kenneth G. Paterson, A Method for Constructing Decodable de Bruijn Sequences, IEEE Transactions on Information Theory, Vol. 42, No. 5, 1996
[3] On a Homomorphism of the de Bruijn Graph and Its Applications to the Design of Feedback Shift Registers, Abraham Lempel, IEEE Transactions on Computers, Vol. C-19, No. 12, 1970
[4] Christian Schridde, Computing discrete logarithms using $\mathcal{O}((log\;q)^2)$ operations from $\{+, -, \times,\div\, \&\}$, Groups Complexity Cryptology, Vol. 8, No. 2, pp. 91-107, 2016

Kryptos - The Cipher (Part 4) - Correctly positioned decryption of the word BERLIN

EASTNORTHEAST - This is not exactly the hint Jim Sanborn (JS) gave for K4 on the 29th of January this year. He only gave NORTHEAST - which refers to the positions 26-34 of K4's plaintext. Beside BERLIN and CLOCK it is the third revealed plaintext word of K4. However, also this hint does not seem to help much. However, it just so happened, that a member in the yahoo kryptos group had a conversation with Jim Sanborn due to a submitted solution. Sandborn's answer to the question contained again the last clue which surprisingly was EASTNORTHEAST at position 22-34. Jim Sanborns compass rose at CIA There is disagreement if Jim revealed this on purpose or he did it accidentially, but the new extended clue seem to be serious and valid.Interestingly, EASTNORTHEAST is exactly the direction which is illustrated on the compass rose on one of the stones around kryptos, also created by Jim Sanborn. Actually, i dont really kn...

NumberWorld

Search This Blog

Featured Post

Backdooring Cryptography - Two characters that break your SSL encryption

De Bruijn Sequences and the Discrete Logarithm Problem

Labels

Comments

Post a Comment

Popular posts from this blog

Kryptos - The Cipher (Part 4) - Correctly positioned decryption of the word BERLIN

Kryptos - The Cipher (Part 1) - Introduction

Kryptos - The Cipher (Part 3) - K4 Intentional vs. non-intentional errors