Learning a Sine-Wave to break Cryptography

A few month ago, i locked up myself into a mental hole, by trying to solve or at least partially solve the following a problem. I named it "Learning a Sine Wave", because you can think of it, as finding the period of a sine function given some hints.

Problem-Definition. Assume you have a sufficient long integer interval $I=[0,n]$ and a set $Z$ that consists of tuples $(x,s(x))$, with $x \in_R I$ and $s(x) = \mathsf{sign}(\sin(2x\pi/p))$. The set $Z$ contains the randomly distributed integers from $I$ enriched with the information if a certain (but fixed) sinus wave travels above or below that point (i.e., the sign value). Find the secret parameter $p$.

Below you find a illustration of the problem. I sine wave is drawn a seven points are marked: $10, 24, 55, 89, 122, 165, 192$. For each of those point i known the label $+1$ or $-1$. Your goal is to determine the length of one full period of the sine wave.

A random instance of the Sine Problem — Fig-1: Random instance for the problem.

So we have: \begin{align*} Z = [&(10,-1), (24,-1), (55,+1), (89,-1),\\ &(122,+1), (165,-1), (192,-1)] \end{align*} The general case seems to be hard. But at least i know a solution for a special case. That is, the marked points are following the rules $x_i = 2\cdot x_{x-1}$.

Easy instance of the Sine Problem — Fig-2: Easy instance for the problem.

Here we get: \begin{align*} Z = &[(10,-1), (20,-1), (40,+1),\\ &(80,+1), (160,+1)] \end{align*} which is easy to solve. An application is for example to attack the (Partially) Approximate Divisor Problem used in Cryptography.

Heuristic argument for the number of sufficient points

It can be heuristically proved, that even as few as $m =\log p$ points should be enough to find a unique value for $p$. Imagine a random sine wave that is plotted along the interval $I$. Since the points $x_i$ are randomly chosen from $I$, the chance that a point $x_i$ is traversed by the sine wave equal to its assigned sign value is $1/2$. Hence, the chances that a random curves traverses all $m$ points correct is $(1/2)^m$. So $\log p$ point should be enough to determine the integer $p$.

The described problem to learn the sine-wave, e.g., its parameter $p$, has relationships for example to Integer Factoring or to the Approximation GCD-Problem (e.g., can be used to attack the homomorphic encryption scheme of Dijk et al. [1])

Solution for the easy instance

Next, i will describe one of my approaches and what seems to be its limitations. To approach a problem, it is always a good start to find easy cases. Then solve the problem for this cases and then try to determine what prevents the idea from working in the general case.

In our case, an easy case is when the elements of $Z$ are not randomly chosen, but are of the form $x_i = 2x_{i-1}$. In that case, we get a $\mathcal{O}(\log^2 p)$ algorithm that returns $p$.

The algorithm in this case is as follows: Assume that we have the set $$Z = \{(x_1,s(x_1)),(x_2,s(x_2)),...,(x_m,s(x_m))\}$$ with $x_i = 2x_{i-1}$. The sign value gives us the information if the residue $r_j$ from $r_j \equiv x_i \pmod{p}$ (we denote $r_j$ in the following with $[x_i]$) is:

$s(x_i) = 1 \Rightarrow 0 \leq r_j \leq p/2$
$s(x_i) = 1 \Rightarrow p/2 \leq r_j p$

Now we list all four possible cases, between $[x_i]$ and $[x_{i-1}]$:

$[x_m] > p/2$ and $[x_m] \equiv 0\pmod{2}$, $\Rightarrow$ $[x_{m-1}] < p/2$ and $[x_{m-1}] = [x_m]/2$
$[x_m] > p/2$ and $[x_m] \equiv 1\pmod{2}$, $\Rightarrow$ $[x_{m-1}] > p/2$ and $[x_{m-1}] = ([x_m]+p)/2$
$[x_m] < p/2$ and $[x_m] \equiv 0\pmod{2}$, $\Rightarrow$ $[x_{m-1}] < p/2$ and $[x_{m-1}] = [x_m]/2$
$[x_m] < p/2$ and $[x_m] \equiv 1\pmod{2}$, $\Rightarrow$ $[x_{m-1}] > p/2$ and $[x_{m-1}] = ([x_m]+p)/2$

As one can see, the information if $[x_m]$ and $[x_{m-1}]$ are larger of less than $p/2$, is enough to deduce the parity of $[x_m]$. Remember, only $x_m$ is known and not $[x_m]$. That parity information allows the get the information how $[x_{m-1}]$ is obtained from $[x_m]$. That is either by dividing by $2$ or first adding $p$ and then divide by $2$.

This information, repeated $m$-times, allows to get $p$ for $m = \log^2 p$. Here is a little example:

We have $$Z = \{(26,1),(52,1),(104,1),(208,-1),(416,-1),(832,-1),(1664,1),(3328,1) \}$$ We start with $$ 0 \leq [x_m] = [3328] < p/2$$ It is $0 < [x_{m-1}] = [1664] < p/2$ since its sign-value is positive. So we know that $$ 0 \leq [3228]/2 = [1664] < p/4$$ The next sign-value is negative, hence we have to add $p$ and then divide by $2$: $$ p/2 \leq [3228]/4+p/2 = [832] < p/2 + p/8$$ The next sign-value is again negative, hence $$ p/2+p/4 \leq [3228]/8+p/4+p/2 = [416] < p/2 + p/4 + p/16$$ and again $$ p/2+p/4+p/8 \leq [3228]/16+p/8+p/4+p/2 = [208] < p/2 + p/4 + p/8 + p/32$$ The next three sign-values are positive, which are always simple divisions by $2$:

$$ \frac{p}{16}+\frac{p}{32}+\frac{p}{64} \leq \frac{[3228]}{128}+\frac{p}{64}+\frac{p}{32}+\frac{p}{16} = [26] < \frac{p}{16} + \frac{p}{32} + \frac{p}{64} + \frac{p}{256}$$

Next, we divide by $p$:

$$ \frac{1}{16}+\frac{1}{32}+\frac{1}{64} \leq \frac{[3228]}{128p}+\frac{1}{64}+\frac{1}{32}+\frac{1}{16} = \frac{[26]}{p} < \frac{1}{16} + \frac{1}{32} + \frac{1}{64} + \frac{1}{256}$$

Finally, we have

$$ \frac{1}{16}+\frac{1}{32}+\frac{1}{64} \leq \frac{[26]}{p} < \frac{1}{16} + \frac{1}{32} + \frac{1}{64} + \frac{1}{256}$$ which is equal to $$\frac{[26]}{p} - \frac{7}{64} < \frac{29}{256}$$

If we have $m=\log^2(p)$ points, we could use the famous approximation result that if $$\left| k - \frac{p}{q}\right| < \frac{1}{2q^2}$$ holds, one can find $p/q$ among the, only polynomial many, fractional convergents of the fraction $k$. See for example Wiener's Attack on the RSA cryptosystem.

What goes wrong in a more general case: Triple points.

What goes wrong in the case $x_i = 3x_{i-1}$? Why exactly does the previous algorithm fail and could it be fixed? Lets look what happens to our case distinction.

$[x_m] > p/2$ and $[x_m] \equiv 0\pmod{3}$, $\Rightarrow$ $[x_{m-1}] < p/2$ and $[x_{m-1}] = [x_m]/3$
$[x_m] > p/2$ and $[x_m] \equiv 1\pmod{3}$, $\Rightarrow$ $\color{red}?$ and $\color{red}?$
$[x_m] > p/2$ and $[x_m] \equiv 2\pmod{3}$, $\Rightarrow$ $\color{red}?$ and $\color{red}?$
$[x_m] < p/2$ and $[x_m] \equiv 0\pmod{3}$, $\Rightarrow$ $[x_{m-1}] < p/2$ and $[x_{m-1}] = [x_m]/3$
$[x_m] < p/2$ and $[x_m] \equiv 1\pmod{3}$, $\Rightarrow$ $\color{red}?$ and $\color{red}?$
$[x_m] < p/2$ and $[x_m] \equiv 2\pmod{3}$, $\Rightarrow$ $\color{red}?$ and $\color{red}?$

So the red question marks indicate, that this value is not yet determined. To make any progress, we have to assume that we know the value $p\pmod{3}$, W.l.o.g. we set $p \equiv 1\pmod{p}$, hence

$[x_m] > p/2$ and $[x_m] \equiv 0\pmod{3}$, $\Rightarrow$ $[x_{m-1}] < p/2$ and $[x_{m-1}] = [x_m]/3$
$[x_m] > p/2$ and $[x_m] \equiv 1\pmod{3}$, $\Rightarrow$ $\color{red}?$ and $[x_{m-1}] = ([x_m] + 2p)/3$
$[x_m] > p/2$ and $[x_m] \equiv 2\pmod{3}$, $\Rightarrow$ $\color{red}?$ and $[x_{m-1}] = ([x_m] + p)/3$
$[x_m] < p/2$ and $[x_m] \equiv 0\pmod{3}$, $\Rightarrow$ $[x_{m-1}] < p/2$ and $[x_{m-1}] = [x_m]/3$
$[x_m] < p/2$ and $[x_m] \equiv 1\pmod{3}$, $\Rightarrow$ $\color{red}?$ and $[x_{m-1}] = ([x_m] + 2p)/3$
$[x_m] < p/2$ and $[x_m] \equiv 2\pmod{3}$, $\Rightarrow$ $\color{red}?$ and $[x_{m-1}] = ([x_m] + p)/3$

We still have $4$ question marks. Let us look at point 2. We know that $[x_m]$ is larger than $p/2$ and $[x_{m-1}]$ is obtained by $[x_{m-1}] = ([x_m] + 2p)/3$, hence we have $$[x_{m-1}] = ([x_m] + 2p)/3 = [x_m]/3 + 2p/3 > p/6 + 4p/6 = 5p/6 > p/2$$ So the question mark at 2. has to be replaced with $[x_{m-1}] > p/2$.

What about point 3.? We again know that $[x_m] > p/2$ and $x_{m-1}]$ is obtained by $[x_{m-1}] = ([x_m] + p)/3$, hence we have $$[x_{m-1}] = ([x_m] + p)/3 = [x_m]/3 + p/3 > p/6 + 2p/6 = 3p/6 = p/2$$ So the question mark at 3. has to be replaced also with $[x_{m-1}] > p/2$. What about point 5.? We have now the case that $[x_m] < p/2$. And $[x_{m-1}]$ is obtained by $[x_{m-1}] = ([x_m] + 2p)/3$, hence we have $$[x_{m-1}] = ([x_m] + 2p)/3 = [x_m]/3 + 2p/3 < p/6 + 4p/6 = 5p/6$$
Here we have a problem. We dont know if $[x_{m-1}]$ is smaller or larger than $p/2$. All the information we get is that it is smaller than $5p/6$. For completness, what about point 6.? We have again the case that $[x_m] < p/2$. And $[x_{m-1}]$ is obtained by $[x_{m-1}] = ([x_m] + p)/3$, hence we have $$[x_{m-1}] = ([x_m] + p)/3 = [x_m]/3 + p/3 < p/6 + 2p/6 = p/2$$ Here we succeed and the questionmark at 6. has to be replaced with $[x_{m-1}] < p/2$.

$[x_m] > p/2$ and $[x_m] \equiv 0\pmod{3}$, $\Rightarrow$ $[x_{m-1}] < p/2$ and $[x_{m-1}] = [x_m]/3$
$[x_m] > p/2$ and $[x_m] \equiv 1\pmod{3}$, $\Rightarrow$ $[x_{m-1}] > p/2$ and $[x_{m-1}] = ([x_m] + 2p)/3$
$[x_m] > p/2$ and $[x_m] \equiv 2\pmod{3}$, $\Rightarrow$ $[x_{m-1}] > p/2$ and $[x_{m-1}] = ([x_m] + p)/3$
$[x_m] < p/2$ and $[x_m] \equiv 0\pmod{3}$, $\Rightarrow$ $[x_{m-1}] < p/2$ and $[x_{m-1}] = [x_m]/3$
$[x_m] < p/2$ and $[x_m] \equiv 1\pmod{3}$, $\Rightarrow$ $\color{red}?$ and $[x_{m-1}] = ([x_m] + 2p)/3$
$[x_m] < p/2$ and $[x_m] \equiv 2\pmod{3}$, $\Rightarrow$ $[x_{m-1}] < p/2$ and $[x_{m-1}] = ([x_m] + p)/3$

Nevertheless, also if we could replace the last questionmark, we are still facing the problem, that a combination of two successive sign-value, does not lead to a distinct residue class of $[x_m]$ modulo $3$. If for example we know that $[x_m] > p/2$ and $[x_{m-1}] > p/2$, then we are left with two cases, namely $[x_m] \equiv \pm 1\pmod{3}$.
It is not hard to imagine that this approach gets even worse if the ration $x_m/x_{m-1} > 3$ or is even in $\mathbb{Q}$. So probably this approach, dispite it is able to solve the named special case, does not generalize.

Related Posts:

Learning a Sine Wave - Part 2 - A more general solution

[1] Marten van Dijk; Craig Gentry, Shai Halevi, and Vinod Vaikuntanathan (2009-12-11). "Fully Homomorphic Encryption over the Integers" (PDF). International Association for Cryptologic Research. Retrieved 2010-03-18

Kryptos - The Cipher (Part 4) - Correctly positioned decryption of the word BERLIN

EASTNORTHEAST - This is not exactly the hint Jim Sanborn (JS) gave for K4 on the 29th of January this year. He only gave NORTHEAST - which refers to the positions 26-34 of K4's plaintext. Beside BERLIN and CLOCK it is the third revealed plaintext word of K4. However, also this hint does not seem to help much. However, it just so happened, that a member in the yahoo kryptos group had a conversation with Jim Sanborn due to a submitted solution. Sandborn's answer to the question contained again the last clue which surprisingly was EASTNORTHEAST at position 22-34. Jim Sanborns compass rose at CIA There is disagreement if Jim revealed this on purpose or he did it accidentially, but the new extended clue seem to be serious and valid.Interestingly, EASTNORTHEAST is exactly the direction which is illustrated on the compass rose on one of the stones around kryptos, also created by Jim Sanborn. Actually, i dont really kn...

NumberWorld

Search This Blog

Featured Post

Kryptos - The Cipher (Part 4) - Correctly positioned decryption of the word BERLIN