Editorial for Baltic OI '01 P2 - Crack the Code

Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

Encryption Program

The encryption program encrypts (one by one) the successive characters from the input file. Characters that are not letters (of the English alphabet) are left unaltered. The key given by the user specifies the way characters are coded. Namely, the $i^\text{th}$ ~i^\text{th}~ character from the input file (counting from $1$ ~1~) is "rotated" in a "cyclic" English alphabet (consisting of capital letters only) by the value of the $[(i-1) \bmod 10+1]^\text{th}$ ~[(i-1) \bmod 10+1]^\text{th}~ element of the key (Elements of the key are also numbered from $1$ ~1~).

Reverse Engineering Algorithm

It can be easily seen, that knowing the key we can decode the message in a similar way it was coded — we can apply the coding algorithm given above for a key consisting of negated elements of the coding key. Hence, the problem reduces to finding the coding key. In case of texts written in the natural language, the relative frequencies of particular letters are different. So, we can use some statistical methods to determine the key. Basing on the uncoded piece of text we can calculate the probability with which particular letters appear in the text.

We will denote by $p_i$ ~p_i~ the probability that (the next) character in the text is $i$ ~i~, and by $P$ ~P~ the probability distribution itself. To find the $i^\text{th}$ ~i^\text{th}~ element of the key, we calculate the probability distribution for letters appearing in the coded text on positions $i, (i+10), (i+20), \dots$ ~i, (i+10), (i+20), \dots~.

We will denote by $q_i$ ~q_i~ the probability of appearance of letter $i$ ~i~, and by $Q$ ~Q~ the probability distribution itself.

Now, it is enough to find such a cyclic rotation of sequence $(q_A, q_B, \dots, q_Z)$ ~(q_A, q_B, \dots, q_Z)~ that is closest to the distribution $P$ ~P~.

Intended Solution

Our solution is based on the method described above. We also need a measure of how "close" the distribution $P$ ~P~ is to $R = (r_A, r_B, \dots, r_Z) = (\dots, q_Z, q_A, q_B, \dots)$ (some cyclic rotation of the sequence $(q_A, q_B, \dots, q_Z)$ ~(q_A, q_B, \dots, q_Z)~). We will use here statistical test $\chi^2$ ~\chi^2~, which can be expressed as:

$\displaystyle n \sum_{i=A}^Z \frac{(p_i - r_i)^2}{p_i}$

where $n$ ~n~ is the number of letters in the sample on which $Q$ ~Q~ is based.

Remark. The test $\chi^2$ ~\chi^2~ should be used only in the cases when for each $i$ ~i~ the value $np_i$ ~np_i~ is not too small (let's say, not less than $0.1$ ~0.1~). Therefore, sometimes it may be necessary to group a few events in the distribution $P$ ~P~, so that their total probability is big enough, relatively to $n$ ~n~.

Comments

There are no comments at the moment.