Editorial for Baltic OI '01 P2 - Crack the Code


Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

Encryption Program

The encryption program encrypts (one by one) the successive characters from the input file. Characters that are not letters (of the English alphabet) are left unaltered. The key given by the user specifies the way characters are coded. Namely, the ith character from the input file (counting from 1) is "rotated" in a "cyclic" English alphabet (consisting of capital letters only) by the value of the [(i1)mod10+1]th element of the key (Elements of the key are also numbered from 1).

Reverse Engineering Algorithm

It can be easily seen, that knowing the key we can decode the message in a similar way it was coded — we can apply the coding algorithm given above for a key consisting of negated elements of the coding key. Hence, the problem reduces to finding the coding key. In case of texts written in the natural language, the relative frequencies of particular letters are different. So, we can use some statistical methods to determine the key. Basing on the uncoded piece of text we can calculate the probability with which particular letters appear in the text.

We will denote by pi the probability that (the next) character in the text is i, and by P the probability distribution itself. To find the ith element of the key, we calculate the probability distribution for letters appearing in the coded text on positions i,(i+10),(i+20),.

We will denote by qi the probability of appearance of letter i, and by Q the probability distribution itself.

Now, it is enough to find such a cyclic rotation of sequence (qA,qB,,qZ) that is closest to the distribution P.

Intended Solution

Our solution is based on the method described above. We also need a measure of how "close" the distribution P is to R=(rA,rB,,rZ)=(,qZ,qA,qB,) (some cyclic rotation of the sequence (qA,qB,,qZ)). We will use here statistical test χ2, which can be expressed as:

ni=AZ(piri)2pi

where n is the number of letters in the sample on which Q is based.

Remark. The test χ2 should be used only in the cases when for each i the value npi is not too small (let's say, not less than 0.1). Therefore, sometimes it may be necessary to group a few events in the distribution P, so that their total probability is big enough, relatively to n.

Other Solutions

This solution differs from the one above only in the measure used to compare probability distributions. We can view the distributions P and R as vectors in a 26-dimensional space (26 is the number of letters in the alphabet). We can calculate their dot product and normalise it, dividing it by the square root of the product of lengths of the two vectors. The closer the result is to 1, the closer are the two distributions to each other.


Comments

There are no comments at the moment.