Editorial for Baltic OI '01 P2 - Crack the Code
Submitting an official solution before solving the problem yourself is a bannable offence.
Encryption Program
The encryption program encrypts (one by one) the successive characters from the input file. Characters that are not letters (of the English alphabet) are left unaltered. The key given by the user specifies the way characters are coded. Namely, the character from the input file (counting from ) is "rotated" in a "cyclic" English alphabet (consisting of capital letters only) by the value of the element of the key (Elements of the key are also numbered from ).
Reverse Engineering Algorithm
It can be easily seen, that knowing the key we can decode the message in a similar way it was coded — we can apply the coding algorithm given above for a key consisting of negated elements of the coding key. Hence, the problem reduces to finding the coding key. In case of texts written in the natural language, the relative frequencies of particular letters are different. So, we can use some statistical methods to determine the key. Basing on the uncoded piece of text we can calculate the probability with which particular letters appear in the text.
We will denote by the probability that (the next) character in the text is , and by the probability distribution itself. To find the element of the key, we calculate the probability distribution for letters appearing in the coded text on positions .
We will denote by the probability of appearance of letter , and by the probability distribution itself.
Now, it is enough to find such a cyclic rotation of sequence that is closest to the distribution .
Intended Solution
Our solution is based on the method described above. We also need a measure of how "close" the distribution is to (some cyclic rotation of the sequence ). We will use here statistical test , which can be expressed as:
where is the number of letters in the sample on which is based.
Remark. The test should be used only in the cases when for each the value is not too small (let's say, not less than ). Therefore, sometimes it may be necessary to group a few events in the distribution , so that their total probability is big enough, relatively to .
Other Solutions
This solution differs from the one above only in the measure used to compare probability distributions. We can view the distributions and as vectors in a -dimensional space ( is the number of letters in the alphabet). We can calculate their dot product and normalise it, dividing it by the square root of the product of lengths of the two vectors. The closer the result is to , the closer are the two distributions to each other.
Comments