Editorial for COCI '11 Contest 4 #6 Kriptogram

Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

We can solve this task by modifying the Knuth-Morris-Pratt (KMP) string searching algorithm. Let's introduce some notation.

We will denote the words of the encrypted message by $A[1], A[2], \dots, A[n]$. Also, $A[x, y]$ will denote the sentence made up of words $A[x]$ through $A[y]$.
$B[1], B[2], \dots, B[m]$ will be the words of the sentence from the original text, and $B[x, y]$ the sentence made up of words $B[x]$ through $B[y]$.
Let matches($A[x, x+L-1]$, $B[y, y+L-1]$) be a boolean function telling us whether the $L$-word sentence $A[x, x+L-1]$ can be decrypted into $B[y, y+L-1]$. For example, matches(a b a, c d c) = true; matches(a b b, x y z) = false.
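A brute-force version of matches can be sketched in Python. It assumes (consistently with the transformation $T$ used later) that decryption is one-to-one: equal words map to equal words and different words map to different words. The list-of-words representation is an assumption made for illustration.

```python
def matches(x, y):
    """Brute-force check: can sentence x (list of words) be decrypted into y?

    Assumes a one-to-one cipher, so the word mapping is checked in both directions.
    """
    if len(x) != len(y):
        return False
    enc_to_dec, dec_to_enc = {}, {}
    for u, v in zip(x, y):
        # setdefault stores the first pairing seen and returns the stored value;
        # a different stored value means the mapping is inconsistent
        if enc_to_dec.setdefault(u, v) != v or dec_to_enc.setdefault(v, u) != u:
            return False
    return True


print(matches("a b a".split(), "c d c".split()))  # True
print(matches("a b b".split(), "x y z".split()))  # False
```

Checking only the forward dictionary would accept matches(a b, c c); the second dictionary rejects it.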

As in standard KMP, we will calculate the prefix function $P[1], P[2], \dots, P[m]$, but with a slightly different meaning: $P[x]$ will be equal to the largest $L < x$ such that:

matches($B[1, L]$, $B[x-L+1, x]$) = true¹
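To make the definition concrete, here is a direct Python sketch that computes $P$ straight from it. It is roughly cubic and meant for illustration only; `naive_prefix_function` is a hypothetical name, the compact `matches` helper is a brute-force check assuming a one-to-one cipher, and the result is a 0-based list with `p[x-1]` equal to $P[x]$.

```python
def matches(x, y):
    """Brute-force check (assumes a one-to-one cipher)."""
    if len(x) != len(y):
        return False
    fwd, back = {}, {}
    for u, v in zip(x, y):
        if fwd.setdefault(u, v) != v or back.setdefault(v, u) != u:
            return False
    return True


def naive_prefix_function(b):
    """Hypothetical helper: p[x-1] = largest L < x with matches(B[1, L], B[x-L+1, x])."""
    m = len(b)
    p = [0] * m
    for x in range(1, m + 1):
        # try the longest proper length first; 0-based slice b[x-L:x] is B[x-L+1, x]
        for length in range(x - 1, 0, -1):
            if matches(b[:length], b[x - length:x]):
                p[x - 1] = length
                break
    return p


print(naive_prefix_function("x y x x".split()))  # [0, 1, 2, 1]
```

Since any single word can be decrypted into any other single word, $P[x] \ge 1$ for every $x \ge 2$.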

After finding $P$, we must find $B$ within $A$. For each position $i$ in $A$, we are interested in the longest suffix of $A[1, i]$ that matches some prefix of $B$; once its length reaches $m$, $B$ can start at word $i - m + 1$. If extending the current match by the next word fails, we continue with the next largest possible prefix of $B$, which we look up in $P$, exactly as in standard KMP.
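The search loop can be sketched in Python, assuming $P$ is already available as a 0-based list `p` (with `p[k-1]` equal to $P[k]$) and $B$ is non-empty. For clarity this version still evaluates matches on whole sentences with a brute-force helper (assuming a one-to-one cipher), so it is not linear; `first_occurrence` is a hypothetical name, and it returns a 1-based position or `None`.

```python
def matches(x, y):
    """Brute-force check (assumes a one-to-one cipher)."""
    if len(x) != len(y):
        return False
    fwd, back = {}, {}
    for u, v in zip(x, y):
        if fwd.setdefault(u, v) != v or back.setdefault(v, u) != u:
            return False
    return True


def first_occurrence(a, b, p):
    """Hypothetical helper: 1-based position in a where b can start, or None."""
    m = len(b)
    k = 0  # length of the longest suffix of a[:i] that matches a prefix of b
    for i in range(len(a)):
        # on a mismatch, fall back to the next shorter candidate prefix via P
        while k > 0 and not matches(a[i - k:i + 1], b[:k + 1]):
            k = p[k - 1]
        # with k == 0 this compares two single words, which always succeeds
        if matches(a[i - k:i + 1], b[:k + 1]):
            k += 1
        if k == m:
            return i - m + 2
    return None


print(first_occurrence("x y z z w".split(), "p q q".split(), [0, 1, 1]))  # 2
```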

We must also find a way to evaluate the matches function efficiently. We will transform both messages using the following transformation:

$T(X)[i] = -1$ if $X[i]$ doesn't appear in $X[1, i-1]$
$T(X)[i] = j$ if $j$ is the largest index such that $j < i$ and $X[j] = X[i]$
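A short Python sketch of $T$ as defined above, using 1-based indices to match the text; `transform` is a hypothetical name.

```python
def transform(x):
    """Hypothetical helper computing T(X): -1 for a first occurrence,
    otherwise the 1-based index of the previous occurrence of the same word."""
    last = {}  # word -> 1-based index of its most recent occurrence
    t = []
    for i, w in enumerate(x, start=1):
        t.append(last.get(w, -1))
        last[w] = i
    return t


print(transform("a b a a".split()))  # [-1, -1, 1, 3]
```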

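With $A' = T(A)$ and $B' = T(B)$, KMP never needs to evaluate matches on whole sentences: it only checks whether a current match of length $L$ can be extended by one word. That holds exactly when the new words on both sides either have no previous occurrence inside their windows (an occurrence before the start of the window counts as none) or have their previous occurrence at the same offset within the window. Each such check takes constant time, so the total complexity is linear, which is sufficient for obtaining the maximum number of points.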
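The constant-time extension check and the full linear algorithm can be sketched in Python as follows. This is a minimal sketch under a few assumptions: instead of the absolute index $j$ stored in $T(X)[i]$, it stores the distance $i - j$ (with 0 in place of $-1$), which turns the window test into a single comparison; $B$ is non-empty; and `prev_distance`, `extends`, `prefix_function` and `first_occurrence` are hypothetical names. The result is the 1-based position of the first word of $A$ where $B$ can start, or `None`.

```python
def prev_distance(x):
    """Hypothetical helper: distance back to the previous occurrence of each word,
    or 0 if there is none (i - T(X)[i] in 0-based form, 0 in place of -1)."""
    last = {}
    d = []
    for i, w in enumerate(x):
        d.append(i - last[w] if w in last else 0)
        last[w] = i
    return d


def extends(d_text, d_pat, k):
    """Can a match of length k be extended by one word?

    d_text: distance for the new word whose window holds the previous k words;
    d_pat: distance for word k (0-based) of the prefix of B, always <= k.
    An occurrence more than k words back lies outside the window, so it counts as new.
    """
    if d_text > k:
        d_text = 0
    return d_text == d_pat


def prefix_function(db):
    """p[q] == P[q + 1], computed from the distance encoding of B."""
    p = [0] * len(db)
    k = 0
    for q in range(1, len(db)):
        # window B[q-k .. q-1] (0-based) matches B[0 .. k-1]; try to add B[q]
        while k > 0 and not extends(db[q], db[k], k):
            k = p[k - 1]
        if extends(db[q], db[k], k):
            k += 1
        p[q] = k
    return p


def first_occurrence(a, b):
    """1-based index of the first word of a at which b can start, or None."""
    da, db = prev_distance(a), prev_distance(b)
    p = prefix_function(db)
    m, k = len(b), 0
    for i in range(len(a)):
        while k > 0 and not extends(da[i], db[k], k):
            k = p[k - 1]
        if extends(da[i], db[k], k):
            k += 1
        if k == m:
            return i - m + 2
    return None


print(first_occurrence("x y z z w".split(), "p q q".split()))  # 2
```

Storing distances rather than absolute indices lets the same `extends` test serve both the prefix-function phase and the search phase, since only the window length $k$ is needed to decide whether a previous occurrence is still inside the window.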

¹ The only difference between this definition and the original one is that we use our matches function instead of standard string comparison.
