Editorial for COCI '13 Contest 1 #6 Slastičar


Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

Let us first solve a simpler version of the problem, where the robot doesn't stop upon finding a matching serial number, but tries out all N segments instead.

Let f(W, S) be the number of suffixes of string S that have W as a prefix. Let S[x \dots y] denote a substring of S from position x to position y.

The number of comparisons that the robot will make for a word W with length L is then:

\displaystyle BS + f(W[1 \dots 1], S) + f(W[1 \dots 2], S) + \dots + f(W[1 \dots L], S)

where BS is the total number of segments that the robot has started comparison with. With our simplification assumption, BS always equals N, beaus the robot doesn't stop comparison upon finding a matching word.

A more efficient way to compute f(W, S) is needed. One possible method is building a suffix array of S. A suffix array is a set of all suffixes of a string S sorted lexicographically. Binary search can be applied to the suffix array in order to find the interval of suffixes beginning with the character W[1]. Let us denote this interval by [l_1, r_1]. The value of f(W[1 \dots 1], S) then equals r_1-l_1+1. Next, within this interval, we need to find the interval of suffixes with W[2] as the second letter, [l_2, r_2]. f(W[1 \dots 2], S) is then r_2-l_2+1. We repeat the procedure for all remaining characters in W.

The complexity of this search is \mathcal O(L \log N), which is fast enough since the sum of all query lengths will not exceed 3\,000\,000.

What about the full problem, where the robot stops upon finding a match?

For each word i, let us denote the first position where it appears with p_i. Let g(W, S, p) be the number of suffixes of string S with the prefix W starting at a position less than or equal to p. We can now derive a new formula for the total number of comparisons:

\displaystyle BS + g(W[1 \dots 1], S, p_i) + \dots + g(W[1 \dots L], S, p_i)

where BS now equals p_i since we stop comparing after that position.

In order to make computing the function g easier, we will not respond to queries in the order that they appear in the input, but sort them in a suitable order instead and store the solutions in an array. After computing all the solutions, we can output them in the original order.

We will sort the queries by p_i in ascending order. We will also utilize a structure supporting two operations: add 1 to some position, find the sum of an interval. This structure will enable us to find the number of important prefixes in an interval. It can be implemented using a Fenwick tree or an interval (tournament) tree.

We will process queries as follows:

  • suppose we are currently responding to a query first matching at p_i
  • in the structure, we set 1 to corresponding positions for all suffixes up to p_i; notice that it is not necessary to do so for all suffixes for each query, but only for suffixes after the one we stopped at in the previous query, since the queries are sorted by p_i
  • as in the simpler version of the problem, we find the intervals [l_1, r_1], [l_2, r_2], \dots, [l_L, r_L]; however, we do not add r_i-l_i+1 to the solution, but instead the sum of the interval [l_i, r_i] in the structure, which corresponds to the number of suffixes starting at position p_i or less in the interval [l_i, r_i] in the suffix array
  • we store the result into the solution array at the appropriate position

Finally, we just need to output the solution array.

There is one more detail left unresolved: how do we compute the numbers p_i? Upon finding the last interval [l_L, r_L] for some word W, we know that all suffixes from that interval have W as a prefix. The first position where W appears is thus the starting position of the leftmost of those suffixes, that is, the suffix with the smallest index. We therefore need a structure for finding the minimum number in an interval. This can be done using an interval tree or a similar structure.

There are multiple algorithms for building a suffix array with varying complexities that can be found on the Internet. The official solution uses an algorithm with complexity \mathcal O(N \log^2 N). Also, both an interval tree and a Fenwick tree are implemented in the official solution. The complexity of a query in both structures is \mathcal O(\log N).

The total complexity of the algorithm is \mathcal O(N \log^2 N + \text{query_lengths_sum} \log N).


Comments

There are no comments at the moment.