Editorial for COCI '07 Contest 5 #6 Baza


Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

Sort the words in alphabetic order. In the resulting table, every word in the database can be searched for with the following algorithm.

We start the search on an interval containing all words. Using binary search, we find the interval of words starting with the same letter as the query word. Note the interval boundaries and proceed with the second letter. Binary searching on the new interval, we find the interval of words which share both the first and second letters as the query word. Note the interval boundaries again and move on.

Suppose we insert the same words from the database into a data structure, in order in which they are given in the input. Then it is possible to calculate, for every prefix of every word, how many words before it share the prefix.

Now attach a counter to every interval of consecutive equal letters in a column. When inserting words into the data structure, for every interval the word belongs to, we increase the counter. Every counter tells us exactly how many words there are with that prefix.

After preprocessing the input, for any word W and each of its prefixes P we know the numbers:

  • counter(P) – how many words start with P
  • before(W, P) – how many words are before word W and start with P (including W itself)
  • id(W) – the identifier of word W (when it first appears in the database)

Now every query word Q can be answered:

  • If Q is in the database, then we output id(Q) + before(Q, P_1) + before(Q, P_2) + \dots + before(Q, P_L); where L is the length of word Q, and P_k is a k^\text{th} prefix of word Q.
  • Else output N + counter(P_1) + counter(P_2) + \dots + counter(P_L).

The time complexity of the preprocessing phase is \mathcal O(LN \log N), while answering a query is \mathcal O(L \log N), where L is the length of a word, and N is the number of words in the database.


Comments

There are no comments at the moment.