Editorial for COCI '15 Contest 5 #5 OOP
Submitting an official solution before solving the problem yourself is a bannable offence.
In order to solve the task, let's first try to see how we can compare a pattern and a word. Given the fact that *
is replaced by any series of letters, and the letters to the left and to the right of *
remain unchanged, everything to the left of *
must be a prefix of the word which we're comparing to, and everything to the right must be the suffix of the word we're comparing to. Therefore, if we denote everything to the left of *
as and everything to the right as , it is sufficient to check whether is a prefix of the word, is the suffix of the word, and that the sum of lengths of and is less than or equal to the length of the word. In the case where this does not hold, an overlap between the prefix and suffix could occur (e.g. ab*ba
with aba
, ab
and ba
share letters in aba
). Now we can solve the task for of total points, if we use a quick comparison between strings using hashing. The complexity of this solution is .
Let's now solve the task for all points. First we construct a prefix tree out of all words from the input. A node in the tree that corresponds to the prefix of length of the word of length will be used to store the hash value of the suffix of length of the word . After we've constructed this structure, let's try to see what query we need to answer in order to find out the number of words that correspond to the pattern. Let's denote the part of the pattern to the left of *
as . The answer to the query is actually the number of hashes located in the subtree of prefix that are equal to the hash of the right part of the pattern (the part after *
). How can we answer this question quickly?
If we use DFS to traverse the prefix tree and denote the time we first visited the node with and the time we returned from node to its parent with , it is easy to see that all nodes that are located in the subtree of have in the interval .
If we group the same hashes from the prefix tree in a structured sorted by of the corresponding nodes, we can, for those hashes, answer how many hashes there are in the subtree of node . We do this by querying the structure for hash how many from the interval there are. If we use a map<int, vector<int>>
as the structure, we can easily support the aforementioned query with a simple binary search over the corresponding vector for a given hash. If is the number of characters in the input, the total complexity of the solution is .
Comments