Editorial for COCI '15 Contest 5 #5 OOP


Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

In order to solve the task, let's first try to see how we can compare a pattern and a word. Given the fact that * is replaced by any series of letters, and the letters to the left and to the right of * remain unchanged, everything to the left of * must be a prefix of the word which we're comparing to, and everything to the right must be the suffix of the word we're comparing to. Therefore, if we denote everything to the left of * as L and everything to the right as R, it is sufficient to check whether L is a prefix of the word, R is the suffix of the word, and that the sum of lengths of L and R is less than or equal to the length of the word. In the case where this does not hold, an overlap between the prefix and suffix could occur (e.g. ab*ba with aba, ab and ba share letters in aba). Now we can solve the task for 40\% of total points, if we use a quick comparison between strings using hashing. The complexity of this solution is \mathcal O(NQ).

Let's now solve the task for all points. First we construct a prefix tree out of all words from the input. A node in the tree that corresponds to the prefix of length p of the word X of length L will be used to store the hash value of the suffix of length L-p of the word X. After we've constructed this structure, let's try to see what query we need to answer in order to find out the number of words that correspond to the pattern. Let's denote the part of the pattern to the left of * as L. The answer to the query is actually the number of hashes located in the subtree of prefix L that are equal to the hash of the right part of the pattern (the part after *). How can we answer this question quickly?

If we use DFS to traverse the prefix tree and denote the time we first visited the node with D_t and the time we returned from node t to its parent with F_t, it is easy to see that all nodes x that are located in the subtree of t have D_x in the interval D_t, F_t.

If we group the same hashes from the prefix tree in a structured sorted by D_x of the corresponding nodes, we can, for those hashes, answer how many hashes H there are in the subtree of node t. We do this by querying the structure for hash H how many D_x from the interval D_t, F_t there are. If we use a map<int, vector<int>> as the structure, we can easily support the aforementioned query with a simple binary search over the corresponding vector for a given hash. If S is the number of characters in the input, the total complexity of the solution is \mathcal O(S \log S).


Comments

There are no comments at the moment.