Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.
In order to solve the task, let's first try to see how we can compare a pattern and a word. Given the fact that *
is replaced by any series of letters, and the letters to the left and to the right of *
remain unchanged, everything to the left of *
must be a prefix of the word which we're comparing to, and everything to the right must be the suffix of the word we're comparing to. Therefore, if we denote everything to the left of *
as
and everything to the right as
, it is sufficient to check whether
is a prefix of the word,
is the suffix of the word, and that the sum of lengths of
and
is less than or equal to the length of the word. In the case where this does not hold, an overlap between the prefix and suffix could occur (e.g. ab*ba
with aba
, ab
and ba
share letters in aba
). Now we can solve the task for
of total points, if we use a quick comparison between strings using hashing. The complexity of this solution is
.
Let's now solve the task for all points. First we construct a prefix tree out of all words from the input. A node in the tree that corresponds to the prefix of length
of the word
of length
will be used to store the hash value of the suffix of length
of the word
. After we've constructed this structure, let's try to see what query we need to answer in order to find out the number of words that correspond to the pattern. Let's denote the part of the pattern to the left of *
as
. The answer to the query is actually the number of hashes located in the subtree of prefix
that are equal to the hash of the right part of the pattern (the part after *
). How can we answer this question quickly?
If we use DFS to traverse the prefix tree and denote the time we first visited the node with
and the time we returned from node
to its parent with
, it is easy to see that all nodes
that are located in the subtree of
have
in the interval
.
If we group the same hashes from the prefix tree in a structured sorted by
of the corresponding nodes, we can, for those hashes, answer how many hashes
there are in the subtree of node
. We do this by querying the structure for hash
how many
from the interval
there are. If we use a map<int, vector<int>>
as the structure, we can easily support the aforementioned query with a simple binary search over the corresponding vector for a given hash. If
is the number of characters in the input, the total complexity of the solution is
.
Comments