Editorial for COCI '14 Contest 5 #6 Divljak


Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

The solution sufficient for partial points was to check whether some of the barbarians' strings is a substring of the word Tarzan gives out and if there is, increment the counter of that word by 1, repeating this procedure each time Tarzan gives out another word. Regarding the solution for 100\% of points, there are multiple solutions, both offline and online. Here we will describe an online solution.

To begin with, let's construct an Aho-Corasick tree, which is a prefix tree. Let's define p(X) as the string obtained by traversing from the root of the tree to node X in the prefix tree.

For every node X, the Aho-Corasick tree remembers the deepest node Y such that p(Y) is a suffix of p(X). Let's call such node fail(X).

Additionally, we need to remember for each node X, node Z, where node Z is the deepest node on which some of the barbarian's strings ends, and at the same time p(Z) is a suffix of p(X). Let's call such node back(X).

Now, let's construct an explicit tree from back(X) links. Therefore, for each node on the Aho-Corasick tree, make a new node in the new tree so that the parent of the new node, node X, is node back(X) in the new tree.

Let's call the new node of some node X in the old tree exp(X). Having the tree constructed this way, by traveling to the root of the new tree from a node exp(X) we walk through all suffixes of string p(X) that are at the same time words written on the tablets of the barbarians.

Now, let's see what operations need to be implemented.

During the update operation (in other words, when Tarzan gives a new word), it is necessary to update all strings that are in it as a substring. How do we find such strings? Let P be the newly given word. Let pref(i) be the prefix of word P up to position i. We will use a classic traversal of the Aho-Corasick tree that, for each prefix of string P, pref(i) finds the longest suffix of that prefix that exists in the Aho-Corasick tree. Formally, for each pref(i), where i is the position of the prefix in string P, it finds the deepest node X such that p(X) is the suffix of pref(i). Let's call such X for a pref(i), up(i). If we walk towards the node in our newly generated tree from every node exp(up(i)), we will walk through all words that are substring of the word P and are written on the tablets of the barbarians.

How to find up(i) for each i? Let's assume that we have up(i) for an i, and try to calculate up(i) for i+1. Let's call the letter appearing at position i+1 in the string L. In case there is an edge for letter L on node up(i), then up(i+1) is simply the node we get to when moving for edge L from node up(i). In the case when such an edge doesn't exist, we try to find such an edge for fail(up(i)). Therefore, we try to find such an edge for the next longest suffix that is the suffix of the current suffix pref(i). The second operation is repeated until we get back to the root of the tree or until we find an edge for letter L.

The construction of this structure is possible in linear time per number of characters \mathcal O(L). Also, the construction of the new tree is possible in linear time.

Up to now, we currently need to be able to increase all nodes in the union of paths from the root of the new tree to the set of nodes up(i), for each i, by exactly 1.

Implementing this operation is possible by using heavy-light decomposition and logarithmic structure. Let's decompose the tree to a set of chains such that the number of chains changed on the path from the root of the tree to node exp(X) is of the size \mathcal O(\log L), where L is the number of characters in the tree. The remaining task is to increase a part of that chain by exactly 1. To not increase each string only once, but as many times as it appears in string P, we would simply increase the corresponding part of each chain during our traversal from exp(up(i)) to the root of the tree by 1. This increase can be very easily done by using a logarithmic structure for each chain.

How will we increase if we want to increase each node only once? We can notice that, on each chain, we will eventually increase only one of its prefixes by exactly 1 at the end of each update. Accordingly, we can, on each chain, remember up to what prefix had that chain been increased, and in the case that the current increase increases a few other nodes, we increase only that difference set between nodes. On a concrete example, let's say that so far in this update, for some chain the prefix of size 5 had been increased by 1. Now comes a new change on the chain that requires us to increase the prefix of size 8. In this case, we will not increase the prefix of size 8, but will only increase position [6, 8]. This is the way we handle the increase of each string by exactly 1. There are a few other ways in which we could handle this increase.

The total complexity of this approach is \mathcal O(L \log^2 L), where L is the number of total characters. The space complexity is \mathcal O(L \times \text{alphabet size}).

Figuring out the offline solution of the same complexity is left as an exercise to the reader. (Hint: suffix array, union-find, BST.)


Comments

There are no comments at the moment.