Editorial for COCI '11 Contest 5 #6 Popločavanje


Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

Notice the following: if, for a given position i in the large word (the one describing the street), we can find the longest short word (tile) that can be placed starting at position i, the problem is reduced to finding the union of intervals, which is solvable using a simple sweep.

The remaining problem is efficiently finding the longest tile for each position, which can be done in one of the following ways:

Suffix Array

A large number of problems with strings, including this one, can be solved using a suffix array. It is a sorted array of all suffixes of a string and can be computed using multiple methods. The simplest one, with complexity \mathcal O(N \log^2 N), using hashing is sufficient for this problem.

It is important to notice that the indices of suffixes, whose prefix is one of the given M words, are grouped together in a suffix array, forming a suffix interval. This interval can be found using binary search for each word. After finding all the intervals, we need to find, for each suffix, the longest of M words whose interval covers that suffix, which can be found using sweep. After that, another sweep can easily determine the tileable positions, using the longest word that can be placed at each position.

There are other efficient solutions using hashing.

Aho-Corasick Tree

The Aho-Corasick tree can be used to find many short words in a long one using a single pass over the long word. The tree can be constructed with complexity \mathcal O(sum\_of\_short\_word\_lengths), while finding matches depends on their number.

The basic idea behind the tree is well described in the following materials:

For the purposes of this problem, while building the tree, for each node we can keep the data about the longest of the given short words that is a suffix of the prefix represented by a particular node. Let M(x) denote that value for node x. M(x) is then the maximum of:

  • depth(x), if a short word ends in x
  • M(failureLink(x)), where failureLink is described in the given materials

Building the tree still has complexity \mathcal O(sum\_of\_short\_word\_lengths), while the (since we are not interested in all match positions, but only the maximum length for each position in the long word) complexity of searching through the long word is linear in its length. When encountering a letter, we try to descend down the tree. If it is not possible, we follow failureLinks until finding either a node which we can descend from, or the root of the tree. In the node where we have ended up, we take the previous computed maximum and set the interval bounds for the future sweep.


Comments

There are no comments at the moment.