Editorial for COCI '11 Contest 5 #6 Popločavanje
Submitting an official solution before solving the problem yourself is a bannable offence.
Notice the following: if, for a given position in the large word (the one describing the street), we can find the longest short word (tile) that can be placed starting at position , the problem is reduced to finding the union of intervals, which is solvable using a simple sweep.
The remaining problem is efficiently finding the longest tile for each position, which can be done in one of the following ways:
Suffix Array
A large number of problems with strings, including this one, can be solved using a suffix array. It is a sorted array of all suffixes of a string and can be computed using multiple methods. The simplest one, with complexity , using hashing is sufficient for this problem.
It is important to notice that the indices of suffixes, whose prefix is one of the given words, are grouped together in a suffix array, forming a suffix interval. This interval can be found using binary search for each word. After finding all the intervals, we need to find, for each suffix, the longest of words whose interval covers that suffix, which can be found using sweep. After that, another sweep can easily determine the tileable positions, using the longest word that can be placed at each position.
There are other efficient solutions using hashing.
Aho-Corasick Tree
The Aho-Corasick tree can be used to find many short words in a long one using a single pass over the long word. The tree can be constructed with complexity , while finding matches depends on their number.
The basic idea behind the tree is well described in the following materials:
For the purposes of this problem, while building the tree, for each node we can keep the data about the longest of the given short words that is a suffix of the prefix represented by a particular node. Let denote that value for node . is then the maximum of:
- , if a short word ends in
- , where is described in the given materials
Building the tree still has complexity , while the (since we are not interested in all match positions, but only the maximum length for each position in the long word) complexity of searching through the long word is linear in its length. When encountering a letter, we try to descend down the tree. If it is not possible, we follow until finding either a node which we can descend from, or the root of the tree. In the node where we have ended up, we take the previous computed maximum and set the interval bounds for the future sweep.
Comments