Editorial for DMOPC '20 Contest 5 P6 - Top Row
Submitting an official solution before solving the problem yourself is a bannable offence.
We note that the problem asks for the sum of the number of distinct substrings and the total length of all distinct substrings for each specified range.
For this subtask, any reasonable brute force algorithm (such as maintaining a set of hash values for each round) should pass.
For this subtask, it is sufficient to solve the problem for each of the given rounds independently using any suffix string data structure. One such data structure which provides a straightforward solution to this subtask is the suffix automaton (SAM), which will also be used in later subtasks.
We are now approaching the full solution. To start, it is helpful to imagine that we are allowed to process the rounds offline, hoping that we can bring it back online later using some persistent data structure. If we sort the rounds in increasing order by their right endpoint, then we can run a sweeping algorithm on the string, each time appending a new character to the back of the string. When we append a new character to the string, we must add the contribution of all substrings ending at this character to the answer for all rounds with right endpoint greater than or equal to . Let's maintain an array which stores the contribution of all strings starting at each index. For any of these substrings , we should add its contribution to . Then, for any round with right endpoint , its answer is simply the sum of all with , where represents the left endpoint of that round. Notice that the updates are arithmetic sequence updates on a contiguous segment of , and we can support these operations using a lazy segment tree. However, we run into some problems with overcounting when the new character has already appeared before in the string. Therefore, we need to subtract the contribution of substrings ending with at the most recent position less than (similar to how we would handle the problem of counting distinct elements in a range). This is where the SAM comes in. Notice that in the suffix link tree of the SAM of , the positions from which we need to subtract contribution lie precisely on the path from the state representing to the root of the SAM. On this path, we need to subtract the contribution of previous duplicate substrings, as well as update the latest end position to so that we can subtract away the contribution of later if needed. Since the string here is fully random, we can hypothesize that the height of the suffix link tree is not that high, and we may therefore jump suffix links in a brute force manner, each time performing the necessary updates to each state until the root. To bring this solution back online, note that we can persistently maintain the lazy segment tree and query as required.
Time Complexity: , where is the expected height of the suffix link tree.
For full marks, we notice that the operation of updating all states from the current state to the root is extremely similar to the
Access operation of a link/cut tree. If we maintain the suffix link tree using an LCT, we can support suffix link jumping to the root in the amortized complexity of the LCT, and that is sufficient for full marks. It is also possible to achieve the same time complexity using small to large merging techniques on the set of end positions for each state of the suffix link tree. Special care should be taken to avoid excessive memory usage.