Editorial for DMOPC '20 Contest 5 P6 - Top Row

Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

Author: 4fecta

We note that the problem asks for the sum of the number of distinct substrings and the total length of all distinct substrings for each specified range.

Subtask 1

For this subtask, any reasonable brute force algorithm (such as maintaining a set of hash values for each round) should pass.

Time Complexity: $\mathcal{O}(|S|^2Q)$ ~\mathcal{O}(|S|^2Q)~

Subtask 2

For this subtask, it is sufficient to solve the problem for each of the given rounds independently using any suffix string data structure. One such data structure which provides a straightforward solution to this subtask is the suffix automaton (SAM), which will also be used in later subtasks.

Time Complexity: $\mathcal{O}(\sum_{i = 1}^Q(r_i - l_i + 1))$

Subtask 3

We are now approaching the full solution. To start, it is helpful to imagine that we are allowed to process the rounds offline, hoping that we can bring it back online later using some persistent data structure. If we sort the rounds in increasing order by their right endpoint, then we can run a sweeping algorithm on the string, each time appending a new character to the back of the string. When we append a new character $S_i$ ~S_i~ to the string, we must add the contribution of all substrings ending at this character to the answer for all rounds with right endpoint greater than or equal to $i$ ~i~. Let's maintain an array $A$ ~A~ which stores the contribution of all strings starting at each index. For any of these substrings $T$ ~T~, we should add its contribution $|T| + 1$ ~|T| + 1~ to $A_{i - |T| + 1}$ ~A_{i - |T| + 1}~. Then, for any round with right endpoint $i$ ~i~, its answer is simply the sum of all $A_j$ ~A_j~ with $l \le j \le i$ ~l \le j \le i~, where $l$ ~l~ represents the left endpoint of that round. Notice that the updates are arithmetic sequence updates on a contiguous segment of $A$ ~A~, and we can support these operations using a lazy segment tree. However, we run into some problems with overcounting when the new character $S_i$ ~S_i~ has already appeared before in the string. Therefore, we need to subtract the contribution of substrings ending with $S_i$ ~S_i~ at the most recent position less than $i$ ~i~ (similar to how we would handle the problem of counting distinct elements in a range). This is where the SAM comes in. Notice that in the suffix link tree of the SAM of $S$ ~S~, the positions from which we need to subtract contribution lie precisely on the path from the state representing $S_i$ ~S_i~ to the root of the SAM. On this path, we need to subtract the contribution of previous duplicate substrings, as well as update the latest end position to $i$ ~i~ so that we can subtract away the contribution of $S_i$ ~S_i~ later if needed. Since the string here is fully random, we can hypothesize that the height of the suffix link tree is not that high, and we may therefore jump suffix links in a brute force manner, each time performing the necessary updates to each state until the root. To bring this solution back online, note that we can persistently maintain the lazy segment tree and query as required.

Time Complexity: $\mathcal{O}(h|S| \log |S| + Q \log |S|)$ , where $h$ ~h~ is the expected height of the suffix link tree.

Subtask 4

For full marks, we notice that the operation of updating all states from the current state to the root is extremely similar to the Access operation of a link/cut tree. If we maintain the suffix link tree using an LCT, we can support suffix link jumping to the root in the amortized $\mathcal{O}(\log N)$ ~\mathcal{O}(\log N)~ complexity of the LCT, and that is sufficient for full marks. It is also possible to achieve the same time complexity using small to large merging techniques on the set of end positions for each state of the suffix link tree. Special care should be taken to avoid excessive memory usage.

Time Complexity: $\mathcal{O}(|S| \log^2 |S| + Q \log |S|)$

Comments

14

dalt commented on April 28, 2021, 11:57 a.m.

P6 have appeared long ago, and I even find an editorial for similar problem in contest from another OJ, but I failed because of memory exceed. The problem is easy, but the memory limit is crazy. In my opinion, the reasonable limits are supposed to be 4 x spense of author's solution if the code is written by C++ or C, 2 times for other language.

8

4fecta commented on April 28, 2021, 2:17 p.m.

Hi, could you tell me where P6 has appeared before? I spent a long time searching for it on the internet but I couldn't find the same problem, so I was pretty confident that the problem itself is unique. As for similar problems, I was aware that similar problems exist from past contests (for example you can see the main problem mentioned here), but the idea of backing a suffix automaton with an LCT to support suffix updates at any index was so brilliant and new for me that I decided there was a place for it on the DMOPC this time. Regarding your submissions, I believe they failed more so do the time limit than the memory limit, looking at the verdicts of your in contest submissions. That's quite unfortunate, since we intentionally kept the time limit relatively tight (2.5x reference solution in C++) to minimize the probability of constant optimized quadratic (or faster, like sqrt) solutions passing. I'll keep that in mind for future problems though, and if we are able to get enough Java testing then perhaps we can set custom limits for Java. Thanks for reaching out.

6

dalt commented on April 29, 2021, 9:31 a.m. ← edit 6 →

Of course, it's similar rather than same problem. I first knew such problem in https://codeforces.com/blog/entry/62331 long ago, and in contest, I found a similar problem without persistent trick in Chinese OJ https://www.luogu.com.cn/problem/P6292. There is a Chinese blog talked about how to solve the online version, I learned the trick here.

Thanks for your cool problem, at first glance I didn't care about such problem until I met a hard version in your contest, as a consequence I learned a new skill.