Editorial for New Year's '18 P7 - Tree of Life
Submitting an official solution before solving the problem yourself is a bannable offence.
Author:
For the first subtask, the longest common substring of two DNA sequences is simply the minimum of their lengths. Note that two nodes and are distant if and only if there is a node with two distinct children such that is in the subtree of one of the children and is in the subtree of the other. We can use simple tree DP to determine the answer.
Time Complexity:
For the second subtask, there were several different solutions. The majority of them were based around binary searching for the answer. This editorial will outline a different approach.
Consider the suffix array of the concatenation of the strings. We use the following algorithm:
Keep a left pointer and a right pointer. There are two states: either the left pointer is fixed or the right pointer is fixed. Initially, the left pointer is fixed.
When the left pointer is fixed, keep moving the right pointer until the node corresponding to the suffix at the right pointer is not an ancestor of the node corresponding to the suffix at the left pointer. Once this happens, the right pointer is now fixed.
When the right pointer is fixed, keep moving the left pointer until it reaches the right pointer. Once this happens, the left pointer is now fixed.
Throughout these steps, check whether or not the two nodes being pointed at are distant and compute the LCP of the two suffixes being considered. This algorithm is guaranteed to find the answer. The proof of this is left to the reader.
Time Complexity:
Comments
There is another solution with hash that is very easy.
I used the same. The hash+ Binary search solution was a lot more simple and easy.
There is another relatively simple solution with suffix automaton.
radoslav11 Can you explain your solution please?
The idea is that when we have two strings, we can easily find their LCS with suffix automaton (look at the last paragraph of https://e-maxx.ru/algo/suffix_automata). For more than 2 strings you can merge them like S1 $ S2 $ ... $ Sk-1 $ Sk and preform the same thing as said in e-maxx.
Now to solve the problem on tree we will use DSU on tree. The idea is to consider the subtrees of the children of every vertex, and pick two vertices from different subtrees with largest LCS. Here we apply "dsu on tree" with maintaining a suffix automaton of the visited strings.
In my solution I've used a generalized suffix automaton as a replacement of "S1 $ S2 $ ... $ Sk-1 $ Sk" but both will result in time complexity. I'm not sure if it's against the rules or not to post a link to my submission (If it is I'll remove it) but you can find it here.
Thank you.