Editorial for New Year's '18 P7 - Tree of Life


Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

Author: r3mark

For the first subtask, the longest common substring of two DNA sequences is simply the minimum of their lengths. Note that two nodes A and B are distant if and only if there is a node with two distinct children such that A is in the subtree of one of the children and B is in the subtree of the other. We can use simple tree DP to determine the answer.

Time Complexity: \mathcal O(N)

For the second subtask, there were several different solutions. The majority of them were based around binary searching for the answer. This editorial will outline a different approach.

Consider the suffix array of the concatenation of the N strings. We use the following algorithm:

Keep a left pointer and a right pointer. There are two states: either the left pointer is fixed or the right pointer is fixed. Initially, the left pointer is fixed.

When the left pointer is fixed, keep moving the right pointer until the node corresponding to the suffix at the right pointer is not an ancestor of the node corresponding to the suffix at the left pointer. Once this happens, the right pointer is now fixed.

When the right pointer is fixed, keep moving the left pointer until it reaches the right pointer. Once this happens, the left pointer is now fixed.

Throughout these steps, check whether or not the two nodes being pointed at are distant and compute the LCP of the two suffixes being considered. This algorithm is guaranteed to find the answer. The proof of this is left to the reader.

Time Complexity: \mathcal O(S+N \log N)


Comments


  • 1
    hhzzkk  commented on Jan. 3, 2018, 11:21 a.m. edited

    There is another solution with hash that is very easy.


    • 1
      triple1  commented on Jan. 3, 2018, 11:27 a.m.

      I used the same. The hash+ Binary search solution was a lot more simple and easy.


  • 0
    radoslav11  commented on Jan. 3, 2018, 7:30 a.m.

    There is another relatively simple solution with suffix automaton.


    • 0
      magieNoire  commented on Jan. 3, 2018, 11:10 a.m.

      radoslav11 Can you explain your solution please?


      • 2
        radoslav11  commented on Jan. 3, 2018, 12:01 p.m. edited

        The idea is that when we have two strings, we can easily find their LCS with suffix automaton (look at the last paragraph of https://e-maxx.ru/algo/suffix_automata). For more than 2 strings you can merge them like S1 $ S2 $ ... $ Sk-1 $ Sk and preform the same thing as said in e-maxx.

        Now to solve the problem on tree we will use DSU on tree. The idea is to consider the subtrees of the children of every vertex, and pick two vertices from different subtrees with largest LCS. Here we apply "dsu on tree" with maintaining a suffix automaton of the visited strings.

        In my solution I've used a generalized suffix automaton as a replacement of "S1 $ S2 $ ... $ Sk-1 $ Sk" but both will result in \mathcal{O}(S \log S + N) time complexity. I'm not sure if it's against the rules or not to post a link to my submission (If it is I'll remove it) but you can find it here.


        • 0
          magieNoire  commented on Jan. 4, 2018, 9:34 a.m.

          Thank you.