Editorial for IOI '04 P6 - Empodia


Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

Signed permutations are important in the study of DNA sequences [Ber01].

For each framed interval the following hold:

Fact 1: The starting (F_s) and finishing (F_f) point of a framed interval F are related with the following equation:

\displaystyle Value(F_f) = Value(F_s)+length(F)-1 \tag{1}

Fact 2: All values in the interior points of a framed interval are distinct and cover the entire interval [i+1, i+k-1] exactly once.

Simple

A simple algorithm is the following. For each element i of the sequence, one exhaustively seeks for framed intervals of length l (2 \le l \le |\sigma|-i+1) starting at element i, where |\sigma| = N+2 is the number of elements contained in the sequence \sigma. For each pair (i, j) the corresponding length induces, someone inspects whether Fact 1 holds or not. If it does, then according to Fact 2 one must verify that all elements that lie in between are proper so that this interval is in fact a framed interval. We have \mathcal O(n^2) pairs (i, j), and for each interval these pairs induce, we require \mathcal O(n) time to verify that this is a framed interval whenever Fact 1 holds. Thus, this part of the algorithm takes \mathcal O(n^3) time. During this part of the algorithm one keeps track of candidate hurdles starting at index i. Of course there is no need to keep track all of \mathcal O(n) framed intervals that might exist starting at index i, since all framed intervals that don't have minimum length can not be hurdles.

Moreover, according to our definition of hurdles, the algorithm requires another part so as to eliminate all those framed intervals that contain shorter framed intervals starting at different points. At this final part of the algorithm we have \mathcal O(n) candidate hurdles. For each one of them one can verify in \mathcal O(n) time if another candidate hurdle is contained and decide whether this candidate hurdle is really a hurdle or not. This step of the algorithm takes \mathcal O(n^2) time.

As a result, the entire algorithm takes \mathcal O(n^3) time.

Clever

Fact 3: The element with the minimum value in a hurdle is always the starting point of the hurdle.

Fact 4: The element with the maximum value in a hurdle is always the finishing point of the hurdle.

Crucial observation: An important observation for the proposed algorithm is stated by the following lemma:

Lemma 2.1 The endpoint(s) of a hurdle can not be in the interior of another hurdle.

Proof

  • Case 1: Both endpoints of a hurdle lie in the interior of another hurdle. Then obviously according to definition, one of the above framed intervals can not be a hurdle, which is a contradiction.

  • Case 2: Only one endpoint of a hurdle lies in the interior of another hurdle.

    Suppose for the purpose of contradiction that a hurdle H_2 has its starting point k in the interior of another hurdle H_1, a situation like the one shown in Figure 1. Then, based on the preceding facts we have:

    \displaystyle Value(i) < Value(k) < Value(j) < Value(l) \tag{2}

    since the k^\text{th} element is an interior point of hurdle H_1 and j^\text{th} element is an interior point of hurdle H_2. Now pick an element with index x at random such that k < x < j. Obviously, this element belongs to hurdle H_1, so Value(x) < Value(j). Moreover, x belongs to hurdle H_2, so Value(k) < Value(x). In other words, every element with index x, such that k < x < j, implies that:

    \displaystyle Value(k) < Value(x) < Value(j) \tag{3}

    Finally, there is no other element y that implies (3) outside of the interval [k+1, j-1], since if it were, then either hurdle H_1 would not be a hurdle, or H_2. As a result, all intermediate values lie in this interval (shown with red line) and as a consequence we have a framed interval. But this is a contradiction to our assumption that intervals [i, j] and [k, l] form two different hurdles. This completes the proof.

Note: In fact, in the above case we can prove something even stronger. Since all elements in the interval [k+1, j-1] imply equation (3), all elements in the interval [i, k] (blue line) imply that this region is also a framed interval since they belong to framed interval [i, j]. Finally, all the elements in the interval [j, l] (yellow line) imply that this region is also a framed interval, since they belong to framed interval [k, l].

So, a better way to handle this problem is to find a "most suitable" candidate framed interval for each element i. This can be achieved by scanning sequentially the elements of the given sequence from left to right and at the same time remembering the maximum value (max) found so far. Now, if someone finds an element x that has value Value(x) = max+1 and at the same time Fact 1 is satisfied, it is easy to see that this implies the end of a framed interval. Since we have \mathcal O(n) elements to check and for each one of them we have \mathcal O(n) elements to scan, it is straightforward that in order to find all candidate framed intervals takes \mathcal O(n^2) time. On the following we present the algorithm in pseudocode assuming that the elements of the sequence are given in array SEQ. The algorithm returns array FRAMED, where it is stored the length of a candidate hurdle starting at position i.

Clever_Candidate_Enumeration(SEQ[])
  for i ← 0 to N + 1
    FRAMED[i] ← nil
  for i ← 0 to N
    max ← SEQ[i]
    for k ← 1 to N + 1 - i
      if ((SEQ[i + k] = SEQ[i] + k) AND (SEQ[i + k] = max + 1))
        FRAMED[i] ← k + 1
      else if (SEQ[i + k] < SEQ[i])
        break
      else if (SEQ[i + k] > max)
        max ← SEQ[i + k]
  return FRAMED

As in the previous algorithm, we can perform the elimination of false alerts in \mathcal O(n^2) time.

Elimination of false alerts in linear time

Based on the preceding lemma one can easily decide which of the candidate hurdles are in fact hurdles and not false alerts in linear time. One must keep track of the length of each candidate hurdle at its starting position in an array FRAMED. We say that a candidate hurdle is active if we examine elements in the interior of the hurdle.

The idea is the following. One scans sequentially the array FRAMED, and if he encounters a starting position of a candidate hurdle H_2 while a candidate hurdle H_1 is active, then case 1 of lemma 2.1 must apply because the candidate hurdle H_1 is a framed interval of minimum length starting at the starting position of candidate hurdle H_1. This process requires no more than N+1 queries on array FRAMED and as a result its running time is \mathcal O(n). The elimination algorithm is given with the following pseudocode:

Linear_Elimination(FRAMED[])
  i ← 0
  while (i < N + 1)
    if (FRAMED[i] = nil)
      k ← i + 1
      while (k < FRAMED[i] + i - 1)
        if (FRAMED[k] = nil)
          FRAMED[i] ← nil
          break
        else k ← k + 1
      i ← k - 1
    else i ← i + 1
  return FRAMED

Finally, either way we choose to eliminate false alerts, this algorithm has running time \mathcal O(n^2).

Advanced

This problem can be solved in superlinear \mathcal O(n \alpha(n)) [BH96] and linear (\mathcal O(n)) [BMY01] time if someone uses graphs. These results come from one of the most exciting areas in computer science in the last decade which was pioneered by Hannenhalli and Pevzner [HP95] and have rather simple implementations.

A detailed treatment of the field that inspired this problem can be found at [Sie01].

References

[Ber01] Anne Bergeron. A Very Elementary Presentation of the Hannenhalli-Pevzner Theory. CPM 2001 - Lecture Notes in Computer Science, 2089:106–117, April 19 2001.

[BH96] Piotr Berman and Sridhar Hannenhalli. Fast Sorting by Reversal. Proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching, pages 168–185, 1996.

[BMY01] Bader, Moret, and Yan. A Linear-Time Algorithm for Computing Inversion Distance between Signed Permutations with an Experimental Study. In Lecture Notes in Computer Science, editor, WADS: 7th Workshop on Algorithms and Data Structures, volume 2125, pages 365–376, 2001.

[HP95] Sridhar Hannenhalli and Pavel Pevzner. Transforming Cabbage into Turnip. Proceedings of the 27th Annual Symposium on Theory of Computing (STOC95), pages 178–189, 1995.

[Sie01] Adam C. Siepel. Exact Algorithms for the Reversal Median Problem. Master's thesis, University of New Mexico, December 2001.

[Sie02] Adam C. Siepel. An Algorithm to Enumerate All Sorting Reversals. RECOMB, pages 281–290, April 2002.


Comments


  • -2
    thomas_li  commented on April 7, 2024, 8:36 p.m.

    bruh