Editorial for ECOO '12 R1 P2 - Decoding DNA


Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

This problem is based on some of the actual processes of RNA Transcription, but everything is simplified. The only tricky part here is finding the terminator sequence. It gets easier if you notice that you only have to look for pairs of sequences of length 6, since longer repeated sequences will also contain a pair of length 6 sequences.

Recommended Approach

To find the start of the transcription unit, just search for TATAAT using the built-in string functions in whatever language you are using, and add 10 to the index where you first find it. To find the terminator sequence, one approach is to start at the beginning of the transcription unit and look at each sequence of 6 bases in turn. For each sequence, reverse it and compute its complement, then search for this reverse complement starting after the original 6 base sequence. As soon as you find a match, you have the end of the sequence.

Possible pitfalls include:

  1. Terminator-like sequences that occur before the start of the transcription unit.
  2. Palindromic or partially palindromic sequences like ATCGAT – its reverse complement is also ATCGAT.
  3. Zero length transcription units.

Comments

There are no comments at the moment.