Editorial for ECOO '12 R1 P2 - Decoding DNA
Submitting an official solution before solving the problem yourself is a bannable offence.
This problem is based on some of the actual processes of RNA Transcription, but everything is simplified. The only tricky part here is finding the terminator sequence. It gets easier if you notice that you only have to look for pairs of sequences of length , since longer repeated sequences will also contain a pair of length sequences.
Recommended Approach
To find the start of the transcription unit, just search for TATAAT
using the built-in string functions in whatever language you are using, and add to the index where you first find it. To find the terminator sequence, one approach is to start at the beginning of the transcription unit and look at each sequence of bases in turn. For each sequence, reverse it and compute its complement, then search for this reverse complement starting after the original base sequence. As soon as you find a match, you have the end of the sequence.
Possible pitfalls include:
- Terminator-like sequences that occur before the start of the transcription unit.
- Palindromic or partially palindromic sequences like
ATCGAT
– its reverse complement is alsoATCGAT
. - Zero length transcription units.
Comments