## ECOO '12 R1 P2 - Decoding DNA

View as PDF

Points: 5 (partial)
Time limit: 2.0s
Memory limit: 64M

Problem type

DNA is made up of two twisted strands that encode genes using long combinations of four bases: Adenine, Cytosine, Guanine and Thymine. The strands are complementary to one another, meaning that Adenine and Thymine are always opposite each other, and Cytosine and Guanine are always opposite each other, like this.

 A double strand of DNA: ATCAAGGCCTATTCCGGGAAAGG TAGTTCCGGATAAGGCCCTTTCC

In order for the information in a gene to be used, it has to be transcribed into a strand of RNA. During this process, a portion of one strand of DNA is transcribed – this portion is known as the transcription unit. The start of the sequence to be transcribed is signaled by a sequence of bases known as a promoter, and the end is signaled by a sequence known as the terminator. For our purposes, the promoter is the sequence TATAAT, which begins bases before the start of the transcription unit, and the terminator consists of two distinct, complementary, reversed sequences of at least length that cause the RNA molecule to coil back on itself and pinch off the transcribed strand. If TATAAT appears twice on a strand, only the first occurrence counts as the promoter. An example is shown below.

AGATTATATAATGATAGGATTTAGATTGACCCGTCATGCAAGTCCATGCATGACAGC

In this example the promoter and terminator sequences are boldfaced, and the transcription unit is underlined. The resulting RNA will be complementary to the transcription unit, except that in RNA Uracil takes the place of Thymine. For this example, the result looks like this:

CCUAAAUCUAACUGGG

The input will contain five single strands of DNA, one on each line. Write a program to output the RNA sequence that results from the transcription process. The sequences should be numbered starting at , with a colon and a single space character following the number, as shown below.

#### Sample Input

AGATTATATAATGATAGGATTTAGATTGACCCGTCATGCAAGTCCATGCATGACAGC
AGTCTTCAAGGGGATTCCCAGGTATATAATGCAGATCGCGACGAAATATCGGGCGGGATCCATACCGACCCAGCCGCCCGA
TATAATGGGGGAGAGACCGAGTGTTTAAGTCCCGAGGGATCGGGAGTGAGATTGAGGGAATTCCGGGAATCTCACT

#### Sample Output

1: CCUAAAUCUAACUGGG
2: UAGCGCUGCUUUAU
3: CUCUCUGGCUCACAAAUUC

Educational Computing Organization of Ontario - statements, test data and other materials can be found at ecoocs.org

• commented on April 15, 2022, 10:26 p.m.

might be a dumb question, but in the sample input, we are treating each line of input as separate altogether? As in line 1 and line 2 are independent of each other

• commented on March 29, 2022, 1:04 p.m.

Can I impose on someone to give me a hint. I have passed the first test but failed the other two. I looked at the editorial and I’m pretty sure I’ve got the typical problems covered. Thanks.

• commented on March 29, 2022, 5:02 p.m.

Test cases like TATAATATGCATCGATATCGAT seem to fail.

• commented on April 5, 2022, 1:26 p.m.

Thanks for your tip. Am I mistaken? It looks like that should be an empty set. If so, does it get omitted or included as an empty set…ie. say like 2:(blank).

• commented on Jan. 20, 2022, 5:34 a.m.

I still don't understand what's the condition for terminator sequence. In a given example why GTCATGCA is terminator? Could someone please explain?

• commented on Jan. 21, 2022, 12:04 p.m.

If you look at the editorial, we can see how they find the terminator sequences.

Our first sequence, GTCATGCA, first gets reversed to ACGTACTG.

Then, find its complement, which would be TGCATGAC.

This problem took me a good long time to understand too.

• commented on Jan. 30, 2022, 6:00 a.m.

So in other words: GTCATGCA > CAGTACGT(complementary to GTCATGCA) > TGCATGAC(reversed CAGTACGT), I guess.

• commented on Jan. 23, 2022, 8:47 a.m.

Thanks for the explanation, it made sense now!