Editorial for DMOPC '14 Contest 1 P6 - Biology Homework


Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

The first part to solving this problem is to determine how a genotype with over two alleles self-fertilizes. This means determining the genotypes of all the offspring.

For simplicity, we will refer to a genotype as a sequence, to a genotype of length smaller than k as a subsequence, to an offspring genotype as a resulting sequence, and to the inputted genotype as the initial sequence.

We will obtain all possible permutations (with repetition, order does not matter) of the sequence.

For example, if 123 is the given sequence, the resulting set of sequences is: 111, 112, 113, 122, 123, 133, 222, 223, 233, and 333.

The procedure is as follows:

  • Generate all sequences of length k where each term is one of the first k numbers. There are k^k sequences.
  • Keep the sequences that are sorted.

The resulting set of sequences will be represented by the set S. Now that we have a list of all the possible sequences, we can run a simulation up to the n^{th} generation.

Method 1: Simulation

Create an array of dimensions |S| \times |S| with the rows representing each sequence and with the columns representing the frequency of each resulting sequence.

For each of the possible sequences, determine all the resulting set of sequences and their frequencies using the procedure described above (for example, for sequence 12, the frequency of 12 is 2, and is 1 for both 11 and 22).

Then, for each generation update an array that stores the number of times each sequence appears in the current generation. The frequency value for each can be divided by the total number of sequences in that generation to yield the probability of it occurring in that generation.

Method 2: Markov Chains

It should be noticed that the problem satisfies the three conditions for a Markov Process:

  • There is a fixed number of states; we can determine all possible sequences.
  • Each sequence generated after any generation is part of that fixed number of states.
  • The state at a generation g depends only on the state of g-1, and not on the preceding ones.

Markov Chains allow us to obtain the state of a matrix after n generations. You can read about them here: https://chance.dartmouth.edu/teaching_aids/books_articles/probability_book/Chapter11.pdf.

The first step is to create the transition matrix, which will be of size |S| \times |S|. The value (r,c) represents the ratio of the frequency of sequence r to the number of resulting sequences of c.

Once the transition matrix is generated, all that is left is to multiply the initial state by the transition matrix raised to the desired generation.

This requires implementing matrix multiplication and is left as an exercise to the reader.


Comments


  • 0
    Riolku  commented on Dec. 19, 2018, 3:12 p.m.

    Is the lowercase n here different than the N specified in the problem? If not, how can we iterate through 10^{18} generations?