Editorial for Baltic OI '06 P5 - RLE Compression

Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

The optimal solution should work in $\mathcal O(n+m)$ ~\mathcal O(n+m)~.

First, we decode the input and store the sequence as a sequence of blocks. Each block represents a repetition of one character. Let the number of blocks be $B$ ~B~. It satisfies $B \le m$ ~B \le m~.

A straightforward dynamic solution leads to time and memory $\mathcal O(nm)$ ~\mathcal O(nm)~. For each $i = 0, \dots, B$ ~i = 0, \dots, B~ and each $e = 0, \dots, n-1$ ~e = 0, \dots, n-1~ we calculate a value $L(i, e)$ ~L(i, e)~ — the length of the shortest code of first $i$ ~i~ blocks such that at the end of the code the special character is set to $e$ ~e~.

This algorithm can be accelerated using the following observations. First, the code of a block of repetition of a character $a$ ~a~ using the special character other than $a$ ~a~ is not longer than the code of the block using the special character $e = a$ ~e = a~. Therefore, if $a \ne e$ ~a \ne e~ then it is always a good choice to code the block using special character $e$ ~e~, thus $L(i, e) = L(i-1, e)+C$ ~L(i, e) = L(i-1, e)+C~, where $C$ ~C~ is the length of the shortest code of block $i$ ~i~ given $e \ne a$ ~e \ne a~. So all values of $L(i, e)$ ~L(i, e)~ for $e \ne a$ ~e \ne a~ differ from $L(i-1, e)$ ~L(i-1, e)~ by the same number $C$ ~C~.

More attention is needed when calculating $L(i, a)$ ~L(i, a)~. We want to encode block $i$ ~i~ in a such way that after encoding it, the special character will be set to $a$ ~a~. This can be done in two ways: with or without switching the special character. If we don't want to switch, then the length of the code will be equal to $L(i-1, a)$ ~L(i-1, a)~ plus the length of the code of block $i$ ~i~ using $a$ ~a~ as the special character. If we want to switch the special character to $a$ ~a~ somewhere, it is always worthwhile to do it at the end of block of $a$ ~a~. This costs additional $3$ ~3~ characters. The rest of the code will contain $C$ ~C~ character plus the smallest value from the set $\{L(i-1, b) \mid b \ne a\}$ ~\{L(i-1, b) \mid b \ne a\}~. To reconstruct later the code we need to save only how this particular value $L(i, a)$ ~L(i, a)~ was calculated.

We don't have to store $L(i, \cdot)$ ~L(i, \cdot)~ for all $i$ ~i~. We need only values $L(i, \cdot)$ ~L(i, \cdot)~ for the current block.

There can be developed a fast data structure with the following operations running in constant time:

getting the value $L(i, e)$ ~L(i, e)~ for any $e$ ~e~,
calculation of $L(i, \cdot)$ ~L(i, \cdot)~ from $L(i-1, \cdot)$ ~L(i-1, \cdot)~,
getting the smallest value $L(i, e)$ ~L(i, e)~ with $e \ne a$ ~e \ne a~.

The key observation for doing it is that two values $L(i, \cdot)$ ~L(i, \cdot)~ may differ at most by $3$ ~3~. This can be proved by a simple induction on the number of blocks.

The task requires from a contestant some short insight into the problem. The more difficult part of the solution should be careful implementation.

Comments

There are no comments at the moment.