For this task, contestants are given a message
of length
, which consists of a sequence of integers between
and
, inclusive, and they are asked to find an encoding and decoding scheme such that the encoded message
are invariant up to permutations. The encoded message should be as concise as possible. The scores will be rewarded based on the ratio of the length of the encoded message
to the length of the original message
.

There are many ways to solve this task. It will be instructive to look at the first few subtasks.
Subtask 1
For this subtask, recall that the original message
is just a sequence of two possible numbers,
or
. Assuming that the original message is indexed from
to
, one can instead choose to send the positions in the original message where all number
's are located. This technique uses at most
integers to send the message.
Subtask 2
For subtask 2, since the numbers in the original message
could be from
to
, they can be encoded using only
bits. However, this subtask allows one to use up to
bits per number in the encoded message
. Hence, you can encode the position of each number using the higher
bits (similar to the solution to subtask 1), and use the lower
bits to encode the actual number in the original message
, i.e.

where
denotes the
number in the original message.
Note that the encoded numbers are also ordered, i.e.
, for
. Therefore, when we would like to decode the message from the shuffled array
, we can sort it to get
. Then, to obtain the original message from the sorted
should be pretty obvious.
This method also uses at most
parrots, which is satisfactory for this subtask.
Subtask 3
Thinking about the first two subtasks should get us familiar with the approach. There are many roads to take from here. However, to formalize the settings, we shall think of a general representation first.
Because we shall keep integers in the encoded array, a natural representation would be to use a sorted sequence. Note that this representation is very robust against permutation; provided that the encoded array
is sorted, by sorting a shuffled array
, we can easily recover the encoded array
, as in subtask 2.
We shall try to extend the approach in subtask 2. However, for subtasks 3–5, we do not have "extra" bits for keeping the position data. In these cases, we look at the original message as a sequence of binary digits (or bits),
and
, by representing each number using only
bits. Thus, the original message of length
will have
bits.
This leads to a way to solve subtask 3. Note that the original message
contains at most
bits. For now, let
be the
bit of the original message, indexing from
to
. In this case, it is sufficient to take
bits to represent all positions of every single bit
, and use one last bit to represent the actual data. Hence, the encoding scheme looks like the following:

therefore, it takes
bits to represent possible positions and another bit to keep the data. That is, for each bit
, we shall encode it as
. The length of the encoded message for each test case of this subtask is
bytes.
Subtask 4
For this subtask, there are
bit positions; and it requires
bits just to represent the positions. Hence, there is no way to include the information of each bit of the original message in the encoding. If we recall the technique we use to solve subtask 1, we can improve the technique we used in subtask 3 by choosing to encode the positions of all
bits. This approach again sends at most
bytes.
Subtask 5
For this subtask, contestants are allowed to obtain some partial scores, depending on the ratio of the encoding message to the original message. In our point of view, to obtain nearly perfect scores like
is relatively much easier than to obtain the full points. We consider the last
points to be the reward to those who choose to implement more sophisticated methods of encoding that achieve better results than our expectations.
We discuss many potential solutions that might achieve some partial scores.
A
approach
We are going to generalize the solution to the previous subtask case since the number of bits is more than
. In this case, there are at most
bit positions.
To deal with this issue, we partition the message into
pairs of bits. We define the
bit pair
to be
. Again, the bit pair
should be indexed from
to
.
Note that there are
possible values for each bit pair, which can be represented with integers from
to
. One possible method is to send the positions of the bit pair (which uses
bits to encode) in the encoding data, and then the actual bit pair data will be encoded as the number of occurrences of such positions.
Using this approach, for each pair of bits, we may have to send at most
bytes. Therefore, we send at most
bytes. Contestants will achieve
points on this subtask for implementing this method.
A
approach
With a simple observation, we can reduce the size of the encoded messages by roughly a factor of two. Note that the best case (in terms of encoding length) for the previous encoding method is when the original message only contains zero bits, and the worst case is when the message contains only one bits. We shall try to get the average of these two cases.
Consider two encoding schemes, called ONE and ZERO. For a pair of bits
, the ONE scheme sends
copies of
, but the ZERO scheme sends
copies. Note that both schemes work, provided that the decoder knows which of the schemes the encoder actually uses.
For each pair of bits, the sum of the number of encoded bytes used by both schemes is always
bytes. Considering all pairs, the sum is
. Now, if we choose the scheme with the lower number of encoding data, we will need at most half of this number, i.e., at most
bytes.
There are many ways to signal the decoder which encoding scheme the encoder uses without too much penalty on the ratio. For example, we can add
copies of
iff the ZERO scheme is used. This will make the ratio go up by
. However, since
, the worst ratio is at most
. Contestants will achieve
points on this subtask for implementing this method.
Better approaches
There are many other approaches that give a better compression ratio.
One byte to 4 numbers
Consider the number of encoded messages of length no greater than
, only consisting of numbers
,
,
, and
(
is one such message). To analyze this, it might be easier to consider an equivalent scheme: encoded messages of length exactly
, only consisting of numbers
,
,
,
, and BLANK. There are
of them, which is enough to encode one byte of the original message.
Since the original message is at most
bytes long, it is possible to allocate
numbers per byte (
,
,
, and
for the first byte, for example), and therefore numbers from
to
are sufficient. This scheme yields a compression ratio of at most
.
Optimal ratio
Theoretically, one can encode
bytes of data to the minimum of
bytes of encoded message without loss of information. There are
different encoded messages using no more than
bytes. This is just more than
, the number of different
-byte original messages. The compression ratio achieved is about
. For smaller data, the ratio is even smaller.
This method yields a perfect score for this task. However, implementation of this scheme is comparatively difficult.
Comments