Editorial for DMOPC '21 Contest 10 P6 - Median Replace

Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

Author: zhouzixiang2004

Subtask 1

Hint 1

Characterize all the sequences of operations that contribute to the answer.

Hint 2

The answer to (1) is always $1$ if $N$ is odd and always $2$ if $N$ is even.

Hint 3

Even $N$ are equivalent to concatenating two odd-length arrays. Thus, focus on solving odd $N$ (the below hints only consider the odd $N$ case).

Hint 4

Whenever we perform an operation, we form a "block" of $\ge 2$ consecutive equal elements. If we ever form more than $1$ block of $\ge 2$ consecutive equal elements, it is impossible to ever "merge" them.

Hint 5

We must start a block (say, with median $m$ ) somewhere in the middle of the array, then repeatedly extend it to the left or the right. This extending is only possible if there are an equal number of elements less than $m$ and greater than $m$ .

For example, for $[1,4,5,7,6,2,3]$ , we can start a block like $[4,4,4,7,6,2,3]$ and then extend it to $[4,4,4,4,4,4,4]$ (by choosing $[l,r] = [3,7]$ ) because $7$ and $6$ are greater than $4$ while $2$ and $3$ are less than $4$ .

Hint 6

The initial block must contain the median of the entire array, say $m$ .

Hint 7

Consider a new array $b$ where $b_i = -1$ if $a_i < m$ , $b_i = 0$ if $a_i = m$ , and $b_i = 1$ if $a_i > m$ . The answer to (2) depends on the number of prefix sums before the position of $m$ in $a$ that are $0$ , as well as the number of prefix sums after the position of $m$ in $a$ that are $0$ . The answer to (3) can be calculated by a binomial coefficient.

Solution

The first step is to characterize all the sequences of operations that contribute to the answer. It turns out that for odd $N$ , these sequences are as follows:

Let $m$ be the median of the array, and define $b_i = -1$ if $a_i < m$ , $b_i = 0$ if $a_i = m$ , and $b_i = 1$ if $a_i > m$ for $1 \le i \le N$ . Let's draw a "separator" at positions where the prefix sum of $b$ is $0$ . For example, 1 4 5 7 6 2 3 becomes |1 4 5|7 6 2 3| and 2 7 1 4 5 6 3 becomes |2 7|1 4 5|6 3|. Then the first operation must "fill" the subarray containing the median $m$ (e.g. |2 7|1 4 5|6 3| becomes |2 7|4 4 4|6 3|), unless this subarray only contains $m$ (e.g. |2 7|4|1 5|6 3|), in which case we already start extending to the left or the right as described next. Then, we repeatedly extend this block to the left or the right until we hit the next separator.

If there are $a$ separators to the left of $m$ and $b$ separators to the right of $m$ , the number of operations is $a+b-2$ if $m$ is in a single element subarray and $a+b-1$ otherwise. The number of ways to perform these operations is the number of ways to arrange $a-1$ "left" instructions and $b-1$ "right" instructions, which is $\binom{a+b-2}{a-1}$ . This allows us to solve odd $N$ in $\mathcal{O}(N)$ (using a linear time median algorithm) or $\mathcal{O}(N \log N)$ time (using sorting to find the median).

For even $N$ , consider splitting the array into two odd-length arrays in one of the $N/2$ ways. We must perform a maximal sequence of operations in both subarrays and then interleave the operations (multiplying the number of ways by another binomial coefficient). The final answer to (3) is the sum of these counts over all the maximum possible total number of operations.

We can consider all these splits independently, solving even $N$ in $\mathcal{O}(N^2)$ or $\mathcal{O}(N^2 \log N)$ time.

Proof Sketch

Consider a related problem where we initially have a string of length $N$ with distinct characters. We may repeatedly replace any substring of length $3$ , having all distinct characters, with $3$ copies of one of these characters. We claim that for even $N$ , it is impossible to make all characters equal. For both odd and even $N$ , we will characterize all sequences of operations that result in the minimum possible number of distinct remaining characters.

Note that at any time, the occurrences of some character will form a contiguous substring. Let's call a maximal substring of equal characters a block, but only if it has a length of at least $2$ , and let's call any character that occurs only once a singleton.

The key invariant is that there can never be a substring composed of (a singleton or the left border) + (several blocks of even length) + (a singleton or the right border) (e.g. abbccccdddde). To prove this, consider the first time such a pattern occurs. Working backwards, the cases for the previous operation are:

abbxyccdddde -> abbccccdddde with $x \ne y, y \ne c$ ~x \ne y, y \ne c~. Then yccdddde has this pattern. Note that this contains the $b = x$ ~b = x~ case.
Same thing but on the other side (abbccyzdddde -> abbccccdddde).
abbccccddddeeyz -> abbccccddddefff with $e \ne y, y \ne z$ ~e \ne y, y \ne z~. Then abbccccddddeey has this pattern.
Same thing but on the left side.
Other operations do not interact with this pattern.

In all cases, we find that the pattern must have also existed before the previous operation, a contradiction.

Now we can use this invariant, as well as the invariant that equal characters are contiguous, to work backwards from the end state. For odd $N$ , we must repeatedly shrink the main block by $2$ on some end to reveal two characters we haven't seen before (for if not, then the only possibility is like ...aaaaabc -> ...aaaxbbc with $x \ne a, x \ne b$ . But then xbbc contains this impossible pattern). For even $N$ , it is impossible to make all characters equal since the entire string would form the impossible pattern, so the end state must be composed of two odd-length equal character strings concatenated. A similar argument shows that we must repeatedly shrink one of the two main blocks by $2$ on some end to reveal two characters we haven't seen before (an operation like aaaaaacbbbbb -> aaaaabbbbbbb or similar is not possible since aaaaaac is impossible).

Returning to the original problem, note that we can simulate an operation on a subarray of size $> 3$ with several operations on size $3$ subarrays if we are allowed to choose any element to replace with instead of only the median (e.g. we can do 23145 -> 33333 by 23145 -> 33345 -> 33333), so the process is strictly less complex than the problem above. The characterization of the original problem readily follows.

Time Complexity: $\mathcal{O}(N^2)$ or $\mathcal{O}(N^2 \log N)$

Subtask 2

Hint

Use square root decomposition to optimize.

Solution

From now on, we focus on optimizing the even $N$ case.

First, we can compute the median of every odd length prefix and suffix, e.g. by maintaining two sorted sets, one for the elements less than the median and one for the elements greater than the median, and moving elements between these sets when one becomes larger than the other. Then, we can sweep through these prefixes and suffixes in increasing order of their median while maintaining the $b$ array described above in a data structure. We need to support single index updates and querying the number of prefix sums that are $0$ in a range. This can be done using square root decomposition: Split the array into $\mathcal{O}(\sqrt{N})$ blocks of size $\mathcal{O}(\sqrt{N})$ . In each block, maintain its sum and an array of size $\mathcal{O}(\sqrt{N})$ , the number of occurrences of each prefix sum within the block (note that elements of $b$ are always in $[-1,1]$ , so there are only $\mathcal{O}(\sqrt{N})$ possibilities). For each update, recompute the information in the block that contains the updated index. For queries, use brute force on the $\mathcal{O}(\sqrt{N})$ size ends of the range that do not fully contain a block, and loop through the $\mathcal{O}(\sqrt{N})$ fully contained blocks, using the precomputed information to accumulate the answer.

Another similar approach sweeps in order of the size of the prefix/suffix instead of its median, using the observation that information about which elements are less than or greater than the median changes in only $\mathcal{O}(1)$ places each time.

Time Complexity: $\mathcal{O}(N \sqrt N)$

Subtask 3

Hint 1

Try to make a few more observations.

Hint 2

For a prefix $a[1..i]$ where $a_x$ is the median, an index $j$ to the right of $x$ contributes to the answer if the median of $a[1..j]$ is equal to $a_x$ . What about indices to the left of $x$ ?

Hint 3

For a prefix $a[1..i]$ where $a_x$ is the median, an index $j$ to the left of $x$ contributes to the answer if $a_x$ is between the two medians of $a[1..j]$ .

Solution

To get an $\mathcal{O}(N)$ or $\mathcal{O}(N \log N)$ time solution, observe that for a prefix $a[1..i]$ of odd length where $a_x$ is the median,

An index $j$ ~j~ to the right of $x$ ~x~ contributes to the answer if the size of $a[1..j]$ ~a[1..j]~ is odd and its median is equal to $a_x$ ~a_x~.
An index $j$ ~j~ to the left of $x$ ~x~ contributes to the answer if the size of $a[1..j]$ ~a[1..j]~ is even and $a_x$ ~a_x~ is between its two medians.
We can check whether $a_x$ ~a_x~ is a single element block by checking if the median of $a[1..x]$ ~a[1..x]~ is equal to $a_x$ ~a_x~.

Using these observations, we can sweep through the prefixes in increasing order while maintaining a frequency array of medians seen so far to handle case 1 and a binary indexed tree to handle case 2. Doing a similar thing for suffixes yields an $\mathcal{O}(N \log N)$ solution.

However, it is actually possible to optimize this solution to $\mathcal{O}(N)$ ! First, we can find the median(s) of each prefix and suffix in linear time by deleting elements one by one, maintaining a linked list of remaining elements and a pointer to the middle of the list. Then, note that if we consider the range $[\text{lower median}+1,\text{upper median}-1]$ , then this range for any index $j$ to the right of $x$ will not contain $a_x$ , as this range does not contain any elements of $a$ to the left of $j$ . Thus we can use a difference array instead of a BIT to update all of these ranges at once, optimizing the last part to $\mathcal{O}(N)$ too.

Time Complexity: $\mathcal{O}(N)$ or $\mathcal{O}(N \log N)$ .

Comments

There are no comments at the moment.