Editorial for DMOPC '20 Contest 2 P5 - Majority Subarrays

Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

Author: zhouzixiang2004

Subtask 1

For subtask 1, we can loop over all $\mathcal{O}(N^2)$ $O (N^{2})$ subarrays and check whether it has a majority element by looping over all elements and counting the number of times it occurs, which is $\mathcal{O}(N^2)$ $O (N^{2})$ for each subarray.

Time complexity: $\mathcal{O}(N^4)$ $O (N^{4})$

Subtask 2

For subtask 2, we can speed up counting the number of times an element occurs by precomputing a 2 dimensional table of how many times each number occurs in each prefix of the array, allowing us to query the number of occurrences of a number in some subarray in $\mathcal{O}(1)$ $O (1)$ .

Time complexity: $\mathcal{O}(N^3)$ $O (N^{3})$

Subtask 3

For subtask 3, there are several ways to further improve the complexity. We describe two of them below.

One approach is to use randomization to check whether a subarray has a majority element. Note that if we randomly choose some element of the subarray, there is a $\frac{1}{2}$ $\frac{1}{2}$ chance or better that it is a majority element, if one exists. Thus after precomputing the occurrence table we can check some number (say $T$ $T$ ) random elements in each subarray in order to determine whether it definitely has a majority element or very likely does not. If we pick $T = 30$ $T = 30$ , the probability that no subarray is incorrectly determined to not have a majority element is at least $(1-2^{-30})^\frac{N(N+1)}{2} > 0.998$ $(1 - 2^{- 30})^{\frac{N (N + 1)}{2}} > 0.998$ for $N \le 2\,000$ $N \leq 2 000$ , which is good enough.

Another approach, which will help us in further subtasks, is to note that each subarray has at most one majority element, so it suffices to count for each number between $1$ $1$ and $N$ $N$ the number of subarrays where that number occurs more than half the time. For each number $x$ $x$ , consider the array $b$ $b$ where $b_i = -1$ $b_{i} = - 1$ if $a_i \ne x$ $a_{i} \neq x$ and $b_i = 1$ $b_{i} = 1$ if $a_i = x$ $a_{i} = x$ . We need to find the number of subarrays of $b$ $b$ with a positive sum, which can be done by taking the prefix sum array of $b$ $b$ and using a binary indexed tree while looping over the array to help count the number of previously seen elements that are less than the current element.

Time complexity: $\mathcal{O}(N^2 \log N)$ $O (N^{2} \log N)$

Subtask 4

For subtask 4, there are several ways to further improve the complexity. We describe a few of them below.

One easy method is to consider all left endpoints and loop through all right endpoints while maintaining a running frequency table and a maximum frequency value. We can then compare this with the length of the current subarray to determine whether it has a majority element.

Another approach is to modify the standard linear-time majority checking algorithm to get a unique candidate for the majority element for each subarray, and use the precomputed occurrence table to check whether that candidate is indeed a majority element. Note that the occurrence table needs to store $N^2$ $N^{2}$ integers which takes about $381$ $381$ megabytes of memory and fits in the memory limit.

Another way is to optimize the second approach in subtask 3. Note that consecutive updated/queried values differ by at most 1. It suffices to maintain a simple array of the number of times each prefix sum is seen so far, as well as the current prefix sum and the answer to the last query. If s is the prefix sum and p is the last answer, then we can do p += seen[s],s++ if the current $b$ $b$ value is $1$ $1$ and do s--,p -= seen[s] if the current $b$ $b$ value is $-1$ $- 1$ , add p to a running answer, and add 1 to seen[s].

Time complexity: $\mathcal{O}(N^2)$ $O (N^{2})$

Subtask 5

For subtask 5, we need to do better than the quadratic time barrier from either looping through all subarrays or looping through the entire array for each number.

One approach is to optimize counting majority subarrays for numbers that don't occur often. If a number occurs $y$ $y$ times in the array, there are $\mathcal{O}(y^2)$ $O (y^{2})$ possibilities for which of these numbers are part of the subarray, and we can do some math to calculate for each possibility the number of subarrays with that number as a majority element. Combined with the $\mathcal{O}(N)$ $O (N)$ approach for subtask 4, we can consider some number that occurs $y$ $y$ times in $\mathcal{O}(\min(y^2,N))$ $O (min (y^{2}, N))$ time, which is $\mathcal{O}(N \sqrt{N})$ $O (N \sqrt{N})$ overall with a good constant. Other square root decomposition ideas are possible.

Another approach is to note that there are many long runs of $-1$ $- 1$ s in the $b$ $b$ array introduced above. We can take advantage of this to simulate the subtask 4 algorithm quickly using a lazy segment tree that supports adding an arithmetic sequence to a range and range sum queries in $\mathcal{O}(y \log N)$ $O (y \log N)$ time for a number that occurs $y$ $y$ times. This gives an $\mathcal{O}(N \log N)$ $O (N \log N)$ algorithm overall that unfortunately has a very bad constant but still passes subtask 5 easily.

Time complexity: $\mathcal{O}(N \sqrt{N})$ $O (N \sqrt{N})$ or $\mathcal{O}(N \log N)$ $O (N \log N)$

Subtask 6

For subtask 6, we need an $\mathcal{O}(N)$ $O (N)$ algorithm.

Let's try to process some number $x$ $x$ occurring $y$ $y$ times in $\mathcal{O}(y)$ $O (y)$ . We need a better way to take advantage of the long runs of $-1$ $- 1$ s in the $b$ $b$ array (recall that $b_i = -1$ $b_{i} = - 1$ if $a_i \ne x$ $a_{i} \neq x$ and $b_i = 1$ $b_{i} = 1$ if $a_i = x$ $a_{i} = x$ ). The key is that the long runs of $-1$ $- 1$ s separate the array into small sections that cannot interfere with each other, as "crossing" a long run of $-1$ $- 1$ s will result in the subarray becoming "too negative" for that number to occur enough times. Here is one way to formalize this: Consider a number line and with a pen at $0$ $0$ initially. Loop through the $b$ $b$ array and move the pen to the right by $1$ $1$ each time we see a $1$ $1$ and move the pen to the left by $1$ $1$ each time we see a $-1$ $- 1$ , drawing a "trail" of arrows of length 1 as we go. Every left arrow that does not overlap or touch a right arrow can be effectively ignored as they do not change the answer. All other left arrows can be paired with the closest right arrow by drawing time that covers or touches it (possibly drawn before or after this left arrow is drawn). On the other hand, there are only $y$ $y$ right arrows, each of which can only "activate" 4 left arrows each. Thus there are at most around $5y$ $5 y$ important arrows we need to consider, and all other arrows can be effectively ignored. With a proper implementation, this gives the desired $\mathcal{O}(N)$ $O (N)$ algorithm overall.

Time complexity: $\mathcal{O}(N)$ $O (N)$

Comments

2

Plasmatic commented on March 16, 2021, 2:53 a.m.

I also want to mention an $\mathcal O(N\log{N})$ $O (N \log N)$ solution with better constant that with some optimizations, passes under the time limit (my solution has a worst case of 0.895s on the official data). It uses an observation similar to the one given in the editorial (quantifying the idea of being "too negative"): For some number $x$ $x$ , let $c_i=\min{(b_1, b_2, \ldots, b_i)}$ $c_{i} = min (b_{1}, b_{2}, \dots, b_{i})$ . Only indices $i \ge 1$ $i \geq 1$ where $c_i = c_{i-1}$ $c_{i} = c_{i - 1}$ (assume $c_0=0$ $c_{0} = 0$ ) can contribute to the answer (let's call these indices important), and we can prove that the number of indices that can contribute to the answer is $\mathcal O(y)$ $O (y)$ for a number that occurs $y$ $y$ times (look at what happens to the values of $b_i$ $b_{i}$ and $c_i$ $c_{i}$ when you change a single $b_i$ $b_{i}$ from $-1$ $- 1$ to $1$ $1$ ). We can then count the contribution of each important index using a data structure supporting range increment and prefix sum query, which can be done using BIT.

This solution is can probably be optimized to $\mathcal O(N)$ $O (N)$ , but I haven't been able to work out the specifics.