Editorial for DMOPC '20 Contest 2 P5 - Majority Subarrays
Submitting an official solution before solving the problem yourself is a bannable offence.
For subtask 1, we can loop over all subarrays and check whether it has a majority element by looping over all elements and counting the number of times it occurs, which is for each subarray.
For subtask 2, we can speed up counting the number of times an element occurs by precomputing a 2 dimensional table of how many times each number occurs in each prefix of the array, allowing us to query the number of occurrences of a number in some subarray in .
For subtask 3, there are several ways to further improve the complexity. We describe two of them below.
One approach is to use randomization to check whether a subarray has a majority element. Note that if we randomly choose some element of the subarray, there is a chance or better that it is a majority element, if one exists. Thus after precomputing the occurrence table we can check some number (say ) random elements in each subarray in order to determine whether it definitely has a majority element or very likely does not. If we pick , the probability that no subarray is incorrectly determined to not have a majority element is at least for , which is good enough.
Another approach, which will help us in further subtasks, is to note that each subarray has at most one majority element, so it suffices to count for each number between and the number of subarrays where that number occurs more than half the time. For each number , consider the array where if and if . We need to find the number of subarrays of with a positive sum, which can be done by taking the prefix sum array of and using a binary indexed tree while looping over the array to help count the number of previously seen elements that are less than the current element.
For subtask 4, there are several ways to further improve the complexity. We describe a few of them below.
One easy method is to consider all left endpoints and loop through all right endpoints while maintaining a running frequency table and a maximum frequency value. We can then compare this with the length of the current subarray to determine whether it has a majority element.
Another approach is to modify the standard linear-time majority checking algorithm to get a unique candidate for the majority element for each subarray, and use the precomputed occurrence table to check whether that candidate is indeed a majority element. Note that the occurrence table needs to store integers which takes about megabytes of memory and fits in the memory limit.
Another way is to optimize the second approach in subtask 3. Note that consecutive updated/queried values differ by at most 1. It suffices to maintain a simple array of the number of times each prefix sum is seen so far, as well as the current prefix sum and the answer to the last query. If
s is the prefix sum and
p is the last answer, then we can do
p += seen[s],s++ if the current value is and do
s--,p -= seen[s] if the current value is , add
p to a running answer, and add 1 to
For subtask 5, we need to do better than the quadratic time barrier from either looping through all subarrays or looping through the entire array for each number.
One approach is to optimize counting majority subarrays for numbers that don't occur often. If a number occurs times in the array, there are possibilities for which of these numbers are part of the subarray, and we can do some math to calculate for each possibility the number of subarrays with that number as a majority element. Combined with the approach for subtask 4, we can consider some number that occurs times in time, which is overall with a good constant. Other square root decomposition ideas are possible.
Another approach is to note that there are many long runs of s in the array introduced above. We can take advantage of this to simulate the subtask 4 algorithm quickly using a lazy segment tree that supports adding an arithmetic sequence to a range and range sum queries in time for a number that occurs times. This gives an algorithm overall that unfortunately has a very bad constant but still passes subtask 5 easily.
Time complexity: or
For subtask 6, we need an algorithm.
Let's try to process some number occurring times in . We need a better way to take advantage of the long runs of s in the array (recall that if and if ). The key is that the long runs of s separate the array into small sections that cannot interfere with each other, as "crossing" a long run of s will result in the subarray becoming "too negative" for that number to occur enough times. Here is one way to formalize this: Consider a number line and with a pen at initially. Loop through the array and move the pen to the right by each time we see a and move the pen to the left by each time we see a , drawing a "trail" of arrows of length 1 as we go. Every left arrow that does not overlap or touch a right arrow can be effectively ignored as they do not change the answer. All other left arrows can be paired with the closest right arrow by drawing time that covers or touches it (possibly drawn before or after this left arrow is drawn). On the other hand, there are only right arrows, each of which can only "activate" 4 left arrows each. Thus there are at most around important arrows we need to consider, and all other arrows can be effectively ignored. With a proper implementation, this gives the desired algorithm overall.