Editorial for Baltic OI '07 P3 - Sound - DMOJ: Modern Online Judge

Editorial for Baltic OI '07 P3 - Sound

Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

There are quite a few possible solutions to this task.

The simplest one is to keep a "sliding window" buffer of the last $m$ $m$ values seen and scan it after each new sample to determine the lowest and the highest value in the buffer. This obviously runs in $\mathcal O(nm)$ $O (n m)$ time, which is too slow for any larger test cases. Such a solution is expected to score about $40\%$ $40 %$ .

An easy improvement over the naïve solution is to cache the minimum and maximum and only scan the full buffer when a current extreme value is pushed out. While this still runs in $\mathcal O(nm)$ $O (n m)$ time in the worst case, it is much faster in many cases. Such a solution is expected to score about $50\%$ $50 %$ , as the larger test cases are engineered against this heuristic.

An asymptotic improvement is to build a binary tree that keeps the last $m$ $m$ values in the leaves and the minimum and maximum of each subtree in the internal nodes. Then the global minimum and maximum can be updated in $\mathcal O(\log m)$ $O (\log m)$ steps by replacing the oldest value in the buffer with the new one and updating the minima and maxima on the path from the updated leaf to the root of the tree and the overall running time is $\mathcal O(n \log m)$ $O (n \log m)$ . Such a solution is expected to score full points.

Another solution is to keep a "last seen" index for each possible sample value and scan this table after each new sample to determine the lowest and highest value seen no more than $m$ $m$ samples ago. This obviously runs in $\mathcal O(na)$ $O (n a)$ time, where $a$ $a$ is the size of the range of sample values. Such a solution is expected to score about $20\%$ $20 %$ .

Like in the "sliding window" case, we can cache the minimal and maximal values seen recently and use this information to reduce the scanning range. This reduces the worst-case running time to $\mathcal O(\frac{na}{m})$ $O (\frac{n a}{m})$ and proves to be surprisingly efficient on average. Such a solution is expected to score about $50\%$ $50 %$ , as the larger test cases are engineered against this heuristic.

The "last seen"-based solution can also be augmented with a binary tree. This time we need to keep only the maximum of each subtree, as this is sufficient to reduce the search for the highest and lowest value seen among the last $m$ $m$ samples to $\mathcal O(\log a)$ $O (\log a)$ steps and thus the total running time to $\mathcal O(n \log a)$ $O (n \log a)$ . Such a solution is expected to score about $70\%$ $70 %$ .

Yet another solution is to keep a "number of times seen among the last $m$ $m$ " counter for each possible sample value in addition to the $m$ $m$ -element "sliding window" buffer. Using the buffer, we can update the counters in constant time when a new sample arrives and an old one retires. Using the counters and the most recent sample $a_i$ $a_{i}$ , we can check in $\mathcal O(c)$ $O (c)$ steps if the values $a_i+j-c \dots a_i+j$ $a_{i} + j - c \dots a_{i} + j$ account for all $m$ $m$ samples for some $j \in 0 \dots c$ $j \in 0 \dots c$ . The total running time of this solution is $\mathcal O(nc)$ $O (n c)$ . Such a solution is expected to score about $60\%$ $60 %$ .

An $\mathcal O(n \log n)$ $O (n \log n)$ solution could be written using a segment tree on the whole input sequence. Such a solution would be expected to score about $80\%$ $80 %$ .

An optimal solution can be developed considering the prefixes and suffixes of $m$ $m$ -element blocks of the input sequence. For each block, we can sweep the block from left to right and then from right to left to compute the minimal values for each prefix and suffix of the block, as shown in table 4.1 for the case $n = 16$ $n = 16$ , $m = 4$ $m = 4$ . An arbitrary window consists of a suffix of one block and a prefix of the following one (shown in bold in the first row of the table). The minimal value of the window can be computed in constant time by taking the minimum of the min-suffix and the min-prefix corresponding to the suffix and prefix that make up the window (shown in bold in the second and third row). The maximal value can be computed similarly, which yields a solution that runs in $\mathcal O(n)$ $O (n)$ time and $\mathcal O(n)$ $O (n)$ space. Such a solution is expected to score full points. Actually, it would be sufficient to keep only two adjacent blocks in the memory at any time, and thus reduce the memory requirement to $\mathcal O(m)$ $O (m)$ .

Another optimal solution can be derived from the observation that after encountering a value $x$ $x$ , any previously encountered value $y \le x$ $y \leq x$ will not affect computing the maximum anymore. Therefore we can replace the naïve "sliding window" with a list of indices $j_1 < j_2 < \dots < j_k$ $j_{1} < j_{2} < \dots < j_{k}$ such that $j_1 > i-m$ $j_{1} > i - m$ and $a_{j_1} > a_{j_2} > \dots > a_{j_k}$ $a_{j_{1}} > a_{j_{2}} > \dots > a_{j_{k}}$ . When a new value $a_i$ $a_{i}$ arrives, the list can be updated as follows:

If $j_1 = i-m$ $j_{1} = i - m$ , remove $j_1$ $j_{1}$ from the front of the list.
While $k > 0$ $k > 0$ and $a_{j_k} \le a_i$ $a_{j_{k}} \leq a_{i}$ , remove $j_k$ $j_{k}$ from the back of the list.
Append $i$ $i$ to the back of the list.

It should be obvious that the preceding steps guarantee that at any time the value $a_{j_1}$ $a_{j_{1}}$ is the maximum among the last $m$ $m$ samples. While the algorithm can spend $\mathcal O(m)$ $O (m)$ operations to process the arrival of a sample $a_i$ $a_{i}$ , the total cost of processing all $n$ $n$ samples can't exceed $\mathcal O(n)$ $O (n)$ , as each index enters and leaves the list only once. Maintaining the minimal value among the last $m$ $m$ samples can be done similarly, or just by keeping another list that is updated based on the value $-a_i$ $- a_{i}$ for each sample $a_i$ $a_{i}$ . Such a solution using $\mathcal O(n)$ $O (n)$ time and $\mathcal O(m)$ $O (m)$ memory is expected to score full points.

The last solution has the additional benefit that it is easily modifiable for the case when silence is defined as at least $m$ $m$ samples where the noise level does not exceed the threshold. Also, with a bit more careful programming, we could get away using only $\mathcal O(c)$ $O (c)$ memory. This could be useful when analyzing long recordings containing potentially long sections of silence on a device with limited memory.

Comments

There are no comments at the moment.