Editorial for DMOPC '19 Contest 1 P6 - Bob and Binary Strings

Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

Author: george_chen

For subtask 1, it suffices to iterate over all pairs of binary strings of length $a_i$ and check whether they are similar. Suppose the recursive function for checking similarity has complexity $T(M)$ when comparing two strings of length $M$. If we directly implement the procedure described in the problem statement, each call does linear work and makes up to four recursive calls on halves, so we get

$T(M) = M+4T(M/2)$
$T(M) = \mathcal{O}(M^2)$

Time complexity: $\mathcal{O}(N a_i^2 2^{2a_i})$
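The brute force above can be sketched as follows. This is a minimal illustration, assuming the similarity rule from the statement is: two strings of equal length are similar if they are equal, or if the length is even and their halves are similar either in order or crosswise. The function names `similar` and `count_pairs_bruteforce` are hypothetical.

```python
from itertools import product


def similar(a: str, b: str) -> bool:
    """Direct recursive similarity check; assumes len(a) == len(b)."""
    if a == b:
        return True
    if len(a) % 2 == 1:
        # Odd-length strings cannot be split, so only equality counts.
        return False
    h = len(a) // 2
    a1, a2, b1, b2 = a[:h], a[h:], b[:h], b[h:]
    # Halves may match in order or crosswise (up to four recursive calls).
    return (similar(a1, b1) and similar(a2, b2)) or (
        similar(a1, b2) and similar(a2, b1)
    )


def count_pairs_bruteforce(n: int) -> int:
    """Count ordered pairs of similar binary strings of length n."""
    strings = ["".join(bits) for bits in product("01", repeat=n)]
    return sum(similar(a, b) for a in strings for b in strings)
```

This is only practical for very small lengths, since it examines all $2^{2n}$ ordered pairs.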

For subtask 2, observe that if $a_i$ is odd, the strings cannot be split into halves, so two binary strings are similar only if they are equal. Since each binary string can only be paired with itself, this gives a total of $2^{a_i}$ pairs. Since $a_i$ is quite large, we must use repeated squaring to calculate $2^{a_i} \bmod (10^9+7)$.

Time complexity: $\mathcal{O}(N \log(a_i))$
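Repeated squaring can be sketched as below. The helper names `power_mod` and `count_pairs_odd` are hypothetical, and Python's built-in three-argument `pow` does the same job; the loop is written out only to show the technique.

```python
MOD = 10**9 + 7


def power_mod(base: int, exp: int, mod: int = MOD) -> int:
    """Compute base**exp % mod in O(log exp) multiplications."""
    result = 1
    base %= mod
    while exp > 0:
        if exp & 1:
            # The current bit of the exponent is set: fold in this power.
            result = result * base % mod
        base = base * base % mod  # square for the next bit
        exp >>= 1
    return result


def count_pairs_odd(n: int) -> int:
    """Answer for odd n: every string is similar only to itself."""
    return power_mod(2, n)
```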

For subtask 3, we must make the crucial observation that the strings can be split into several equivalence classes. If some string $A$ is similar to $B$, and string $B$ is similar to $C$, then we can show that $A$ is also similar to $C$. Therefore, all strings belonging to the same class are similar, and strings belonging to different equivalence classes are not. For each equivalence class, we define its representative as the lexicographically smallest string that is reachable from every element of the class. Now, it is a matter of assigning all $2^{a_i}$ strings to the corresponding class. This can be done by recursively minimizing the left and right halves and then minimizing the concatenation of the minimized halves (i.e. if the minimized left half is larger than the minimized right half, swap them); a string of odd length is its own representative. If comparing and swapping the halves is treated as a constant-time operation, our recursive procedure now has complexity

$T(M) = 2T(M/2) + \mathcal{O}(1)$
$T(M) = \mathcal{O}(M)$

Time complexity: $\mathcal{O}(a_i 2^{a_i})$
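The representative-based grouping can be sketched as follows. The names `canonical` and `count_pairs_by_classes` are hypothetical. The sketch works on Python strings, so each comparison is linear in the length; it therefore does more work per string than the recurrence above, which presumably assumes a cheaper comparison.

```python
from collections import Counter
from itertools import product


def canonical(s: str) -> str:
    """Return the lexicographically smallest string in the class of s."""
    if len(s) % 2 == 1:
        # Odd-length strings cannot be split, so they are their own class.
        return s
    h = len(s) // 2
    left, right = canonical(s[:h]), canonical(s[h:])
    # Put the smaller minimized half first.
    return left + right if left <= right else right + left


def count_pairs_by_classes(n: int) -> int:
    """Sum of squared class sizes over all binary strings of length n."""
    sizes = Counter(canonical("".join(bits)) for bits in product("01", repeat=n))
    return sum(c * c for c in sizes.values())
```

A class of size $c$ contributes $c^2$ ordered pairs, which is why the answer is the sum of the squared class sizes.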

Approach 1:

For subtask 4, we will speed up the procedure described in subtask 3. Suppose $a_i = 2k$ (if $a_i$ is odd we can use the algorithm for subtask 2). Instead of generating the equivalence classes for $a_i$, we generate them for $k$. In the recursive procedure, notice that the classes for $a_i$ are produced by pairing up the classes for $k$ on the left with the classes for $k$ on the right. So we can consider a list, $b$, containing the sizes of the equivalence classes for $k$. Notice that when we multiply $b_i$ by $b_j$ where $j \le i$, we obtain the size of an equivalence class for $a_i$. Be careful that when $j > i$ we must add this number to the class produced by the swapped pair $(j, i)$ instead of creating a new class. If we square the terms and modify this procedure slightly, we obtain the sum of the squares of the class sizes for $a_i$ without actually generating the classes.

Consider pairing two terms $b_i$ and $b_j$ where $i \ne j$. The equivalence class they form has size $2b_i b_j$, and the square of the size is $4b_i^2 b_j^2$. Since the procedure visits this pair twice, once as $(i, j)$ and once as $(j, i)$, it accumulates $2b_i^2 b_j^2$, so we must multiply each such product by a further factor of $2$ for the square of the resulting size to be correct.

Time complexity: $\mathcal{O}(a_i 2^{a_i/2})$
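The pairing step can be illustrated with the sketch below, which shows both the explicit class sizes and the implicit sum of squares. The function names are hypothetical, and the double loop is written for clarity rather than speed.

```python
MOD = 10**9 + 7


def explicit_double(b: list[int]) -> list[int]:
    """Class sizes for length 2k, given the class sizes b for length k."""
    sizes = []
    for i in range(len(b)):
        for j in range(i + 1):
            # A class paired with itself keeps b_i * b_i strings; two
            # distinct classes can appear in either order, hence the 2.
            sizes.append(b[i] * b[j] if i == j else 2 * b[i] * b[j])
    return sizes


def implicit_sum_of_squares(b: list[int], mod: int = MOD) -> int:
    """Sum of squared class sizes for length 2k, without building them."""
    total = 0
    for i, bi in enumerate(b):
        for j, bj in enumerate(b):
            term = (bi * bi % mod) * (bj * bj % mod) % mod
            # Each unordered pair i != j is visited twice and must
            # contribute 4 * b_i^2 * b_j^2 in total, so double each visit.
            total += term if i == j else 2 * term
    return total % mod
```

For example, the classes for length 2 have sizes `[1, 1, 2]`, and both functions agree that the answer for length 4 is 54.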

Approach 2:

We will generalize approach 1. Suppose $a_i = 2^m k$ with $k$ odd. We first generate the classes for $k$. Since $k$ is odd, there are simply $2^k$ classes of size $1$. Using the explicit procedure, we then generate the class sizes for $2k, 2^2 k, \dots, 2^{m-1} k$. Since the length of the list roughly squares with each iteration, the earlier iterations take negligible time compared to the last one. Once again, we use the implicit generation method to get the answer for $2^m k$ without generating the actual terms.

Time complexity: $\mathcal{O}(2^{a_i/2})$
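A minimal sketch of this approach follows. The names `class_sizes` and `count_pairs_approach2` are hypothetical. For the final implicit step, the sketch uses the identity $2\left(\sum b_x^2\right)^2 - \sum b_x^4$, which is an assumption on my part about how to make that step linear in the list length; it gives the same value as summing over all ordered pairs.

```python
MOD = 10**9 + 7


def class_sizes(n: int) -> list[int]:
    """Explicit class sizes for length n, doubling up from the odd part."""
    m = 0
    while n % 2 == 0:
        n //= 2
        m += 1
    sizes = [1] * (2**n)  # odd length: every string is its own class
    for _ in range(m):
        sizes = [
            sizes[i] * sizes[j] * (1 if i == j else 2)
            for i in range(len(sizes))
            for j in range(i + 1)
        ]
    return sizes


def count_pairs_approach2(a: int) -> int:
    """Explicit lists up to length a/2, then one implicit doubling step."""
    if a % 2 == 1:
        return pow(2, a, MOD)
    b = class_sizes(a // 2)
    s1 = sum(x * x for x in b) % MOD  # sum of squares
    s2 = sum(x**4 for x in b) % MOD   # sum of fourth powers
    return (2 * s1 * s1 - s2) % MOD
```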

For subtask 5, we observe that there are many duplicated class sizes and that the order of the classes doesn't actually matter. Therefore, we store the number of classes of each size instead of explicitly storing the classes. This works well once you observe that the sizes of the equivalence classes must be powers of $2$: if all the terms going into a doubling operation are powers of two, then all resulting terms are also powers of two, since each is either a product of two input terms or a product of two input terms multiplied by $2$. In this form, the doubling operation we used in the previous subtask is similar to a convolution of this array. As with approach 2 from the previous subtask, if $a_i = 2^m k$ we start with the representation of the classes of $k$ and double the representation until we get to $a_i$. The representation doubles in size with each iteration, and performing one iteration takes $\mathcal{O}(S^2)$, where $S$ is the size of the representation at the current iteration. Once again, the first $m-1$ iterations contribute very little to the complexity, which gives $\mathcal{O}(a_i^2)$ per query.

Time complexity: $\mathcal{O}(N a_i^2)$
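The compressed representation can be sketched as below, storing for each exponent $e$ the number of classes of size $2^e$. The names `double_counts` and `count_pairs_by_exponents` are hypothetical. To stay short, the sketch keeps exact Python integers and performs every doubling explicitly, so it only suits small inputs; a real solution would presumably reduce the counts modulo $10^9+7$ (halving via the modular inverse of $2$) and handle the last step implicitly.

```python
from collections import Counter

MOD = 10**9 + 7


def double_counts(cnt: Counter) -> Counter:
    """cnt[e] = number of classes of size 2**e for length d; same for 2d."""
    new = Counter()
    items = list(cnt.items())
    for e, c in items:
        new[2 * e] += c  # a class paired with itself: size 2^(2e)
        if c > 1:
            # Two distinct classes of equal size: size 2 * 2^e * 2^e.
            new[2 * e + 1] += c * (c - 1) // 2
    for x in range(len(items)):
        for y in range(x):
            (e1, c1), (e2, c2) = items[x], items[y]
            # Classes of different sizes: size 2 * 2^e1 * 2^e2.
            new[e1 + e2 + 1] += c1 * c2
    return new


def count_pairs_by_exponents(a: int) -> int:
    """Sum of squared class sizes for length a, via exponent counts."""
    k, m = a, 0
    while k % 2 == 0:
        k //= 2
        m += 1
    cnt = Counter({0: 2**k})  # odd length: 2^k classes of size 1
    for _ in range(m):
        cnt = double_counts(cnt)
    return sum(c * 4**e for e, c in cnt.items()) % MOD
```

The nested loop over distinct exponents is the convolution-like step, which is where the quadratic cost in the size of the representation comes from.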

For subtask 6, we will try to find a way of directly generating the answer instead of storing the number of classes of each size, as this quickly becomes unmanageable. Observe that the final value we seek is the sum of the squares of the sizes of the equivalence classes for $a_i$. Let $S_d(i)$ be the sum of the $2^i$-th powers of the sizes of the classes for $d$, so the sum of squares for $d$ is just $S_d(1)$. The sum of squares of the sizes for $2d$ is close to $C_1 S_d(1)^2$, but this overcounts. By how much? We are overcounting the terms that are multiplied by themselves. These terms sum to $S_d(2)$, the sum of fourth powers of the class sizes for $d$, so we should subtract $C_1-1$ times this sum. Then we have $S_{2d}(1) = C_1 S_d(1)^2-(C_1-1)S_d(2)$. The interesting thing is that this works exactly the same way for higher powers, so we have the general formula $S_{2d}(i) = C_i S_d(i)^2-(C_i-1)S_d(i+1)$. The coefficients $C_i$ can be found in the same way as in approach 1 of subtask 4.
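One way to recover the coefficients is the following derivation. It is a reconstruction under the pairing rule from subtask 4, not a formula quoted from the author. Write $p = 2^i$ and let $b_1, \dots, b_n$ be the class sizes for $d$ (the indices $x$, $y$ are introduced here only for this derivation). A class paired with itself has size $b_x^2$, and two distinct classes form a class of size $2 b_x b_y$, so

$S_{2d}(i) = \sum_x b_x^{2p} + \sum_{x<y} 2^p b_x^p b_y^p$
$S_{2d}(i) = S_d(i+1) + 2^{p-1}\left(S_d(i)^2 - S_d(i+1)\right)$
$S_{2d}(i) = 2^{p-1} S_d(i)^2 - (2^{p-1}-1) S_d(i+1)$

Under these assumptions $C_i = 2^{2^i-1}$, so $C_1 = 2$, $C_2 = 8$, and $C_{i+1} = 2C_i^2$. As a check, the classes for length 2 have sizes $1, 1, 2$, giving $S_2(1) = 6$ and $S_2(2) = 18$, and $2 \cdot 6^2 - 18 = 54$ matches the sum of squares of the sizes $1, 1, 2, 4, 4, 4$ for length 4.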

Therefore, our algorithm is as follows: suppose $a_i = 2^m k$ with $k$ odd. We start the procedure at $k$, where we know that $S_k(t) = 2^k$ for every $t$, since the sum of any power of $2^k$ classes of size $1$ is always $2^k$. We then repeat the doubling step to generate the results for $2k, 2^2 k, \dots, 2^m k$. Note that each iteration loses the highest index: $S_{2^j k}(t)$ requires $S_{2^{j-1} k}(t+1)$, so at iteration $j$ we cannot generate $S_{2^j k}(m-j+2)$ unless we have $S_{2^{j-1} k}(m-j+3)$. But if we start by generating $S_k(t)$ for $t \le m+1$, then we have exactly enough terms to generate $S_{a_i}(1)$, which is the answer.

Time complexity: $\mathcal{O}(N \log^2(a_i))$
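The whole procedure for one query can be sketched as follows. The function name `count_similar_pairs` is hypothetical, and the sketch assumes the coefficients are $C_i = 2^{2^i-1}$ (so $C_1 = 2$ and $C_{i+1} = 2C_i^2$), which follows from the pairing rule in subtask 4 but is not stated explicitly above.

```python
MOD = 10**9 + 7


def count_similar_pairs(a: int) -> int:
    """Sum of squared class sizes for length a, modulo MOD."""
    # Write a = 2^m * k with k odd.
    k, m = a, 0
    while k % 2 == 0:
        k //= 2
        m += 1

    # s[j] holds S_d(j + 1). At d = k every power sum equals 2^k,
    # and m + 1 of them are needed to reach S_a(1).
    s = [pow(2, k, MOD)] * (m + 1)

    # c[j] holds C_{j+1}, built with C_1 = 2 and C_{i+1} = 2 * C_i^2.
    c = [2]
    for _ in range(m):
        c.append(2 * c[-1] * c[-1] % MOD)

    # Each doubling step consumes the highest index, shrinking the list.
    for _ in range(m):
        s = [
            (c[j] * s[j] * s[j] - (c[j] - 1) * s[j + 1]) % MOD
            for j in range(len(s) - 1)
        ]
    return s[0]
```

Each query performs $m$ doubling steps over at most $m+1$ values, with $m \le \log_2 a_i$, which is consistent with the stated bound.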
