Editorial for CCC '18 S4 - Balanced Trees
The first phase of solving this problem involves converting the problem statement into an equation. Let $T(w)$ be the number of perfectly balanced trees with weight $w$. The base case is $T(1) = 1$. For $w > 1$, the number of subtrees is between $2$ and $w$. If there are $k$ subtrees, then the weight of each subtree is $\lfloor w/k \rfloor$. Since the subtrees must be completely identical, the number of possibilities is $T(\lfloor w/k \rfloor)$. Thus, we have the recurrence

$$T(w) = \sum_{k=2}^{w} T\!\left(\left\lfloor \frac{w}{k} \right\rfloor\right).$$
A direct DP implementation of this will take $O(N^2)$ time, and pass the first subtask.
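For concreteness, here is a minimal sketch of this direct DP in C++ (not the official implementation; it assumes the answer fits in a signed 64-bit integer, which is justified later in the editorial):

```cpp
#include <cstdio>
#include <vector>

int main() {
    long long n;
    scanf("%lld", &n);
    // T[w] = number of perfectly balanced trees of weight w.
    std::vector<long long> T(n + 1, 0);
    T[1] = 1;
    for (long long w = 2; w <= n; w++)
        for (long long k = 2; k <= w; k++)  // k = number of identical subtrees
            T[w] += T[w / k];               // each subtree has weight floor(w / k)
    printf("%lld\n", T[n]);
    return 0;
}
```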
One key fact for improving this time complexity is that there are $O(\sqrt{N})$ distinct integers of the form $\lfloor N/k \rfloor$ for integer $k$ between $1$ and $N$. This is because $\lfloor N/k \rfloor \le \sqrt{N}$ for $k \ge \sqrt{N}$. This idea is related to the following two observations.
First, the number of intermediate values required to compute $T(N)$ is significantly smaller than $N$. Because $\lfloor \lfloor N/a \rfloor / b \rfloor = \lfloor N/(ab) \rfloor$, we will only need to know the values of $T(\lfloor N/k \rfloor)$, where $k$ is an integer between $1$ and $N$, to find $T(N)$ using the recursive formula.
Suppose we use this idea to only compute $T(w)$ for the required values. Assuming we already know $T(w')$ for all necessary $w' < w$, computing $T(w)$ takes $O(w)$ time. Thus, the time complexity for computing $T(N)$ using this method is bounded asymptotically by

$$\sum_{k=1}^{\sqrt{N}} \left( k + \frac{N}{k} \right) = O(N \log N).$$
This is fast enough to pass the first 3 subtasks. This algorithm is still easy to implement; one only needs a recursive function with memoization. Because of the very structured form of the necessary states, the memoization can be done simply with two arrays of size $O(\sqrt{N})$ (i.e. there is no need for map data structures). To elaborate more on how this form of memoization works, note that there are really two types of numbers of the form $\lfloor N/k \rfloor$: ones greater than $\sqrt{N}$ and ones less than or equal to it. For the latter type, indexing can be done as usual, on memory of size $O(\sqrt{N})$. For the former type, the indices are of the form $\lfloor N/k \rfloor$ with $k \le \sqrt{N}$. In this case, we can represent the state $\lfloor N/k \rfloor$ more densely using the index $k$, and since the values $\lfloor N/k \rfloor$ are pairwise distinct for $k \le \sqrt{N}$, it can be shown that this indexing is unique.
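A minimal sketch of this memoized recursion (Observation 1 only, so each state still costs $O(w)$); the array names and the exact indexing scheme are one possible choice rather than the official implementation:

```cpp
#include <cstdio>
#include <cmath>
#include <vector>

long long N, SQ;                                // SQ = floor(sqrt(N))
std::vector<long long> memo_small, memo_big;    // memo_small[w]   stores T(w) for w <= SQ
std::vector<bool> done_small, done_big;         // memo_big[N / w] stores T(w) for w >  SQ

// Every argument of the recursion has the form floor(N / j),
// because floor(floor(N / a) / b) = floor(N / (a * b)).
long long T(long long w) {
    if (w == 1) return 1;
    bool is_small = (w <= SQ);
    long long idx = is_small ? w : N / w;       // dense, unique index for this state
    if (is_small ? done_small[idx] : done_big[idx])
        return is_small ? memo_small[idx] : memo_big[idx];
    long long res = 0;
    for (long long k = 2; k <= w; k++)          // O(w) work per state (Observation 1 only)
        res += T(w / k);
    if (is_small) { done_small[idx] = true; memo_small[idx] = res; }
    else          { done_big[idx]   = true; memo_big[idx]   = res; }
    return res;
}

int main() {
    scanf("%lld", &N);
    SQ = (long long)sqrtl((long double)N);
    while ((SQ + 1) * (SQ + 1) <= N) SQ++;      // guard against floating-point error
    while (SQ * SQ > N) SQ--;
    memo_small.assign(SQ + 2, 0);  done_small.assign(SQ + 2, false);
    memo_big.assign(SQ + 2, 0);    done_big.assign(SQ + 2, false);
    printf("%lld\n", T(N));
    return 0;
}
```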
Although this form of memoization is probably the most sophisticated, it is hardly necessary to solve the problem fully. For all but perhaps the last subtask, it suffices to use an array of $N$ elements. To optimize the memory consumption, one can set a size limit at some large constant (as high as possible without running out of memory), memoize the states which fit into this array, and simply not memoize the other states (or alternatively, use a slower method of memoization for those few particular states). Finally, for languages with fast hash maps such as C++, using a hash map is an option that does pass the time limit, despite being noticeably slower than the other approaches.
If memoized recursion takes $O(N \log N)$ time, how fast is unmemoized recursion? Intuitively, this time is roughly proportional to the value of $T(N)$ itself. We can show that $T(N) \le cN^{\alpha}$ for a particular value of $\alpha$ by induction. The key inductive step would be to show

$$\sum_{k=2}^{\infty} \left(\frac{N}{k}\right)^{\alpha} \le N^{\alpha}.$$
This inequality is satisfied if $\sum_{k=2}^{\infty} k^{-\alpha} \le 1$, or equivalently, if $\zeta(\alpha) \le 2$ ($\alpha \ge 1.7286\ldots$). This means that $T(N)$ has an upper bound of $O(N^{1.73})$. Empirically, the exponent appears to be slightly lower (at least for small $N$), due to the error introduced by eliminating the floors and making the sum infinite in the argument above. Thus, perhaps surprisingly, naive recursion will pass the first two subtasks. Also, this indicates that the value of $T(N)$ will fit in a 64-bit integer for all values of $N$ given in the problem. Fun fact: if someone solving the problem were to assume that the output always fits in a 64-bit integer, then they could theoretically reverse-engineer that fact to discover that the complexity of naive recursion isn't that bad after all.
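For reference, a minimal sketch of the unmemoized recursion discussed here:

```cpp
#include <cstdio>

// Unmemoized recursion: its running time is roughly proportional to the
// value T(N) itself, which the argument above bounds by O(N^1.73).
long long T(long long w) {
    if (w == 1) return 1;
    long long res = 0;
    for (long long k = 2; k <= w; k++)
        res += T(w / k);
    return res;
}

int main() {
    long long n;
    scanf("%lld", &n);
    printf("%lld\n", T(n));
    return 0;
}
```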
The second main observation for solving the problem fully is that the recursive formula for $T(w)$ can be optimized to make $O(\sqrt{w})$ recursive calls. Furthermore, $T(w)$ can be computed in $O(\sqrt{w})$ time if we put aside the cost of computing $T$ for smaller numbers. This follows from the fact that there are $O(\sqrt{w})$ distinct integers of the form $\lfloor w/k \rfloor$. The only catch is that, to compute $T(w)$ efficiently, one must know the number of values of $k$ for which a particular integer $m$ equals $\lfloor w/k \rfloor$. After some thinking, one can discover that this quantity has the following closed form:

$$\left\lfloor \frac{w}{m} \right\rfloor - \left\lfloor \frac{w}{m+1} \right\rfloor.$$
Thus, we have

$$T(w) = \sum_{m} \left( \left\lfloor \frac{w}{m} \right\rfloor - \left\lfloor \frac{w}{m+1} \right\rfloor \right) T(m),$$

where $m$ ranges over the distinct values of $\lfloor w/k \rfloor$ for $2 \le k \le w$.
If we use this observation on its own (that is, without any of the previous optimizations on the number of states), then the time complexity is

$$\sum_{w=1}^{N} O\!\left(\sqrt{w}\right) = O\!\left(N^{3/2}\right).$$
This solution passes the first two subtasks.
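A sketch of Observation 2 used on its own: a DP over all weights from $1$ to $N$, where each $T(w)$ is evaluated by iterating over blocks of $k$ that share the same value of $\lfloor w/k \rfloor$ (again, one possible implementation, not necessarily the official one):

```cpp
#include <cstdio>
#include <vector>

int main() {
    long long n;
    scanf("%lld", &n);
    std::vector<long long> T(n + 1, 0);
    T[1] = 1;
    for (long long w = 2; w <= n; w++) {
        // Group k = 2..w into blocks sharing the same value m = floor(w / k).
        for (long long k = 2; k <= w; ) {
            long long m = w / k;
            long long k2 = w / m;            // last k in this block
            T[w] += (k2 - k + 1) * T[m];     // block size = floor(w/m) - floor(w/(m+1))
            k = k2 + 1;
        }
    }
    printf("%lld\n", T[n]);
    return 0;
}
```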
Combined with the previous observation about the set of required states, however, the time complexity is

$$\sum_{k=1}^{\sqrt{N}} \left( O\!\left(\sqrt{k}\right) + O\!\left(\sqrt{N/k}\right) \right) = O\!\left(N^{3/4}\right).$$
Thus, we can compute $T(N)$ in $O(N^{3/4})$ time, which is good enough to pass the final subtask.
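Putting the two observations together, here is a sketch of an $O(N^{3/4})$ solution: memoized recursion over the $O(\sqrt{N})$ required states, each evaluated with the grouped sum. The indexing scheme mirrors the one described earlier and is one possible choice:

```cpp
#include <cstdio>
#include <cmath>
#include <vector>

long long N, SQ;
std::vector<long long> memo_small, memo_big;    // memo_small[w]: T(w) for w <= SQ
std::vector<bool> done_small, done_big;         // memo_big[N/w]: T(w) for w >  SQ

long long T(long long w) {
    if (w == 1) return 1;
    bool is_small = (w <= SQ);
    long long idx = is_small ? w : N / w;
    if (is_small ? done_small[idx] : done_big[idx])
        return is_small ? memo_small[idx] : memo_big[idx];
    long long res = 0;
    // Observation 2: sum over blocks of k sharing the same m = floor(w / k).
    for (long long k = 2; k <= w; ) {
        long long m = w / k;
        long long k2 = w / m;                   // last k in this block
        res += (k2 - k + 1) * T(m);
        k = k2 + 1;
    }
    if (is_small) { done_small[idx] = true; memo_small[idx] = res; }
    else          { done_big[idx]   = true; memo_big[idx]   = res; }
    return res;
}

int main() {
    scanf("%lld", &N);
    SQ = (long long)sqrtl((long double)N);
    while ((SQ + 1) * (SQ + 1) <= N) SQ++;
    while (SQ * SQ > N) SQ--;
    memo_small.assign(SQ + 2, 0);  done_small.assign(SQ + 2, false);
    memo_big.assign(SQ + 2, 0);    done_big.assign(SQ + 2, false);
    printf("%lld\n", T(N));
    return 0;
}
```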
In summary, the subtasks admit the following solutions:
- Naive DP ($O(N^2)$)
- Naive recursion ($O(N^{1.73})$), Observation 2 ($O(N^{3/2})$)
- Observation 1 or memoized recursion ($O(N \log N)$)
- Observations 1 and 2 combined ($O(N^{3/4})$)
Note that analysis of these time complexities is an optional part of solving the problem. The simple input format makes it easy for contestants to test the performance of their programs to estimate if they are on the right track.
Comments
A slightly easier-to-understand version:
For a root node with weight $w$, it can have $k$ subtrees for any $2 \le k \le w$; all $k$ subtrees are identical and each has weight $\lfloor w/k \rfloor$.
That is:

$$T(w) = \sum_{k=2}^{w} T\!\left(\left\lfloor \frac{w}{k} \right\rfloor\right), \qquad T(1) = 1.$$
There are repeated computations in the above recursion, e.g. for $w = 12$, both $k = 5$ and $k = 6$ give $\lfloor w/k \rfloor = 2$, so $T(2)$ would be computed more than once.
We only need to compute $T(m)$ for the distinct values $m = \lfloor w/k \rfloor$.
To compute the count of each distinct value $m$ (the number of $k$ with $\lfloor w/k \rfloor = m$):

$$\left\lfloor \frac{w}{m} \right\rfloor - \left\lfloor \frac{w}{m+1} \right\rfloor.$$
We can also record previously calculated numbers in a hashtable, so that we don't calculate them again. For example, when we compute $T(\lfloor N/2 \rfloor)$, the recursion will eventually calculate $T(\lfloor N/4 \rfloor)$. We can store $T(\lfloor N/4 \rfloor)$ during the calculation of $T(\lfloor N/2 \rfloor)$. Then later, when we reach $\lfloor N/4 \rfloor$ in the loop for $T(N)$ (at $k = 4$), we can use the previously computed $T(\lfloor N/4 \rfloor)$ directly.
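A minimal sketch of this hashtable approach in C++, using `std::unordered_map` for the memo and the grouped sum described above (identifier names are illustrative):

```cpp
#include <cstdio>
#include <unordered_map>

std::unordered_map<long long, long long> memo;  // hashtable of previously computed values

long long T(long long w) {
    if (w == 1) return 1;
    auto it = memo.find(w);
    if (it != memo.end()) return it->second;    // already calculated: reuse it
    long long res = 0;
    for (long long k = 2; k <= w; ) {
        long long m = w / k;                    // one distinct value of floor(w / k)
        long long k2 = w / m;                   // it occurs for k2 - k + 1 values of k
        res += (k2 - k + 1) * T(m);
        k = k2 + 1;
    }
    return memo[w] = res;
}

int main() {
    long long n;
    scanf("%lld", &n);
    printf("%lld\n", T(n));
    return 0;
}
```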