Editorial for COCI '08 Contest 3 #5 BST

Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

The problem asks us to calculate the depth of each new node and keep track of the sum of depths.

To start with, note that we cannot just directly simulate the algorithm in the problem statement. If the tree is not balanced (and the authors of the test data will ensure that it is not), its depth will grow to \mathcal O(N) and the overall complexity will be \mathcal O(N^2), way too much with N up to 300\,000.

Because the tree takes only \mathcal O(1) memory per node (\mathcal O(N) in total), we can afford to construct it explicitly (so that we know for each node its left child, right child, parent and depth), as long as the construction is efficient – for example, \mathcal O(\log N) per number or faster. As it turns out, in this problem it suffices to keep just the depths.

Let X be the number we are inserting, L be the largest already inserted number smaller than X, and R be the smallest already inserted number larger than X.

We can show by contradiction that one of L and R is necessarily an ancestor of the other: if neither were, their lowest common ancestor would be an already inserted number strictly between L and R, contradicting the choice of L and R. Also, because of the binary search tree property, after insertion X will become the left child of R or the right child of L, depending on which of the two is deeper in the tree. In other words, the depth of X is one more than the larger of the depths of L and R.

So if we could efficiently determine the numbers L and R for every X, it would be easy to maintain the entire tree and calculate the numbers sought in the problem.

Assuming they made these observations, contestants using C++ had it easy: the standard library containers set and map (which are themselves implemented as balanced binary search trees behind the scenes) support exactly these operations – efficiently finding the elements just below and just above a given value. The overall complexity of such a solution is \mathcal O(N \log N).
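A sketch of this set-based solution is below. The function name bstDepthSums and the convention that the root has depth 0 are our own choices, not taken from any official solution; lower_bound finds R directly, and its predecessor (if any) is L.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical helper: given the insertion order (a permutation of 1..N),
// return the running sum of node depths after each insertion.
// The root is taken to have depth 0.
std::vector<long long> bstDepthSums(const std::vector<int>& order) {
    std::set<int> inserted;
    std::vector<int> depth(order.size() + 2, 0);  // depth[x] for inserted x
    std::vector<long long> sums;
    long long total = 0;
    for (int x : order) {
        int d = 0;  // depth 0 if the tree is empty (X becomes the root)
        auto it = inserted.lower_bound(x);     // x not yet present, so *it is R
        if (it != inserted.end())
            d = std::max(d, depth[*it] + 1);            // X hangs below R
        if (it != inserted.begin())
            d = std::max(d, depth[*std::prev(it)] + 1); // X hangs below L
        depth[x] = d;
        total += d;
        inserted.insert(x);
        sums.push_back(total);
    }
    return sums;
}
```

Inserting 1, 2, 3, 4 in order produces a degenerate chain, so the running sums are 0, 1, 3, 6.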

In Pascal or C, such a data structure is not available, so we need to keep thinking.

Suppose we build a doubly linked list and initialize it to contain the numbers 1 through N in sorted order. If we remove the numbers in the input in reverse order, the list will be able to give us exactly what we want. At every point, the list will contain exactly the numbers that would have been inserted up until then, and it's easy to retrieve L and R, given X; these are the predecessor and successor of X in the list. The overall complexity of this solution is \mathcal O(N).
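The linked-list idea can be sketched as follows, assuming as in the problem that the input is a permutation of 1 through N (the function name is ours; positions 0 and N+1 serve as sentinels meaning "no neighbour", with depth -1 so the first inserted number gets depth 0):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical \mathcal O(N) solution: precompute L[x] and R[x] for every x
// by deleting the numbers from a sorted doubly linked list in reverse order,
// then replay the insertions using depth(X) = max(depth(L), depth(R)) + 1.
std::vector<long long> bstDepthSumsLinear(const std::vector<int>& order) {
    int n = (int)order.size();
    std::vector<int> prv(n + 2), nxt(n + 2);
    for (int i = 0; i <= n + 1; ++i) { prv[i] = i - 1; nxt[i] = i + 1; }
    std::vector<int> L(n + 1), R(n + 1);
    for (int i = n - 1; i >= 0; --i) {
        int x = order[i];
        L[x] = prv[x];              // list predecessor just before deleting x
        R[x] = prv[x] == -1 ? nxt[x] : nxt[x];  // list successor of x
        R[x] = nxt[x];
        nxt[prv[x]] = nxt[x];       // unlink x from the list
        prv[nxt[x]] = prv[x];
    }
    std::vector<int> depth(n + 2, -1);  // sentinels 0 and n+1 keep depth -1
    std::vector<long long> sums;
    long long total = 0;
    for (int x : order) {
        depth[x] = std::max(depth[L[x]], depth[R[x]]) + 1;
        total += depth[x];
        sums.push_back(total);
    }
    return sums;
}
```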

Another intriguing approach is to keep an array of length N that records, for each number already inserted, its predecessor and successor. When inserting a new number X, we do a bidirectional (breadth-first) search around X, stepping outwards one position at a time, until we find a number that has already been inserted. If we first find L, its successor is the number R. If we first find R, its predecessor is the number L. It is not obvious, but because the array fills up quickly, the overall complexity of inserting all numbers is \mathcal O(N \log N), even though an individual insertion can take a number of steps linear in N.
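A possible sketch of this approach is below. The struct and function names are our own, and sentinels 0 and N+1 again stand for "no neighbour"; the predecessor/successor arrays are spliced on every insertion so that they stay consistent as the array fills up.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: finds L and R by searching outwards from X over a
// bitmap of inserted values, then splices X into the predecessor/successor
// lists. Sentinels 0 and n+1 guarantee the search terminates.
struct NeighbourFinder {
    int n;
    std::vector<char> in;       // in[v] == 1 iff v was already inserted
    std::vector<int> pred, succ;
    explicit NeighbourFinder(int n)
        : n(n), in(n + 2, 0), pred(n + 2, 0), succ(n + 2, 0) {
        in[0] = in[n + 1] = 1;  // sentinels
        succ[0] = n + 1;
        pred[n + 1] = 0;
    }
    // Inserts x and returns {L, R} among previously inserted values.
    std::pair<int, int> insert(int x) {
        int l = 0, r = n + 1;
        for (int d = 1; ; ++d) {
            if (x - d >= 0 && in[x - d]) { l = x - d; r = succ[l]; break; }
            if (x + d <= n + 1 && in[x + d]) { r = x + d; l = pred[r]; break; }
        }
        succ[l] = x; pred[x] = l;   // splice x between l and r
        pred[r] = x; succ[x] = r;
        in[x] = 1;
        return {l, r};
    }
};

// Same depth update rule as before, driven by the bidirectional search.
std::vector<long long> bstDepthSumsBidir(const std::vector<int>& order) {
    int n = (int)order.size();
    NeighbourFinder nf(n);
    std::vector<int> depth(n + 2, -1);  // sentinel neighbours have depth -1
    std::vector<long long> sums;
    long long total = 0;
    for (int x : order) {
        auto [l, r] = nf.insert(x);
        depth[x] = std::max(depth[l], depth[r]) + 1;
        total += depth[x];
        sums.push_back(total);
    }
    return sums;
}
```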
