Editorial for WC '15 Contest 2 S2 - Attack of the Clones


Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

This problem proved to be surprisingly difficult: it had fewer full solutions in the live contest than the third senior problem. Given N points on a grid where travel on odd-numbered streets is restricted to east/south and travel on even-numbered streets is restricted to west/south, we would like to calculate, for every x between 1 and N, the minimum distance needed to visit the first x points starting from (1, 1).

A naïve solution would be to mark each coordinate in a 2D array as it is read in, and then, for each x, loop through the rows and columns of the grid to count the steps between consecutive locations. For odd-numbered streets, find the east-most avenue which has a location either on the current street or on the street below it (so we don't miss that location when we head south), and go south from there. For even-numbered streets, do the same except searching for the west-most avenue. Since each pass loops over the maximum dimensions of the grid and we do this N times, the time complexity is \mathcal O(N \times \max(S_i) \times \max(A_i)). In the subtask where N, S_i, and A_i are all at most 100, this takes roughly 100^3 = 1 million steps and passes in time. This solution is given 30\% of points.
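The walk described above can be sketched as follows for one set of points. This is only an illustration of the idea, not official code: the function name is ours, and it condenses each row scan into precomputed east-most/west-most marked avenues per street.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Minimum walk distance for ONE set of points, following the zig-zag rule:
// on odd streets travel east only, on even streets west only, always moving south.
long long walk_distance(const vector<pair<int, int>>& pts) {
    int maxS = 1;
    for (const auto& p : pts) maxS = max(maxS, p.first);
    // east-most / west-most marked avenue on each street (0 / INT_MAX if none)
    vector<int> rightmost(maxS + 2, 0), leftmost(maxS + 2, INT_MAX);
    for (const auto& p : pts) {
        rightmost[p.first] = max(rightmost[p.first], p.second);
        leftmost[p.first] = min(leftmost[p.first], p.second);
    }
    long long dist = 0;
    int curA = 1;  // start at street 1, avenue 1
    for (int s = 1; s <= maxS; s++) {
        int target = curA;
        if (s % 2 == 1) {  // odd street: eastbound only
            target = max(target, rightmost[s]);
            if (s < maxS)  // don't miss the east-most point on the street below
                target = max(target, rightmost[s + 1]);
        } else {           // even street: westbound only
            if (leftmost[s] != INT_MAX) target = min(target, leftmost[s]);
            if (s < maxS && leftmost[s + 1] != INT_MAX)
                target = min(target, leftmost[s + 1]);
        }
        dist += abs(target - curA);
        curA = target;
        if (s < maxS) dist++;  // one step south onto the next street
    }
    return dist;
}
```

Calling this once per x from 1 to N reproduces the \mathcal O(N \times \max(S_i) \times \max(A_i))-style behaviour of the 30\% solution.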

Let's consider another naïve solution, based on an important observation: if we want to minimize the total distance travelled, any given set of x points can only be visited in one possible order, following the zig-zagging streets and avenues downwards and sideways. How do we determine this ordering? North-most points are always visited before south-most points, while for points on the same street, west-most points are visited before east-most points if and only if the street number is odd. So, if we sort the points with a comparison-based sort, we can use the following comparison function to determine whether one point comes before another.

bool is_less(pair<int, int> a, pair<int, int> b) {
  if (a.first != b.first)      // different streets: north-most first
    return a.first < b.first;
  if (a.first % 2 == 1)        // odd street: west-most (smaller avenue) first
    return a.second < b.second;
  return a.second > b.second;  // even street: east-most first
}

In C++, the std::sort() function in <algorithm> allows us to specify a custom-defined comparator. Many other languages also support special comparators for sorting functions. To sort an std::vector of N points (stored as std::pairs), we would simply have to call sort(points.begin(), points.end(), is_less).

After the points are sorted, we'll need to find the Manhattan distance between each pair of adjacent points and add them all up. The Manhattan distance between two points (s_1, a_1) and (s_2, a_2) is |s_1-s_2|+|a_1-a_2|, and this is exactly the minimum distance between two points that are adjacent in the visiting order: the sideways travel between them can always be done on a street whose direction agrees (either the starting street or the ending one), so no detour is ever forced. The overall distance is then minimal because the visiting order is forced, and each leg of it is traversed at its minimum possible cost.
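For a single fixed set of points, this sort-and-sum step might look like the following sketch (the function name is ours; the start (1, 1) comes first in the visiting order, so it can be treated as just another point in the chain):

```cpp
#include <bits/stdc++.h>
using namespace std;

// comparator from above
bool is_less(pair<int, int> a, pair<int, int> b) {
    if (a.first != b.first) return a.first < b.first;
    if (a.first % 2 == 1)   return a.second < b.second;
    return a.second > b.second;
}

// Total distance to visit one fixed set of points starting from (1, 1).
long long total_distance(vector<pair<int, int>> pts) {
    sort(pts.begin(), pts.end(), is_less);
    long long dist = 0;
    pair<int, int> cur = {1, 1};  // starting location
    for (const auto& p : pts) {
        // Manhattan distance between consecutive points in the order
        dist += abs(p.first - cur.first) + abs(p.second - cur.second);
        cur = p;
    }
    return dist;
}
```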

We also propose a trick to eliminate the need for comparators altogether. Simply negate the value of a whenever the value of s is even, and the implicit comparator for pairs will automatically take care of it during sorting (comparing by s if they differ, breaking ties by comparing a). This works in languages like Python, where tuples are compared lexicographically by default. The only time we'll use the a values again is when calculating the Manhattan distance. There, we can use the formula |s_1-s_2|+||a_1|-|a_2||: since avenue numbers are always positive, taking the absolute value of each stored a recovers its original value.
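The same trick can be sketched in C++ as well (the helper names here are ours), so that the default pair comparison produces the zig-zag order directly:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Encode a point so the built-in pair comparison yields the visiting order:
// negate the avenue on even streets.
pair<int, int> encode(int s, int a) {
    return {s, s % 2 == 1 ? a : -a};
}

// Manhattan distance between two encoded points; avenues are positive,
// so abs() recovers the original values.
long long manhattan(pair<int, int> p, pair<int, int> q) {
    return abs(p.first - q.first) + abs(abs(p.second) - abs(q.second));
}
```

With points stored in encoded form, a plain sort(points.begin(), points.end()) suffices.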

Efficient sorting algorithms have a worst-case time complexity of \mathcal O(N \log N), and summing the Manhattan distances requires looping through all N points, taking \mathcal O(N) time. Therefore, finding the total distance for just one set of N points takes \mathcal O(N \log N + N), dominated by the \mathcal O(N \log N) term. The problem is that we have to repeat this for each x in the range 1 \dots N, re-sorting the array each time, so the running time is effectively \mathcal O(N^2 \log N). Since N \le 200\,000, this is far too slow. However, for N \le 2000, there will be roughly 2000^2 \log_2 2000 \approx 44 million steps, which should pass in time. Such a solution is given 75\% of points.

Now, we can consider some optimizations to this solution. The bottleneck right now is the \mathcal O(N \log N) sort. Since only one new point is added between queries, the array is already nearly sorted, so we can reduce each re-sort to just \mathcal O(N) time using algorithms such as insertion sort that perform well on nearly-sorted lists. Alternatively, we can store the points in a linked list, iterating through it to find each new point's position (up to N steps) and splicing it in with \mathcal O(1) time, which is also \mathcal O(N) per insertion. Ultimately though, both of these methods still require \mathcal O(N^2) total running time, and for N \le 200\,000, this is just not feasible.
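For illustration, linear-time insertion into an already-sorted array of encoded points (using the negation trick described above; the function name is ours) might look like:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Insert one encoded point into an already-sorted vector in O(N):
// a linear scan finds the position, and vector::insert shifts the tail.
void insert_sorted(vector<pair<int, int>>& sorted, pair<int, int> p) {
    auto pos = sorted.begin();
    while (pos != sorted.end() && *pos < p) ++pos;  // first element not less than p
    sorted.insert(pos, p);                          // O(N) shift of the tail
}
```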

How do we insert faster than \mathcal O(N) into an ordered set of values? Why, using a balanced binary search tree of course! Many programming languages have this data structure built-in – C++ has std::set, and Java has java.util.TreeSet. We can insert any element into it in worst-case \mathcal O(\log N) time, where N is the number of elements currently in the set, and the code itself won't get much more complex from this change. Now, the bottleneck is recomputing all of the Manhattan distances, which still takes \mathcal O(N) per query and keeps the entire algorithm at \mathcal O(N^2).

Notice that whenever a single point is inserted, most of the adjacent Manhattan distances don't change. So, instead of recomputing everything, we maintain a running sum: when a new point p is inserted between two points a and b in the order, we subtract the Manhattan distance from a to b, and add the distances from a to p and from p to b. Since, for each of the N points, the addition and subtraction take \mathcal O(1) while the insertion takes \mathcal O(\log N), this approach has a total time complexity of \mathcal O(N \log N).

Note that we'll need iterators to the elements just before and just after the one we inserted, which requires some variety of binary search on the tree, running in \mathcal O(\log N). In C++, the iterator returned by set.insert() (or by set.lower_bound()/set.upper_bound()) can be stepped backwards and forwards with std::prev() and std::next(). In Java, TreeSet.lower() and TreeSet.higher() return the elements just before and just after a given one.
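Putting everything together, the full \mathcal O(N \log N) approach might be sketched as follows, combining the std::set, the running sum, and the negation trick (the function name and signature are ours, and we assume points arrive as (street, avenue) pairs):

```cpp
#include <bits/stdc++.h>
using namespace std;

// For each prefix of the input list, the minimum total distance to visit
// that prefix starting from (1, 1).
vector<long long> prefix_distances(const vector<pair<long long, long long>>& input) {
    auto dist = [](const pair<long long, long long>& p,
                   const pair<long long, long long>& q) {
        // avenues may be stored negated, so compare their absolute values
        return llabs(p.first - q.first) + llabs(llabs(p.second) - llabs(q.second));
    };
    set<pair<long long, long long>> pts;
    pts.insert({1, 1});  // the start; it is first in the visiting order
    long long total = 0;
    vector<long long> ans;
    for (auto [s, a] : input) {
        pair<long long, long long> p = {s, s % 2 == 1 ? a : -a};  // negation trick
        auto res = pts.insert(p);
        if (res.second) {             // a genuinely new point
            auto it = res.first;
            auto prv = prev(it);      // (1,1) guarantees a predecessor exists
            auto nxt = next(it);
            total += dist(*prv, *it); // new leg into p
            if (nxt != pts.end())     // p was spliced between two points:
                total += dist(*it, *nxt) - dist(*prv, *nxt);
        }
        ans.push_back(total);
    }
    return ans;
}
```

Each iteration does one \mathcal O(\log N) insertion plus \mathcal O(1) distance updates, giving \mathcal O(N \log N) overall.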

