## Editorial for DMOPC '20 Contest 3 P5 - Tower of Power

Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.

Submitting an official solution before solving the problem yourself is a bannable offence.

Author: nthistle

Since is at most 3, and is prime, we can just directly use Euler's Theorem along with a standard modpow. That is, given a tower , we can first compute , and then compute . The important thing here is that all exponentiations are done with a modulus, since otherwise, even for small values of , the numbers rapidly blow up in size. As is prime, we can easily compute , and the modular exponentiations are each time (where is the exponent, and bounded by ). In Python, this approach might look something like the following:

n, m = map(int, input().split())
a1, a2, a3 = map(int, input().split())
print(pow(a1, pow(a2, a3, m - 1), m))

Of course, an actual solution needs to account for both , , and the case where (Python will happily evaluate , although here it should be – one of the last test cases for subtask 1 covered this, and a few people got tripped up by it).

Time complexity: (but only works for )

The initial naive approach here is to use the same process as from subtask 1, but simply pass the modulus up the tower by repeatedly taking , , , and then evaluating from the top down. This approach requires a more general method for computing , as the modulus is no longer guaranteed to be prime, and certainly can never be prime, but it also doesn't work because of the requirements we need to use Euler's Theorem: we need the base and the modulus to be coprime. In subtask 1, as the modulus is prime, the only time this isn't the case is when the base is a multiple of the modulus, and the result is trivially 0. However, now we have to deal with the more general case where and share some, not necessarily all, prime factors.

Computing can be done simply by factorizing , which can be done easily in time, and computing , and repeatedly applying makes the modulus shrink relatively quickly (it isn't hard to see that after iterations, the result will be 1), so this isn't an issue (Also note that as we apply , the cost of factorizing goes down substantially, so this bound is not tight).

The intended solution to the non-coprime case was to use Chinese Remainder Theorem, splitting up into prime powers. More formally, let , where are primes that also divide , and is whatever is left over of after dividing out these prime powers. Then, we compute , , , etc., and then "reassemble" the results using CRT:

Computing the first of these is easy, as is by definition now coprime to , so we just continue with the regular approach. For the others, we claim they are usually , and when they aren't, they can be directly evaluated. Consider just one prime that divides both and , and let , and be the highest power of that divides . Also, allow to represent the value of the "rest" of the tower (not modulo anything), which we can't necessarily compute directly. Then, we have

Clearly, if , this whole term is just 0, as the result will be a multiple of . If it isn't, we must compute directly, but then this result is manageable, as , which is at most 30. Thus, we can just check the next 4 terms of the tower (), and see whether the result will be at least or not, which gives us time for computing the results from the prime powers.

The actual process of CRT itself is quite fast, basically only relying on the extended Euclidean algorithm, which is time (depending on how many distinct primes we have in common between the base and the modulus, we might have to do this a few times, but the constant here is small: at most 9 for bounds of ). Naively, this gives us a total runtime of , but we can use the fact that we observed earlier, which is that applying decreases the modulus very rapidly, and we can limit the we need to consider to .

Time complexity:

However, as it turns out, there's a much cleaner solution to write, which doesn't require CRT (and is slightly faster). In the cleaner version, you just ignore the non-coprime issue, but instead when you propagate the modulus up the tower with , you don't reduce it all the way modulo . If the exponent is , when , we leave it alone, but if , we reduce it such that we have , . This abuses the following result:

This result can be proved with the Chinese Remainder Theorem: consider splitting out into prime powers. By CRT, if the statement is true modulo all the prime powers of , it's true modulo . Modulo any prime power coprime to , Euler's Theorem tells us it's true. Modulo any prime power not coprime to , they must be congruent to 0, since is at least the largest exponent of any prime power that divides . Writing this more generally,

As long as we don't remove that last from the exponent, we're fine. One way to look at this is we get to move the Chinese Remainder Theorem from the computations into a proof of an intermediary result, at the cost of our exponents being larger (which costs very little as modular exponentiation is cheap).

Time complexity: