Editorial for DMOPC '20 Contest 3 P5 - Tower of Power

Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

Author: nthistle

Subtask 1

Since $N$ ~N~ is at most 3, and $M$ ~M~ is prime, we can just directly use Euler's Theorem along with a standard modpow. That is, given a tower $a_1^{\left(a_2^{a_3}\right)} \pmod{M}$ ~a_1^{\left(a_2^{a_3}\right)} \pmod{M}~, we can first compute $b = a_2^{a_3} \pmod{\phi(M)}$ ~b = a_2^{a_3} \pmod{\phi(M)}~, and then compute $a_1^{\left(a_2^{a_3}\right)} \equiv a_1^b \pmod{M}$ . The important thing here is that all exponentiations are done with a modulus, since otherwise, even for small values of $a_i$ ~a_i~, the numbers rapidly blow up in size. As $M$ ~M~ is prime, we can easily compute $\phi(M) = M-1$ ~\phi(M) = M-1~, and the modular exponentiations are each $\mathcal{O}(\log A)$ ~\mathcal{O}(\log A)~ time (where $A$ ~A~ is the exponent, and bounded by $10^9$ ~10^9~). In Python, this approach might look something like the following:

n, m = map(int, input().split())
a1, a2, a3 = map(int, input().split())
print(pow(a1, pow(a2, a3, m - 1), m))

Of course, an actual solution needs to account for both $n = 1$ ~n = 1~, $n = 2$ ~n = 2~, and the case where $a_1 \equiv 0 \pmod{M}$ ~a_1 \equiv 0 \pmod{M}~ (Python will happily evaluate $0^0 = 1$ ~0^0 = 1~, although here it should be $0$ ~0~ – one of the last test cases for subtask 1 covered this, and a few people got tripped up by it).

Time complexity: $\mathcal{O}(\log A)$ ~\mathcal{O}(\log A)~ (but only works for $N \le 3$ ~N \le 3~)

Subtask 2

The initial naive approach here is to use the same process as from subtask 1, but simply pass the modulus up the tower by repeatedly taking $\phi(M), \phi(\phi(M)), \dots$ ~\phi(M), \phi(\phi(M)), \dots~, and then evaluating from the top down. This approach requires a more general method for computing $\phi(\cdot)$ ~\phi(\cdot)~, as the modulus is no longer guaranteed to be prime, and certainly $\phi(M)$ ~\phi(M)~ can never be prime, but it also doesn't work because of the requirements we need to use Euler's Theorem: we need the base and the modulus to be coprime. In subtask 1, as the modulus is prime, the only time this isn't the case is when the base is a multiple of the modulus, and the result is trivially 0. However, now we have to deal with the more general case where $a_i$ ~a_i~ and $\phi^{i-1}(M)$ ~\phi^{i-1}(M)~ share some, not necessarily all, prime factors.

Computing $\phi(M)$ ~\phi(M)~ can be done simply by factorizing $M = p_1^{k_1} p_2^{k_2} \dots$ ~M = p_1^{k_1} p_2^{k_2} \dots~, which can be done easily in $\mathcal{O}(\sqrt M)$ ~\mathcal{O}(\sqrt M)~ time, and computing $\phi(M) = (p_1^{k_1} - p_1^{k_1 - 1}) (p_2^{k_2} - p_2^{k_2 - 1}) \dots$ , and repeatedly applying $\phi(\cdot)$ ~\phi(\cdot)~ makes the modulus shrink relatively quickly (it isn't hard to see that after $\log_2 2M$ ~\log_2 2M~ iterations, the result will be 1), so this isn't an issue (Also note that as we apply $\phi(\cdot)$ ~\phi(\cdot)~, the cost of factorizing goes down substantially, so this bound is not tight).

The intended solution to the non-coprime case was to use Chinese Remainder Theorem, splitting up $a_i^{(\dots)} \pmod{M}$ ~a_i^{(\dots)} \pmod{M}~ into prime powers. More formally, let $M = r \cdot p_1^{k_1} \cdot p_2^{k_2}$ ~M = r \cdot p_1^{k_1} \cdot p_2^{k_2}~, where $p_1, p_2, \dots$ ~p_1, p_2, \dots~ are primes that also divide $a_i$ ~a_i~, and $r$ ~r~ is whatever is left over of $M$ ~M~ after dividing out these prime powers. Then, we compute $a_i^{(\dots)} \pmod{r}$ ~a_i^{(\dots)} \pmod{r}~, $a_i^{(\dots)} \pmod{p_1^{k_1}}$ ~a_i^{(\dots)} \pmod{p_1^{k_1}}~, $a_i^{(\dots)} \pmod{p_2^{k_2}}$ ~a_i^{(\dots)} \pmod{p_2^{k_2}}~, etc., and then "reassemble" the results using CRT:

$\displaystyle \left. \begin{array}{r} a_i^{(\dots)} \pmod{r} \\ a_i^{(\dots)} \pmod{p_1^{k_1}} \\ a_i^{(\dots)} \pmod{p_2^{k_2}} \\ \dots \end{array} \right\} \overset{CRT}{\implies} a_i^{(\dots)} \pmod{M}$

Computing the first of these is easy, as $r$ ~r~ is by definition now coprime to $a_i$ ~a_i~, so we just continue with the regular approach. For the others, we claim they are usually $0$ ~0~, and when they aren't, they can be directly evaluated. Consider just one prime $p$ ~p~ that divides both $M$ ~M~ and $a_i$ ~a_i~, and let $a_i = t \cdot p^b$ ~a_i = t \cdot p^b~, and $p^c$ ~p^c~ be the highest power of $p$ ~p~ that divides $M$ ~M~. Also, allow $k$ ~k~ to represent the value of the "rest" of the tower (not modulo anything), which we can't necessarily compute directly. Then, we have:

$\displaystyle a_i^k \equiv (t \cdot p^b)^k \pmod{p^c}$

Clearly, if $bk \ge c$ ~bk \ge c~, this whole term is just 0, as the result will be a multiple of $p^c$ ~p^c~. If it isn't, we must compute directly, but then this result is manageable, as $k < \frac{c}{b}$ ~k < \frac{c}{b}~, which is at most 30. Thus, we can just check the next 4 terms of the tower ( $2^{2^{2^2}} > 30$ ~2^{2^{2^2}} > 30~), and see whether the result will be at least $\frac{c}{b}$ ~\frac{c}{b}~ or not, which gives us $\mathcal{O}(1)$ ~\mathcal{O}(1)~ time for computing the results from the prime powers.

The actual process of CRT itself is quite fast, basically only relying on the extended Euclidean algorithm, which is $\mathcal{O}(\log M)$ ~\mathcal{O}(\log M)~ time (depending on how many distinct primes we have in common between the base and the modulus, we might have to do this a few times, but the constant here is small: at most 9 for bounds of $10^9$ ~10^9~). Naively, this gives us a total runtime of $\mathcal{O}(N \sqrt M (\log M)(\log A))$ , but we can use the fact that we observed earlier, which is that applying $\phi(\cdot)$ ~\phi(\cdot)~ decreases the modulus very rapidly, and we can limit the $N$ ~N~ we need to consider to $\log 2M$ ~\log 2M~.

Time complexity: $\mathcal{O}(\sqrt M (\log M)^2 (\log A))$

However, as it turns out, there's a much cleaner solution to write, which doesn't require CRT (and is slightly faster). In the cleaner version, you just ignore the non-coprime issue, but instead when you propagate the modulus up the tower with $\phi(\cdot)$ ~\phi(\cdot)~, you don't reduce it all the way modulo $\phi(M)$ ~\phi(M)~. If the exponent is $b$ ~b~, when $1 < b < \phi(M)$ ~1 < b < \phi(M)~, we leave it alone, but if $b \ge \phi(M)$ ~b \ge \phi(M)~, we reduce it such that we have $b \equiv b' \pmod{\phi(M)}$ ~b \equiv b' \pmod{\phi(M)}~, $\phi(M) \le b < 2\phi(M)$ ~\phi(M) \le b < 2\phi(M)~. This abuses the following result:

$\displaystyle a^{2\phi(M)} \equiv a^{\phi(M)} \pmod{M}, \textit{ regardless }\text{ of whether }a, M\text{ coprime.}$

This result can be proved with the Chinese Remainder Theorem: consider splitting $M$ ~M~ out into prime powers. By CRT, if the statement is true modulo all the prime powers of $M$ ~M~, it's true modulo $M$ ~M~. Modulo any prime power coprime to $a$ ~a~, Euler's Theorem tells us it's true. Modulo any prime power not coprime to $a$ ~a~, they must be congruent to 0, since $\phi(M)$ ~\phi(M)~ is at least the largest exponent of any prime power that divides $M$ ~M~. Writing this more generally,

$\displaystyle a^{k\phi(M)+c} \equiv a^{\phi(M)+c} \pmod{M}$

As long as we don't remove that last $\phi(M)$ ~\phi(M)~ from the exponent, we're fine. One way to look at this is we get to move the Chinese Remainder Theorem from the computations into a proof of an intermediary result, at the cost of our exponents being $\phi(M)$ ~\phi(M)~ larger (which costs very little as modular exponentiation is cheap).

Time complexity: $\mathcal{O}(\sqrt M (\log M) (\log A))$

Comments

There are no comments at the moment.