One of the neat things that makes Ed25519
such a fast signature scheme is a clever algorithm for point
decompression.

You’re given the point’s $y$-coordinate, and have to calculate $x =
\pm\sqrt{(y^2 - 1)/(d y^2 + 1)}$. The naïve approach is to observe that
inversion takes a modexp, and square root takes a modexp, so we need two
modexps. The Ed25519 authors noticed that you can do both jobs with one
modexp. There are two problems which need cleverness: one is how to
combine the division and square root calculations; and the other is how
to actually calculate a square root in $\mathbb{F}_{2^{255}-19}$ in the
first place.

The latter problem is that $p = 2^{255} - 19 \equiv 5 \pmod{8}$, so you
can’t calculate square roots with a single exponentiation, the way you
can when $p \equiv 3 \pmod{4}$. Their clever solution, which I
can’t improve on, is as follows. Suppose we’re given some nonzero $v$,
which we know (or are willing to assume) is a square. Therefore,
$v^{(p-1)/2} = 1$. The trick is to notice that, if we let
$w = v^{(p+3)/8}$, then $w^4 = v^{(p+3)/2} = v^{(p-1)/2} v^2 = v^2$.
Therefore
$w^2 = \pm v$. Conveniently $-1$ is a square in this field, so we
can use $w$ as our square root of $v$ if $w^2 = v$, or pick $i w$ if
$w^2 = -v$ (where $i = \sqrt{-1}$).
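Here's a minimal Python sketch of that square-root recipe. The names `P`, `I` and `sqrt_mod_p` are made up for illustration, and this is plain big-integer arithmetic, not constant-time code. It takes $i = 2^{(p-1)/4}$, which works because $2$ is a non-residue when $p \equiv 5 \pmod{8}$.

```python
P = 2**255 - 19

# A square root of -1: 2 is not a square mod P, so 2^((P-1)/2) = -1,
# and hence 2^((P-1)/4) squares to -1.
I = pow(2, (P - 1) // 4, P)

def sqrt_mod_p(v):
    """Return some square root of v modulo P, or None if v is not a square."""
    v %= P
    w = pow(v, (P + 3) // 8, P)      # w^4 = v^2 whenever v is a square
    if (w * w - v) % P == 0:         # w^2 = v: w is already a root
        return w
    if (w * w + v) % P == 0:         # w^2 = -v: fix up by multiplying by i
        return (I * w) % P
    return None                      # neither case: v was not a square
```

The final `None` case falls out for free: if $v$ is not a square then $w^4 = -v^2$, so $w^2$ can be neither $v$ nor $-v$.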

The paper suggests that, to calculate a square root of $x/y$, you
compute $w = x y^3 (x y^7)^{(p-5)/8}$. This I can make better.

Instead, I suggest calculating $v = x y$, and $w = x v^{3(p-5)/8+1}$.
Then $w^4 = x^4 v^{3(p-5)/2+4} = x^4 v^{(3p-7)/2} = x^4 v^{3(p-1)/2-2} =
x^4 v^{-2} (v^{(p-1)/2})^3 = x^4/v^2 = x^4/(x^2 y^2) = x^2/y^2$. Phew!
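As a sanity check, here's a small Python sketch of the combined calculation, using one `pow` call and no separate inversion. The function name `sqrt_ratio` is hypothetical, and the final comparison is done as $w^2 y = \pm x$ so that no division is needed. It isn't constant-time, and it assumes $y \ne 0$.

```python
P = 2**255 - 19
I = pow(2, (P - 1) // 4, P)          # a square root of -1 modulo P

def sqrt_ratio(x, y):
    """Return w with w^2 = x/y modulo P, or None if x/y is not a square."""
    x %= P
    y %= P
    v = (x * y) % P
    # The single modexp: w = x v^(3(p-5)/8 + 1), so that w^4 = x^2/y^2.
    w = (x * pow(v, 3 * (P - 5) // 8 + 1, P)) % P
    check = (w * w * y) % P          # equals +x or -x if x/y is a square
    if check == x:
        return w
    if check == (-x) % P:
        return (I * w) % P
    return None
```

For decompression, you'd call this with $x = y_0^2 - 1$ and $y = d y_0^2 + 1$, where $y_0$ is the given $y$-coordinate, and then pick the sign of the result.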

Of course, I didn’t figure it out in that direction. Indeed, I started
out with a different problem, in a different field. Specifically, I
wanted to decompress a point on the Ed448-Goldilocks curve, over the
field $\mathbb{F}_{\phi^2-\phi-1}$, where $\phi = 2^{224}$. Helpfully,
$q = \phi^2 - \phi - 1 \equiv 3 \pmod{4}$ because everything about this
prime is *awesome*, so you can compute square roots in $\mathbb{F}_q$ as
$\sqrt{v} = w = v^{(q+1)/4}$, since $w^2 = v^{(q+1)/2} = v^{(q-1)/2} v =
v$. Which leaves the problem of combining this with division.
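The Goldilocks square root is short enough to sketch directly in Python. Again, `Q` and `sqrt_mod_q` are names invented for this illustration, and the code only covers the plain square root, not the combination with division.

```python
Q = 2**448 - 2**224 - 1              # phi^2 - phi - 1 with phi = 2^224

def sqrt_mod_q(v):
    """Return a square root of v modulo Q, or None if v is not a square."""
    v %= Q
    w = pow(v, (Q + 1) // 4, Q)      # w^2 = v^((Q-1)/2) v = v for square v
    return w if (w * w - v) % Q == 0 else None
```

There's no fix-up by $\sqrt{-1}$ here: since $q \equiv 3 \pmod{4}$, $-1$ isn't a square, and the candidate is either right or $v$ has no square root at all.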

I found myself reasoning by analogy with Montgomery’s inversion trick.
(If you want to compute both $x^{-1}$ and $y^{-1}$, then let $z = x y$,
and compute $x^{-1} = y z^{-1}$ and similarly for $y$.) If I can find a
square root $w'$ of $1/v = 1/(x y)$, then $w = x w'$ will be a square root
of $x/y$. This replaces the problem of *division* with the problem of
*inversion*. At this point, it’s sort of easy. I’ll go back to the
Ed25519 field, and leave the Goldilocks version as an exercise. (Or you
can cheat and peek at the Catacomb source code. RFC 8032 is no help: it
gives an unnecessarily complicated version, similar to the Ed25519
formula above.)
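For reference, Montgomery's inversion trick from the parenthetical above looks like this in Python. It's a sketch only: `invert_pair` is a made-up name, the inversion is done by Fermat's little theorem, and both inputs are assumed nonzero.

```python
P = 2**255 - 19

def invert_pair(x, y):
    """Return (1/x, 1/y) modulo P using one modular inversion."""
    z_inv = pow(x * y, P - 2, P)     # the only inversion: 1/(x y)
    x_inv = (y * z_inv) % P          # y / (x y) = 1/x
    y_inv = (x * z_inv) % P          # x / (x y) = 1/y
    return x_inv, y_inv
```

The square-root version has the same shape: one expensive operation on the product $x y$, then a cheap multiplication by $x$ to recover the thing you actually wanted.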

I’ll assume $v$ is a nonzero square, so that $v^{(p-1)/2} = 1$; then to
invert I need to raise to the power $(p - 3)/2$ to find $1/v$. Then I
raise to the power $(p + 3)/8$ to find the maybe-square-root of $1/v$
which might be off by a factor of $i$. But if I do these one after the
other, then I can just multiply the exponents modulo $(p - 1)/2$. That’s
going to be a nuisance
because of the divisions. But wait! These are all whole numbers. How
do I know this? Because $p \equiv 5 \pmod{8}$. So let’s actually write
$p = 8 t + 5$. Then the group of quadratic residues has order $4 t + 2$.
To invert, I raise to the power $4 t + 1$; and to find the
maybe-square-root, I raise to the power $t + 1$. Altogether, then, I
want $m = (4 t + 1) (t + 1) = 4 t^2 + 5 t + 1 = t (4 t + 2) + 3 t + 1
\equiv 3 t + 1 = 3 (p - 5)/8 + 1 \pmod{4 t + 2}$. And I’m done.
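The exponent arithmetic is easy to check numerically. This is a throwaway Python sketch with variable names chosen for illustration; the test value `12345` is arbitrary.

```python
P = 2**255 - 19
t = (P - 5) // 8                     # p = 8 t + 5
order = 4 * t + 2                    # (p - 1)/2, order of the group of squares
invert = 4 * t + 1                   # exponent which inverts a square
root = t + 1                         # (p + 3)/8, the maybe-square-root exponent
m = (invert * root) % order          # combined exponent, reduced

assert 8 * t + 5 == P
assert m == 3 * t + 1 == 3 * (P - 5) // 8 + 1

# For a square v, c = v^m should satisfy c^4 = 1/v^2, i.e. c^2 = ±1/v.
v = pow(12345, 2, P)
c = pow(v, m, P)
assert pow(c, 4, P) * v * v % P == 1
```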

[**Edited 2017-09-20:** Fix a typo in the description of Montgomery’s
inversion trick.]