One of the neat things that makes Ed25519 such a fast signature scheme is a clever algorithm for point decompression.
You’re given the point’s $y$-coordinate, and have to calculate $x = \pm\sqrt{(y^2 - 1)/(d y^2 + 1)}$. The naïve approach is to observe that inversion takes a modexp, and square root takes a modexp, so we need two modexps. The Ed25519 authors noticed that you can do both jobs with one modexp. There are two problems which need cleverness: one is how to combine the division and square root calculations; and the other is how to actually calculate a square root in $\mathbb{F}_{2^{255}-19}$ in the first place.
The latter problem is that $p = 2^{255} - 19 \equiv 5 \pmod{8}$, so you can’t calculate square roots directly. Their clever solution, which I can’t improve on, is as follows. Suppose we’re given some $v$, which we know (or are willing to assume) is a square. Therefore, $v$ has order $(p - 1)/2$. The trick is to notice that, if we let $w = v^{(p+3)/8}$, then $w^4 = v^{(p+3)/2} = v^{(p-1)/2} v^2 = v^2$. Therefore $w^2 = \pm v$. Conveniently $-1$ is a square in this field, so we can use $w$ as our square root of $v$ if $w^2 = v$, or pick $i w$ if $w^2 = -v$ (where $i = \sqrt{-1}$).
The paper suggests that, given the problem of calculating a square root of $x/y$, to compute $w = x y^3 (x y^7)^{(p-5)/8}$. This I can make better.
Instead, I suggest calculating $v = x y$, and $w = x v^{3(p-5)/8+1}$. Then $w^4 = x^4 v^{3(p-5)/2+4} = x^4 v^{(3p-7)/2} = x^4 v^{3(p-1)/2-2} = x^4 v^{-2} (v^{(p-1)/2})^3 = x^4/v^2 = x^4/(x^2 y^2) = x^2/y^2$. Phew!
Of course, I didn’t figure it out in that direction. Indeed, I started out with a different problem, in a different field. Specifically, I wanted to decompress a point on the Ed448-Goldilocks curve, over the field $\mathbb{F}_{\phi^2-\phi-1}$, where $\phi = 2^{224}$. Helpfully, $q = \phi^2 - \phi - 1 \equiv 3 \pmod{4}$ because everything about this prime is awesome, so you can compute square roots in $\mathbb{F}_q$ as $\sqrt{v} = w = v^{(q+1)/4}$, since $w^2 = v^{(q+1)/2} = v^{(q-1)/2} v = v$. Which leaves the problem of combining this with division.
I found myself reasoning by analogy with Montgomery’s inversion trick. (If you want to compute both $x^{-1}$ and $y^{-1}$, then let $z = x y$, and compute $x^{-1} = y z^{-1}$ and similarly for $y$.) If I can find a square root $w’$ of $v = 1/x y$, then $w = x w’$ will be a square root of $x/y$. This replaces the problem of division with the problem of inversion. At this point, it’s sort of easy. I’ll go back to the Ed25519 field, and leave the Goldilocks version as an exercise. (Or you can cheat and peek at the Catacomb source code. RFC8032 is no help: it gives an unnecessarily complicated version, similar to the Ed25519 formula above.)
I’ll assume $v$ has order $(p - 1)/2$, so to invert I need to raise to the power $(p - 3)/2$ to find $1/v$. Then I raise to the power $(p + 3)/8$ to find the maybe-square-root of $1/v$ which might be off by a factor of $i$. But if I do these one after the other, then I can just multiply them modulo $(p - 1)/2$. That’s going to be a nuisance because of the divisions. But wait! These are all whole numbers. How do I know this? Because $p \equiv 5 \pmod{8}$. So let’s actually write $p = 8 t + 5$. Then the group of quadratic residues has order $4 t + 2$. To invert, I raise to the power $4 t + 1$; and to find the maybe-square-root, I raise to the power $t + 1$. Altogether, then, I want $m = (4 t + 1) (t + 1) = 4 t^2 + 5 t + 1 = t (4 t + 2) + 3 t + 1 \equiv 3 t + 1 = 3 (p - 5)/8 + 1 \pmod{4 t + 2}$. And I’m done.
[Edited 2017-09-20: Fix a typo in the description of Montgomery’s inversion trick.]