A review of I'm Back 35

tl;dr: I contributed to a crowdfunded project to make a gadget for capturing digital images using an old-fashioned 35 mm film camera. The thing finally arrived and I tried it. I’m disappointed.

The long version is, well, long, so it’s below the cut.

On copying DVDs

I have an extensive collection of DVDs. They take up three bookcases, which is a bit annoying, because I have books which I’d like to put on them instead. So I’ve been copying the DVDs onto computer storage. This has taught me surprising things.

Finally!

I tend to follow a standard idiom when writing C functions which allocate resources. At the top, I define a variable for each resource, and initialize it to a sentinel value meaning that it’s not yet active, and another for the function’s return value. At the bottom, I define a label end, where I release all of the resources which still seem live, and return the right result. If something goes wrong partway through, I set the return value to indicate the problem, and goto end. Easy.
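
For concreteness, here’s a minimal sketch of that shape. The function and the details are made up for illustration; only the structure matters.

#include <stdio.h>
#include <stdlib.h>

/* A made-up example of the idiom described above: sentinels at the top,
 * a single `end' label at the bottom which releases whatever still
 * looks live, and `goto end' whenever something goes wrong. */
int frob(const char *file)
{
  int rc = -1;                          /* return value: assume failure */
  FILE *fp = 0;                         /* sentinel: not yet open */
  char *buf = 0;                        /* sentinel: not yet allocated */

  fp = fopen(file, "r");
  if (!fp) goto end;
  buf = malloc(4096);
  if (!buf) goto end;

  /* ... do the real work, jumping to `end' if anything fails ... */

  rc = 0;                               /* success */
end:
  free(buf);
  if (fp) fclose(fp);
  return rc;
}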

But it means that the code to release a thing isn’t near where it’s defined or allocated.

I had a brainwave the other day, and wrote a macro.

#define FINALLY(body) \
  __extension__ __inline__ void TMP(finally_fn) \
    (const int __attribute__((unused)) *hunoz) { body } \
  int __attribute__((unused, cleanup(TMP(finally_fn)))) \
    TMP(finally_var)

The TMP macro decorates an identifier so that it’s (more or less) private to the macro invocation. Now I can say something like

void *p = 0; FINALLY({ free(p); });

and I can use return as usual for an early exit. Yay. (Apparently I’m not the only person who’s thought of this.)
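
(As an aside, the TMP machinery isn’t shown above. Here is one plausible definition, an assumption on my part rather than the author’s actual version: it pastes the source line number onto the name, so each invocation gets its own private identifiers.)

/* Hypothetical definition of TMP, not from the original post: paste
 * __LINE__ onto the identifier, making the names unique per source line
 * (and hence per FINALLY invocation, at one per line). */
#define TMP_PASTE_(x, y) x##y
#define TMP_PASTE(x, y) TMP_PASTE_(x, y)
#define TMP(name) TMP_PASTE(name, __LINE__)

With something like that in place, the FINALLY macro above works as advertised.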

This macro has two problems.

  • Firstly, it uses nested functions, so it only works with GCC. Clang doesn’t support these, presumably because there’s no good way to implement a pointer to a nested function.

  • Secondly, the syntax is ugly, involving parentheses and a final semicolon.

I can’t figure out a way to solve both problems at the same time.

I’ve just read yet another article showing how to calculate elements of the Fibonacci sequence in some functional language. I’ll give examples in Scala, though I’m not doing anything very clever linguistically, so they shouldn’t be hard for anyone. Of course, it showed the hopelessly inefficient doubly recursive definition

def terrible_fib(n: BigInt): BigInt =
  if (n < 0) sys.error("not defined for -ve n")
  else if (n == 0) 0
  else if (n == 1) 1
  else terrible_fib(n - 1) + terrible_fib(n - 2)

and a tail-recursive version

def tail_fib0(n: BigInt, a: BigInt, b: BigInt): BigInt =
  if (n == 0) a
  else tail_fib0(n - 1, a + b, a)
def tail_fib(n: BigInt): BigInt =
  if (n < 0) sys.error("not defined for -ve n")
  else tail_fib0(n, 0, 1)

There was also an explicitly iterative version, which isn’t interestingly different from the tail-recursive version, so I shan’t bother showing it. Unfortunately, the tail-recursive version is still rather slow. On my (very old) laptop, it takes nearly a minute to calculate tail_fib(1000000). We can do much better.
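
(An aside of mine; the excerpt stops before saying which trick the post goes on to use.) One well-known way to do much better is ‘fast doubling’, based on the identities \[ F_{2n} = F_n (2 F_{n+1} - F_n), \qquad F_{2n+1} = F_{n+1}^2 + F_n^2, \] which let you compute the pair $(F_n, F_{n+1})$ with $O(\log n)$ big-number multiplications rather than $n$ big-number additions.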

Oh. Oh my.

I have a small music collection: about 8000 files, and growing steadily. I want to keep a copy of my collection on my Android phone. (I’ve been keeping a full copy of my collection on some portable device for fifteen years, and I don’t see why I should stop now.)

My collection is too large to fit into my phone’s `internal storage’, which appears to be whatever change is left out of 32GB by the phone’s firmware. So I bought a 64GB µSD card to keep the music on.

I initially made the copy by physically plugging my µSD card into my laptop’s MMC slot, mounting it, and running rsync(1). rsync is awesome.

Then I bought a new CD. (These are silvery discs which store uncompressed digital audio.) As usual, I copied the audio data off the CD, encoded it as FLAC, tagged it using MusicBrainz Picard, entered it into my master archive, and then invoked my audio conversion gremlin to make Ogg Vorbis for satellite devices. Now all I need to do is copy the new Ogg Vorbis files to the phone.

And that’s where the trouble started.

Once upon a time, the cool kids used $w$-NAF for their elliptic curve scalar multiplication. Well, the cool kids who were implementing elliptic curve cryptography anyway.

The basic idea is simple enough. You want to multiply some given point $Q$ by a scalar $n$. So you pick a window width $w$, and split $n$ into a sum of signed $w$-bit pieces \[ n = \sum_i n_i 2^i \] with $-2^{w-1} \le n_i < 2^{w-1}$ for each $i$, and with at most one nonzero $n_i$ in any group of $w$. Then you walk through these, in descending order of $i$, doubling an accumulator as you go and adding in appropriate small multiples of $Q$ plucked from a table.

Then we found out that actually this was a mostly terrible idea because it broadcasts $n$ through every side-channel available. The new cool idea is to convert $n$ to a signed-digit representation, so \[ n = \sum_i n_i 2^{wi} \] again, with $-2^{w-1} \le n_i < 2^{w-1}$. Now the digits are in predictable places, and all you need to avoid leakage is to do a constant-time table lookup.
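
In other words (my restating of the above): you scan the digits from the most significant end, keeping an accumulator $A$, and for each digit do \[ A \gets 2^w A + n_i Q, \] that is, $w$ doublings followed by adding the table entry for $n_i$, fetched with a constant-time lookup.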

This is annoying. And a waste of time and effort. There’s a unique representation of $n$ subject to these constraints. It’s possible to do the signed-digit recoding from right to left in constant time, propagating loans (like borrows in subtraction, but in the other direction) as you go. But you want to do the scalar multiplication from left to right so that you can use your table of small multiples. And, while you can do the signed-digit recoding left-to-right, you might need to look arbitrarily far ahead before you can commit to a digit. (If you see a long sequence of $2^{w-1} - 1$ digits, then you’ll have to inspect the digit after that to tell you whether they all lend one leftwards or not.) I suppose you could do this in constant (but quadratic) time, but I don’t think anyone would be impressed by that.

But you can have your cake and eat it too, as long as you give up on the unique representation. Relax the condition on $n_i$ to permit $-2^{w-1} \le n_i \le 2^{w-1}$ and all is marvellous.

  • You can calculate an acceptable recoding in one pass, left to right, with one digit lookahead, and therefore in constant time.

  • You can do this in the same pass as the main scalar multiplication, with no additional storage.

  • You can work with the same size table of small multiples of $Q$. You already had to cope with the possibility of $n_i = -2^{w-1}$, and dealing with $n_i = \pm 2^{w-1}$ is no harder.

So, suppose you have a current digit $t$, and a lookahead digit $u$. (You start out with $t = 0$ and $u$ is the leftmost digit of $n$ in its standard radix-$2^w$ representation.) If $u \ge 2^{w-1}$, then set $t \gets t + 1$ and $u \gets u - 2^w$ (a loan leftwards); otherwise $u < 2^{w-1}$ and you do nothing. You can now commit to $t$: add $t Q$ to your accumulator or whatever. And finally, move along by setting $t \gets u$ and fetching the next input digit into $u$. At this point, you have $-2^{w-1} \le t < 2^{w-1}$, but you might end up adding one to it when you look at $u$; hence the bound above.
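
Here’s a small sketch of that recoding loop in C (mine, not from the post). For clarity it uses an ordinary branch and just prints the digits; a real implementation would replace the branch with constant-time selection and feed each digit straight into the table lookup and accumulation.

#include <stdio.h>

#define W 4                     /* window width w */
#define BASE (1 << W)           /* 2^w */
#define HALF (1 << (W - 1))     /* 2^(w-1) */

/* Placeholder: in real code this would do w doublings of the
 * accumulator and then add t*Q, with a constant-time table lookup. */
static void commit(int t) { printf("%d ", t); }

static void recode(const unsigned *d, size_t n)
{
  size_t i;
  int t = 0, u;

  for (i = 0; i < n; i++) {
    u = d[i];                           /* fetch the next (lookahead) digit */
    if (u >= HALF) { t++; u -= BASE; }  /* take a loan leftwards */
    commit(t);                          /* t is final: -2^(w-1) <= t <= 2^(w-1) */
    t = u;                              /* move along */
  }
  commit(t);                            /* last digit: nothing left to lend to it */
}

int main(void)
{
  /* Radix-16 digits of 0xbeef, most significant first.  This prints
   * `1 -4 -1 -1 -1', and indeed
   * (((1*16 - 4)*16 - 1)*16 - 1)*16 - 1 = 48879 = 0xbeef. */
  static const unsigned d[] = { 0xb, 0xe, 0xe, 0xf };

  recode(d, sizeof(d)/sizeof(d[0]));
  putchar('\n');
  return 0;
}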

One of the neat things that makes Ed25519 such a fast signature scheme is a clever algorithm for point decompression.

You’re given the point’s $y$-coordinate, and have to calculate $x = \pm\sqrt{(y^2 - 1)/(d y^2 + 1)}$. The naïve approach is to observe that inversion takes a modexp, and square root takes a modexp, so we need two modexps. The Ed25519 authors noticed that you can do both jobs with one modexp. There are two problems which need cleverness: one is how to combine the division and square root calculations; and the other is how to actually calculate a square root in $\mathbb{F}_{2^{255}-19}$ in the first place.

The latter problem is that $p = 2^{255} - 19 \equiv 5 \pmod{8}$, so you can’t calculate square roots directly using the simple $v^{(p+1)/4}$ rule that works when $p \equiv 3 \pmod{4}$. Their clever solution, which I can’t improve on, is as follows. Suppose we’re given some $v$, which we know (or are willing to assume) is a square; therefore $v^{(p-1)/2} = 1$. The trick is to notice that, if we let $w = v^{(p+3)/8}$, then $w^4 = v^{(p+3)/2} = v^{(p-1)/2} v^2 = v^2$. Therefore $w^2 = \pm v$. Conveniently $-1$ is a square in this field, so we can use $w$ as our square root of $v$ if $w^2 = v$, or pick $i w$ if $w^2 = -v$ (where $i = \sqrt{-1}$).
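
A toy example of mine, in $\mathbb{F}_{13}$ (and $13 \equiv 5 \pmod{8}$): take $v = 4$. Then $w = v^{(13+3)/8} = 4^2 = 3$, and $w^2 = 9 = -4 = -v$, so we multiply by $i = 5$ (since $5^2 = 25 = -1$), getting $5 \cdot 3 = 2$; and indeed $2^2 = 4 = v$.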

The paper suggests, given the problem of calculating a square root of $x/y$, computing $w = x y^3 (x y^7)^{(p-5)/8}$. This I can make better.

Instead, I suggest calculating $v = x y$, and $w = x v^{3(p-5)/8+1}$. Then $w^4 = x^4 v^{3(p-5)/2+4} = x^4 v^{(3p-7)/2} = x^4 v^{3(p-1)/2-2} = x^4 v^{-2} (v^{(p-1)/2})^3 = x^4/v^2 = x^4/(x^2 y^2) = x^2/y^2$. Phew!

Of course, I didn’t figure it out in that direction. Indeed, I started out with a different problem, in a different field. Specifically, I wanted to decompress a point on the Ed448-Goldilocks curve, over the field $\mathbb{F}_{\phi^2-\phi-1}$, where $\phi = 2^{224}$. Helpfully, $q = \phi^2 - \phi - 1 \equiv 3 \pmod{4}$ because everything about this prime is awesome, so you can compute square roots in $\mathbb{F}_q$ as $\sqrt{v} = w = v^{(q+1)/4}$, since $w^2 = v^{(q+1)/2} = v^{(q-1)/2} v = v$. Which leaves the problem of combining this with division.

I found myself reasoning by analogy with Montgomery’s inversion trick. (If you want to compute both $x^{-1}$ and $y^{-1}$, then let $z = x y$, and compute $x^{-1} = y z^{-1}$ and similarly for $y$.) If I can find a square root $w'$ of $v = 1/(x y)$, then $w = x w'$ will be a square root of $x/y$. This replaces the problem of division with the problem of inversion. At this point, it’s sort of easy. I’ll go back to the Ed25519 field, and leave the Goldilocks version as an exercise. (Or you can cheat and peek at the Catacomb source code. RFC8032 is no help: it gives an unnecessarily complicated version, similar to the Ed25519 formula above.)

I’ll assume $v$ has order $(p - 1)/2$, so to invert I need to raise to the power $(p - 3)/2$ to find $1/v$. Then I raise to the power $(p + 3)/8$ to find the maybe-square-root of $1/v$ which might be off by a factor of $i$. But if I do these one after the other, then I can just multiply them modulo $(p - 1)/2$. That’s going to be a nuisance because of the divisions. But wait! These are all whole numbers. How do I know this? Because $p \equiv 5 \pmod{8}$. So let’s actually write $p = 8 t + 5$. Then the group of quadratic residues has order $4 t + 2$. To invert, I raise to the power $4 t + 1$; and to find the maybe-square-root, I raise to the power $t + 1$. Altogether, then, I want $m = (4 t + 1) (t + 1) = 4 t^2 + 5 t + 1 = t (4 t + 2) + 3 t + 1 \equiv 3 t + 1 = 3 (p - 5)/8 + 1 \pmod{4 t + 2}$. And I’m done.
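
As a sanity check (mine again, back in the toy field $\mathbb{F}_{13}$, so $t = 1$ and the exponent is $3 t + 1 = 4$): take $x = 1$ and $y = 4$, so $x/y = 10$. Then $v = x y = 4$ and $w = x v^4 = 9$; now $w^2 = 81 = 3 = -10$, so multiply by $i = 5$ to get $5 \cdot 9 = 45 = 6$, and indeed $6^2 = 36 = 10 = x/y$.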

[Edited 2017-09-20: Fix a typo in the description of Montgomery’s inversion trick.]

IPv6, Linux, and MTU fun

I think the breakage is more or less over, so it’s time to (a) enjoy the results, and (b) write up the war stories. I’m now running a mostly-IPv6 network.

Actually, the network was mostly-IPv6 to begin with. The main missing piece was DNS: I wasn’t advertising AAAA records for my hosts, even though they nearly all actually had IPv6 addresses. Getting things to actually use IPv6, then, mostly involved hacking on my unnecessarily complicated zone file generator (written in Common Lisp, natch).

At this point, everything went wrong. It turns out that if I breathe wrong, then Linux loses the routing table entries for its own local interface addresses. It turns out that I’d breathed wrong quite a lot before pulling the big DNS switch. I still don’t know what causes this; my current theory is that it’s BIRD, but I don’t have any convincing evidence (and I know that the obvious alternative — Quagga — is hopeless in my environment). I just have a cron job which checks that the interface routes haven’t vanished.

Then I found out that SSH connections from my laptop crybaby to a colocated (virtual) server stratocaster were wedging in a not-quite MTU-blackhole way when the server had a lot of stuff to say. Specifically, the first few large segments would get through, and then they’d dry up. (MTU problems on IPv6 are especially nasty, because IPv6 routers don’t fragment packets — only the sender is allowed to do that.)

The MTU bottleneck is the VPN hop between the colocated server (the endpoint is on precision) and the house (on radius). Capturing packets showed that I was right: stratocaster was sending a TCP segment that was too large, and being told this by precision. So stratocaster sent the data again, in two pieces — and then sent the next segment full size. After a while, precision gave up sending packet-too-big ICMP messages (some rate-limiting thing, presumably), and stratocaster continued throwing large TCP segments into the void, having forgotten the path MTU, and the connection jammed up.

I’ve never trusted the standard wireless security stuff since the old WEP disaster, so even when I’m at home, traffic between crybaby and the home servers goes over the VPN. To make all of this work efficiently, mobile devices like crybaby use a simple heuristic to decide which static VPN endpoint is likely to be best to associate with (radius in the house, otherwise precision), and that endpoint advertises a host route for the mobile device through my dynamic routing machinery.

Unfortunately, Linux has a bug, and won’t attach path MTU information to host routes! Simple solution: since IPv6 address space is cheap, allocate a little network to each mobile device. Rather than 2001:ba8:1d9:6000::{1,2,3}, we’ll use 2001:ba8:1d9:6000::{1,2,3}:0/112.

Yes, I realise that allocating …:0 to the mobile device was brave. Too brave: it didn’t work. In particular, now connections wedged in the other direction. I checked: this time nobody was actually sending packet-too-big messages. Eventually, I noticed that radius’s kernel was logging plaintive — if gnomic — messages of the form

2014-04-19T14:05:29+01:00 radius kernel: [363728.436583] icmpv6_send: acast source

Tracking down this message didn’t take too long. Apparently, it means that the sender (to whom we’d be sending an ICMPv6 error) has been identified as an anycast address — and for some reason we shouldn’t actually send it at all. It’s true that the …:0 address in a /112 is defined to be the subnet-router anycast address, and radius could see that it was indeed on a /112 network, so I can’t argue with that; but I think refusing to send ICMP back to an anycast address is rather poor.

Anyway, another renumbering ensued, and now the mobile devices (and my actual IPv6 anycast services) are on …:1/112. And IPv6 works.

SSL and TLS are broken in many exciting ways. I’m just going to focus on one of these: revocation.

Certificate authorities (‘CAs’) tend to issue certificates with fairly long validity periods — a year is common. What do they do if they find that a certificate is bad before the year is up?

None of these are really very clever, but they’re slightly creative abuses of the standard tools, and maybe someone will find them useful.

  1. Browsing lots of small files. Suppose you’re looking at something like a Qmail configuration directory or settings in /proc or /sys, where there are lots of one- or two-line files. You could try to get a handle on them by using cat(1), but I find that the right answer is

    grep ^ files

    For trees of stuff, like in /proc/sys/net/*/conf, say, GNU grep’s -r flag is useful.

  2. Clobbering files as root. I try to run as a non-root user most of the time, and elevate to root as needed, using sudo(8). (The vulnerability history of sudo makes for exciting reading, but I like that it maintains a log. I’m much less impressed with the security theatre of typing passwords, and its tendency to send loud SECURITY mail is very annoying.)

    So, there’s a file you want to hack: it’s owned by root, and the easiest way to hack it is with a shell rune. But if you redirect something’s stdout to the file, that’ll be done in your shell, with your privileges, so it won’t work. I use tee(1) here.

    | sudo tee output >/dev/null

    (This at least leaves a log that I hacked the file, and even if it doesn’t say exactly how, it’s no less informative than sudoedit.)

I’m sure there were going to be more of these. Maybe some of them will come to me later.