OCSP stapling, and other stupid ideas

SSL and TLS are broken in many exciting ways. I’m just going to focus on one of these: revocation.

Certificate authorities (‘CAs’) tend to issue certificates with fairly long validity periods — a year is common. What do they do if they find that a certificate is bad before the year is up?

Certificate Revocation Lists

The initial idea was that the CA would publish a certificate revocation list (a ‘CRL’), listing the bad certificates. Each certificate contains a pointer (usually a URL) to its CRL. A client is expected to fetch this CRL, and see whether the certificate’s serial number is in it. This goes wrong in a variety of ways.

The most obvious thing is that this is an active step which clients must take. To fetch a CRL, a client has to have (at least) an HTTP client, which is a bunch of effort, and OpenSSL (for example) doesn’t do this automatically. (Just for fun, CRL ‘distribution points’ can be more exotic than this, involving things like LDAP.) Unsurprisingly, many simple programs cheat, and expect someone to have already fetched an appropriate CRL and put it somewhere convenient. Chances are, nobody bothered.
Fetching things from servers goes wrong. Servers are sometimes down; networks are occasionally flaky. What do you do if you can’t fetch the CRL? Among your bad options are: (a) shrug, and say it’s probably OK; (b) give the user an annoying dialogue box and wait for them to press the give-me-the-damned-page-already button; or (c) fail secure, and refuse to provide the page, even though it worked yesterday and will work again tomorrow.
The CRL distribution point is in each ‘subject’ certificate. If the adversary somehow manages to sign a freely chosen certificate, they get to specify any CRL they like in it. That leaves only the nuclear option of revoking the ‘issuer’ certificate, and thereby every subject certificate the issuer has ever signed. Funnily enough, CAs are reluctant to do this.
CRLs can be out of date. They can get kind of big, so you don’t want to have to fetch a new one just on the off chance that it’s changed recently. So you check a stale one instead. (This one isn’t very convincing, really. The HTTP If-Modified-Since header works pretty well for this sort of thing.)
Fetching a CRL takes time, which increases the time it takes to set up a TLS connection.
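
To make the If-Modified-Since point concrete, here is a minimal sketch of a conditional CRL fetch using only the Python standard library. The URL and the function names are made up for illustration; a real client would take the address from the certificate’s CRL distribution point, and would still have to decide what to do when the fetch fails outright.

```python
import email.utils
import urllib.error
import urllib.request

# Hypothetical URL; a real client would read it from the certificate's
# CRL distribution point.
CRL_URL = "http://crl.example.com/ca.crl"


def conditional_request(url, cached_mtime):
    """Build a GET request that asks only for a CRL newer than our copy.

    cached_mtime is the Unix timestamp of the cached CRL.
    """
    stamp = email.utils.formatdate(cached_mtime, usegmt=True)
    return urllib.request.Request(url, headers={"If-Modified-Since": stamp})


def refresh_crl(url, cached_crl, cached_mtime):
    """Return the freshest CRL bytes available; keep the cache on a 304."""
    request = conditional_request(url, cached_mtime)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: the cached CRL is still current
            return cached_crl
        raise  # any other failure is left for the caller to handle
```

Anything other than a 200 or a 304 is re-raised, which is exactly where the three bad options described above come back into play.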

Online Certificate Status Protocol

For some reason, it was concluded that the problem with certificate revocation lists was that they might be a bit stale. The solution is the Online Certificate Status Protocol. This would be a simple protocol, except that all of the messages are encoded using the hideous ASN.1 Basic Encoding Rules.

The idea is straightforward. The client sends a request which identifies a particular certificate. The server sends back a signed response saying whether the certificate is still OK.

Amazingly, almost every detail of this protocol is wrong.

The request identifies the certificate in question using hashes of the issuer name and public key, and the certificate serial number. This doesn’t really provide a lot of information to the OCSP responder to go on if the serial number is unknown. It also causes trouble if an issuer signs more than one certificate with the same serial number. (The DigiNotar compromise was detected because the OCSP responder was receiving a lot of queries for unknown certificates. I can only assume that the hacker was only marginally more competent than the CA.)
The client gets the OCSP responder address from the subject’s certificate, just like the CRL address. Again, the fact that the DigiNotar hacker neglected to modify (or, better, omit) the OCSP location in the bogus certificates he issued indicates that he was severely lacking in ability.
The protocol has three possible non-error responses: good, meaning the certificate is still valid; revoked, meaning the certificate is bad and has been revoked; and unknown, meaning the responder doesn’t know anything about the certificate. This last response type apparently means ‘maybe it’s very new and I haven’t been told about it yet’, rather than ‘the existence of this certificate suggests that we’ve been compromised: please mail it to us and we’ll panic appropriately’.
The protocol also has a variety of unsuccessful responses, which indicate that the server was unable to produce a proper answer. These are unauthenticated. Like with CRLs, these leave the client in an awkward situation.
The responder has to sign its response there and then, which means that (a) each response used to cost a fair amount of processing for the public key cryptography, and (b) an important signing key needs to be permanently available to a server which is connected to the public internet. This presents an interesting operational security challenge.
The client contacts an OCSP responder and tells it which certificate it wants to check. This is an obvious privacy leak. The OCSP responder can easily tell when it’s been asked for the status of the certificate for, say, goatporn.com, and send lists of client IP addresses to interested parties.
This is yet another exciting pile of protocol that clients must implement. It doesn’t, alas, happen by magic. Simple clients probably don’t bother, and they just lose.
Like fetching a CRL, the OCSP conversation makes TLS connection setup take longer.
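
As a rough illustration of how little the responder has to go on, here is a toy model of the certificate identifier and the three non-error answers. It is not an OCSP implementation: there is no ASN.1 and no signing, the names `CertID`, `make_cert_id` and `lookup_status` are invented for this sketch, and SHA-1 is simply assumed as the hash (the real request names its hash algorithm).

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CertID:
    """Toy stand-in for the identifier carried in an OCSP request."""
    issuer_name_hash: bytes  # hash of the issuer's encoded name
    issuer_key_hash: bytes   # hash of the issuer's public key
    serial_number: int       # serial number of the certificate in question


def make_cert_id(issuer_name, issuer_key, serial_number):
    # SHA-1 is an assumption made for this sketch.
    return CertID(
        hashlib.sha1(issuer_name).digest(),
        hashlib.sha1(issuer_key).digest(),
        serial_number,
    )


def lookup_status(cert_id, issued_serials, revoked_serials):
    """Answer as a toy responder that only knows sets of serial numbers."""
    if cert_id.serial_number in revoked_serials:
        return "revoked"
    if cert_id.serial_number in issued_serials:
        return "good"
    return "unknown"
```

Nothing about the subject certificate itself reaches the responder, so two certificates from the same issuer with the same serial number produce identical identifiers and cannot be told apart.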

OCSP stapling

Apparently the poor privacy and latency properties were the big problems with OCSP. The proposed solution is OCSP stapling. The idea is that, rather than have the client do it, the server makes an OCSP request and sends the responder’s reply to the client as part of the TLS handshake (i.e., it ‘staples’ the OCSP response to the rest of the handshake messages).

Of course, this is yet more rank idiocy.

It’s yet another new, shiny thing that clients must know how to ask for, and servers must know how to provide. It provides no security benefit to existing software: everyone must upgrade.
The server can claim not to support OCSP stapling. In particular, a dishonest server would be crazy to offer support. To be fair, the RFC considers this, but doesn’t provide any especially useful suggestions: it says

a client that requires OCSP validation of certificates SHOULD either contact the OCSP server directly or abort the handshake.

But typically a client doesn’t know whether it requires OCSP validation (there’s no defined way of saying in a certificate that the server should support stapling), and the fallback position is the same as before, which was awful.

What should we do instead?

I mentioned the big mistake right at the top of this article. It’s the huge validity period for certificates.

I can see why you’d want to buy or sell certification in relatively large time units. But there’s no especial reason that the validity period of an individual certificate has to match the subject’s subscription period. For example, the issuer could sign certificates which are valid for a single day (plus a few hours’ slop to allow for clock skew and propagation delays), every day until the subject’s subscription expires.
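
The arithmetic for a one-day certificate is simple enough to sketch. The three-hour slop and the function name `daily_validity` are assumptions made for this example, not part of any real CA’s interface.

```python
from datetime import date, datetime, time, timedelta, timezone

# Assumed value for "a few hours' slop"; pick whatever suits your clocks.
SLOP = timedelta(hours=3)


def daily_validity(day, subscription_end, slop=SLOP):
    """Return (not_before, not_after) for the certificate covering `day`.

    Returns None once the subscription has lapsed, which is all the
    revocation machinery this scheme needs.
    """
    if day > subscription_end:
        return None
    start = datetime.combine(day, time.min, tzinfo=timezone.utc)
    return start - slop, start + timedelta(days=1) + slop
```

With these numbers each certificate is valid for 30 hours, and consecutive days overlap by six hours, which gives the server some room to pick up the next one.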

The CA can make its freshly issued certificates available from a webserver. This server doesn’t need to do anything complicated: it only needs to be able to serve static content and be reliable. Certificates don’t need to be kept secret, since they’ll be transmitted to clients in the clear anyway; and they’re self-authenticating, so they don’t need special care to prevent malicious modification.

This solves almost all of the problems listed above.

Clients don’t need any modification to benefit. They already know how to check certificate validity periods. OpenSSL does it more or less automatically.
Servers don’t need much fiddling with. Most can be prodded into reloading their certificates; some particularly poor ones might need to be restarted but this takes at most a second or so, and fixing this is probably easy. The remaining thing to do is fetch the certificate for the day, which is a trivial shell script to call curl(1), called from cron(8). (There’s a little work to be done to retry failed attempts to fetch the day’s certificate, but that’s not especially hard.)
There’s no privacy problem, because the client just communicates directly with the server. There’s no latency penalty because clients don’t need to look anywhere else to find certificate validity information.
This scheme fails secure. All a CA needs to do in order to revoke a subject is stop issuing daily certificates for it. There are no temporary failure conditions where the client needs to distinguish ordinary network flakiness from enemy action.
It probably requires less computational capacity at the CA than running an OCSP responder. OCSP requires a signature each time a client makes a request. For idle servers, there will be hardly any requests, probably far fewer than one per day; but for busy ones, there’ll be lots. And requests probably don’t come in evenly across the day, either: the responder has to be able to cope with the peak demand. On the other hand, signing a certificate per day requires one signature per server, come what may, and the CA can arrange to spread this effort evenly across the day in whatever way it likes. There’s no lumpy and unpredictable demand to deal with.
The signing can be done by machines which aren’t directly attached to the public internet, so operational security is easier; instead, they just need to be able to push freshly signed certificates to the webserver, which is a nicely tractable one-way trust relationship. They still probably need to have keys permanently available, which isn’t completely ideal, but there are straightforward ways to mitigate this. CAs aren’t averse to buying quite fancy hardware security modules for storing keys, and this is just the sort of thing fancy HSMs are pretty good at.
This doesn’t need any certificate extensions which might be stored in the subject certificate rather than the issuer certificate. In fact, it doesn’t need any certificate extensions at all.
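
The server-side job mentioned above (fetch the day’s certificate, retrying on failure) might look something like the following. I’ve sketched it in Python rather than shell and curl so that the retry logic is explicit; the function names, the retry count and the delay are all assumptions for illustration.

```python
import os
import tempfile
import time
import urllib.request


def http_get(url):
    """Fetch a URL and return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def fetch_with_retry(fetch, attempts=5, delay=60.0, sleep=time.sleep):
    """Call fetch() until it succeeds, pausing between failed attempts.

    Network failures from urllib surface as OSError subclasses; the last
    failure is re-raised so that cron can report it.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError:
            if attempt == attempts:
                raise
            sleep(delay)


def install_atomically(data, path):
    """Write data to a temporary file, then rename it over path.

    The rename means the server never sees a half-written certificate.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as out:
        out.write(data)
    os.chmod(tmp, 0o644)  # certificates are public, so world-readable is fine
    os.replace(tmp, path)
```

A daily cron job would then call something like `install_atomically(fetch_with_retry(lambda: http_get(url)), path)`, with `url` and `path` set to wherever your CA publishes certificates and wherever your server reads them, and finish by prodding the server to reload.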

Not all is rosy. For example, this proposal doesn’t do much for the case where an adversary is able to sign arbitrary certificates. One mitigation would be to invent a certificate extension which lets an issuer declare, in its own certificate, that it only issues subject certificates with a given maximum lifetime: clients could then reject certificates apparently from that issuer which have a longer validity period. (Obviously we’re back to supporting only new improved clients here.) There are also ways of using fancy HSMs which could ensure that they only issue certificates with particular properties, e.g., a maximum lifetime.
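
The client-side check for such a maximum-lifetime extension would be tiny. This sketch assumes the hypothetical extension has already been parsed into a `timedelta` (or `None` if the issuer makes no declaration); no such extension exists today.

```python
from datetime import datetime, timedelta, timezone


def lifetime_acceptable(not_before, not_after, issuer_max_lifetime):
    """Check a certificate's validity period against the issuer's declared cap.

    issuer_max_lifetime is None when the issuer certificate carries no
    (hypothetical) maximum-lifetime extension, in which case nothing changes.
    """
    if issuer_max_lifetime is None:
        return True
    return not_after - not_before <= issuer_max_lifetime
```

A forged year-long certificate from an issuer that declares a 30-hour cap would fail this check, whatever else it says.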

I think the mechanics for building a proper certificate-a-day CA are pretty straightforward, but I’m not going to spoil the fun for anyone who wants to try thinking about this stuff for themselves. But I do run a toy CA for my own systems which uses this approach; it’s free software, and the code is available from my Git repository.
