The GCHQ challenge, unpicked

| No Comments

No, I didn’t solve it all. I did most of it, though, and one of the remaining bits is, I think, a bug in their server. There seem to be a fair few bugs, or at least rough edges.

The starting point is http://canyoucrackit.co.uk/ — if it’s still up by the time you read this. If not, I made a copy.

There’s a bunch of hex numbers. The first thing you notice is that they’re really inconveniently in a PNG image, rather than something you can actually work on conveniently. If you go poking around in the image, though (thanks, Joe), you find that there’s a mysterious base-64 encoded string in the header. My various tools don’t show it.

OK. So what do you do with this stuff? In my case, I just typed all the numbers in. That was a good idea, since I developed a feel for them. They’re definitely not random. There’s a 00010000, and later a 9dffffff which strongly suggest 32-bit little-endian; there are also ASCII strings AAAA (twice) and BBBB. This latter is interesting, since the other string, once you base-64 decode it, begins with BBBB. I’m not entirely sure why I tried disassembling it, but I did, and it indeed turned out to be 32-bit x86 code, and, specifically, Linux code (you can tell from the int 0x80 syscall).

I disassembled the code, and determined a number of things about it.

  • It implements something which looks an awful lot ilke RC4, using a trivial key.

  • It expects some stuff to be appended to it.

  • The extra stuff should start with BBBB, followed by a length word $n$, followed by $n$ bytes of actual ciphertext.

Once it’s done all of that, it exits immediately.

My first reaction was to decrypt the ciphertext using my own implementation of RC4. That produced hopeless mush. My next effort involved pasting the hex into nasm. For the purposes of this article, though, I’ve done a fully commented disassembly of the code. While I was doing that, I found the bug which meant that it didn’t actually implement RC4: it clobbers the $j$ register while it’s actually XORing the ciphertext and keystream bytes. (There’s another infelicity: if the code can’t find both of its special markers, it exits with status zero, though it ought really use a nonzero status. Some of the other instructions are encoded inefficiently, so there’s enough space to do this properly.)

You can get the raw binary image (with the tail-end bit attached), or an executable ELF image. (If you run random ELF binaries you download from the Internet then you get what you deserve.)

I assembled and linked it into an ELF binary, and ran it under GDB:

(gdb) b exit
Breakpoint 1 at 0x80480ea
(gdb) run
Starting program: /home/mdw/public-html/gchq/cyber

Breakpoint 1, 0x080480ea in exit ()
(gdb) p (char *)($esi - 50)
$1 = 0xbffff5be "GET /15b436de1f9107f3778aad525e5d0b20.js HTTP/1.1"
(gdb) quit

A link to the next bit! Go us!

The next bit turns out to be a Javascript file containing hardly any Javascript. What it does have, however, is an enormous pile of hex, and a big comment which looks like it’s specifying a virtual bytecode machine. It isn’t quite, though.

  • Firstly, it doesn’t actually explain how the code segment works.

  • Secondly, it doesn’t actually state when the op2 byte is omitted.

  • Thirdly, the description of the long jmp instructions show the new code segment being taken from the register named by op2, when in fact op2 is immediate data in this case.

My first move was to ignore the subtle hint to write an implementation of this machine in Javascript. Instead, what I did was to write a disassembler in Lisp. Once that was giving me plausible results, I examined the code. It looked like it was decrypting a different bit of code and then jumping to it. (This was when I realised that the jmp spec was wrong.)

At this point, I wrote an execution function too, so that it would stop when the program counter left particular bounds, and ran it to decrypt the next part. More decryption, and rather odd. It’s perhaps worth listing a disassembly here.

0100 : 32 00 : movr r2, #x00
0102 : 75 0c : add  ds, #x0c
0104 : 31 08 : movr r1, #x08
0106 : 33 32 : movr r3, #x32
0108 : 40 02 : movm r0, [ds:r2]
010a : 80 03 : xor  r0, r3
010c : 52 00 : movm [ds:r2], r0
010e : 72 01 : add  r2, #x01
0110 : 73 03 : add  r3, #x03
0112 : b2 00 : cmp  r2, #x00
0114 : c3    : jmpe r3
0115 : b0 00 : cmp  r0, #x00
0117 : 30 1b : movr r0, #x1b
0119 : c0    : jmpe r0
011a : 01    : jmp  r1
011b : ff    : hlt

The interesting thing here is that there are two conditions tested for: either if r2 is zero (indicating that we’ve wrapped around) or if r0 is zero (indicating that we found a zero plaintext byte). Only the latter seems to happen. The former causes a branch to r3, which must have wrapped around to 0x32 again. Interestingly, there’s plausible code there:

0132 : 75 10 : add  ds, #x10
0134 : 01    : jmp  r1

There’s also a stray nonsense byte at 0x0140 (which says jmp r12, which is obviously a silly register). By the time that the hlt is reached, regions 0x0100 to 0x014f and 0x01c0 to 0x1f3 are decrypted. As well as the curious bits of code, there are also regions of apparent junk between 0x0150 and 0x01bf, and from 0x0200 on to 0x02d6. I still have no idea what these mean, though they’re clearly not uniformly random.

You can see the resulting Lisp code, somewhat refined since my initial cut. As a special treat, and because they hinted so strongly, I also wrote an implementation in Javascript.

Anyway, what you do find, once you’ve done this, is a string at 0x01c0 which says

GET /da75370fe15c4148bd4ceec861fbdaa5.exe HTTP/1.0

Hooray, another link. This one gives you a Windows PE binary which depends on a couple of Cygwin DLLs. This thrilled me less. I ran it on a virtual machine, having taken a snapshot first. It complained with a usage message saying that it wanted a hostname on the command line. It complained about a missing license file. If I gave it a license.txt file like it wanted, then it said it was invalid. Time for some disassembly.

My Windows disassembly tools are not especially good: I found myself annotating objdump output in Emacs, and had to chase down the DLL linkage myself. The results aren’t pretty enough to publish here, but I ended up with a good idea of what was going on. I’ve written a translation back into C, but it’s not quite faithful (as we’ll see).

The program wants to find a license.txt file in the current directory, and be given a hostname on the command line. It reads a line using fscanf, and the carves the result into six 32-bit words. It checks the first three, as follows.

  • The first word is meant to be the string gchq.

  • The second and third words (actually, everything from offset 4 onwards, but nothing looks past byte 11) is passed to crypt(3) and checked against the known hash hqDTK7b8K2rvw.

If it’s happy about the first half of this, then it bundles up the remaining words (in hex), together with the crypt hash, into an HTTP request, and sends it to the hostname; it then prints the reply that it receives. It doesn’t take a genius to work out that you’re meant to use canyoucrackit.co.uk as the hostname.

Once the program is happy with the password, it doesn’t get used for anything else. In particular, it doesn’t get sent to the server. There’s a variable which I’ve named validp which is initialized to zero near the beginning of main. When crypt has done its thing, validp is set to 1. Then the remaining words are shuffled about, and validp is checked. With this is in mind, there are three ways I can think of for dealing with the crypt password check.

  • You can hack the program so that it doesn’t care. The easiest way to do this is to change the initialization of validp so that it starts out at 1.

  • The variable validp is at offset 44 from the start of the buffer into which the license.txt file is read. If the line is longer than this, you’ll overflow the buffer and clobber validp. As long as you clobber it with something nonzero, this suffices to pass the check. (My version of GCC doesn’t allocate variables in the same way, so I need a longer string if I compile the program from my C code.)

  • You can find the correct password using a brute-force search. I ran John the Ripper on it for a bit, and it told me that the password is actually cyberwin.

That leaves the remaining words. Now, if you’re cleverer than I was, you’ll have remembered that there was a bizarre 4-byte string which was jumped around at the beginning of the RC4-decryption code and didn’t seem to do anything useful, and there were two 32-bit numbers which also seemed useless in the Javascript program. If you stash these numbers in little-endian order in the license file, and run the program…

it gives you a generic 404 error telling you it doesn’t know what you’re talking about.

Actually, the program has a bug, or the server is misconfigured. The program doesn’t send any HTTP request headers, only the raw GET line. I suspect the server in question must be hosting several domains. Anyway, if you (say) feed the request line into a more useful HTTP client such as a web browser or curl(1), or you fix the program to provide a Host header, then you get the proper answer at last.

Pr0t3ct!on#cyber_security@12*12.2011+

All that remains is to feed that into the text box on the orignal page and say ‘I did it’. If you’re outside the UK, then it tells you that GCHQ aren’t interested in foreigners. If you’re in the UK, like I am, then you get a link to their usual recruitment page. This is much less interesting than the actual puzzle.

Edit 2011-12-05: Add missing line of disassembly so it actually makes some sense.

Leave a comment

About this Entry

This page contains a single entry by Mark Wooding published on December 3, 2011 12:58 AM.

Tracking down a troublesome bug was the previous entry in this blog.

More fun with sysadmin debugging is the next entry in this blog.

Find recent content on the main index or look in the archives to find all content.

Pages

OpenID accepted here Learn more about OpenID
Powered by Movable Type 5.2.13