The standard technique for writing files safely on Unix is presumably well known: you write to a temporary file alongside your target and then, when you’re finished, rename the temporary file into place. Writing code to do this is pretty straightforward, but it’s also rather dull and exactly the sort of thing which ought to be automated.
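The idiom can be sketched in a few lines of Python (the function name `atomic_write` is mine, not from any library):

```python
import os
import tempfile

def atomic_write(path, data):
    """Write data to path safely: write a temporary file in the same
    directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp hands each caller a unique name, so concurrent writers
    # never clobber one another's temporaries.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        os.replace(tmp, path)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp)  # something went wrong: discard the temporary
        raise

atomic_write("greeting.txt", "hello\n")
```

The temporary must live in the same directory as the target, because rename(2) only works within one filesystem.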
I have some standard stuff in Common Lisp which does the right thing. It’s called safely. Since it’s written in Lisp, it does more than strictly necessary, because it’s easy to go the extra distance. The package has two layers. The lower layer implements a safely data type which keeps track of file system changes and knows how to wind them backwards and forwards. It provides features for writing new files, deleting old files, committing these changes, and unwinding the whole lot. It’s safe even in the presence of multiple concurrent writers (in the sense that you get the output of one of them, chosen nondeterministically). The upper layer provides some convenient macros for doing this sort of stuff, and in particular, the safe-write macro: you give it a filename, a variable, and a body of code; it binds the variable to an output stream, and if your body completes normally, it replaces the file with the stuff you wrote to the stream; if your body jumps nonlocally (because it throws, restarts, goes, returns or uses some other of Lisp’s many non-local flow-control operators) then it throws away your stuff without changing anything else.
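The safe-write contract translates naturally into a context manager. This is my own sketch in Python, not the Lisp package: normal exit commits the new contents, and any exception reverts, leaving the target untouched.

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def safe_write(path):
    """Yield a stream for path; commit on normal exit, revert on any
    exception."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    stream = os.fdopen(fd, "w")
    try:
        yield stream
        stream.close()
        os.replace(tmp, path)   # commit: atomically replace the target
    except BaseException:
        stream.close()
        os.unlink(tmp)          # revert: the target is untouched
        raise

# Normal completion commits:
with safe_write("config.txt") as out:
    out.write("new contents\n")

# An exception reverts, leaving the previous contents intact:
try:
    with safe_write("config.txt") as out:
        out.write("half-finished garbage")
        raise RuntimeError("bail out part-way through")
except RuntimeError:
    pass
```

Python only has exceptions where Lisp has its whole menagerie of non-local exits, so catching `BaseException` covers the equivalent cases here.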
This is all very excessive, of course, but it’s done for a reason. It’s actually descended from a Tcl program called splitconf, which takes an INI-style configuration file and writes each of its sections out to a separate file. (It’s a bit more complicated than just that. It keeps track of which files it thinks it’s looking after, and deletes ones which are no longer interesting, and it lets you run commands before and after it does its main thing. I mainly use it for looking after qmail settings.) Anyway, splitconf wants to make sure that the change-over is as smooth as possible, and needs to be able to back out if it discovers that it can’t write some particular output file.
Anyway, this isn’t really about how clever I am. (Honest.) Write-to-temporary-file-and-rename is a very common idiom, particularly under Unix where rename(2) actually has sensible semantics. And ‘idiom’ really means ‘opportunity for abstraction’. I’m rather surprised that appropriate abstractions for safely writing files aren’t provided by, say, Perl or Python.
Perl has IO::AtomicFile. This is a pretty good stab at doing the right thing (though unlike safely or splitconf it doesn’t cope with updating multiple files simultaneously). It always uses the same temporary file name for writing, which means that concurrent writing will produce broken output. But its worst feature is that implicitly closing the file (by dropping the last reference to the file handle) commits the file rather than reverting the change. This does not seem like the right way to write robust programs. (But, then again, Perl isn’t the right way to write robust programs…)
Python is usually quite good on robustness. But I find that the closest thing its ‘batteries-included’ standard library provides is the fileinput module’s iterator, which is really for processing multiple files from the command-line (in the style of cat(1) and many similar tools) though it has a modify-in-place facility. But fileinput is hopelessly broken:
it moves the original file aside before you start writing;
you write directly to the output file;
the ‘aside’ place for the input while you’re reading it always has the same name.
So: an exception or signal part-way through leaves corrupted files. Other processes will see partially-written files while you’re writing. And, worst of all, two concurrent writers will corrupt the output file and its backup!
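For contrast, the temp-and-rename idiom copes with concurrent writers: each one gets a unique temporary name, and whichever rename happens last wins with a complete file. A quick sketch (again my own code, not any library’s):

```python
import os
import tempfile
import threading

def atomic_write(path, data):
    """Write data to path via a unique temporary and an atomic rename."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as f:
        f.write(data)
    os.replace(tmp, path)

target = "shared.txt"
threads = [threading.Thread(target=atomic_write,
                            args=(target, f"writer {i}\n" * 1000))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

contents = open(target).read()
# The file is one writer's complete output, never an interleaving.
assert any(contents == f"writer {i}\n" * 1000 for i in range(4))
```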
I’m unimpressed.