The other day, a colleague asked me how to multiplex lines from a number of asynchronous sources from a Unix shell script. Here’s what I came up with.
#! /bin/sh
set -e
for i; do
  while read line; do echo "$i: $line"; done <"$i" &
done | cat
wait
This is fairly straightforward: for each filename given on the shell’s command line, run a background subprocess which reads lines from that file and writes them, prefixed with the filename, to stdout. Because each file is dealt with in a separate process, output appears as soon as any input has a line ready, even if the inputs are named pipes, terminals, sockets or whatever.
And then we pipe everything through cat, which simply copies its input to its output again.
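If you want to watch the multiplexing happen, here is a rough sketch that feeds the same loop from two named pipes. The function name mux, the FIFO names a and b, and the sleep timings are all made up for the demonstration; only the loop itself comes from the script above.

```shell
#!/bin/sh
# Hypothetical wrapper around the loop from the script above.
mux() {
    for i; do
        while read line; do echo "$i: $line"; done <"$i" &
    done | cat
}

# Scratch directory holding two FIFOs (names are illustrative).
dir=$(mktemp -d)
mkfifo "$dir/a" "$dir/b"

# Two writers that produce lines at different times.
(echo "first"; sleep 2; echo "third") >"$dir/a" &
(sleep 1; echo "second") >"$dir/b" &

# Read both FIFOs at once; lines arrive in the order they were written.
out=$(cd "$dir" && mux a b)
echo "$out"
# a: first
# b: second
# a: third

rm -r "$dir"
```

The one-second gaps are only there to make the arrival order obvious; the loop itself doesn’t care how the writers are timed.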
The newsgroup comp.unix.shell has a special place in its heart for UUOCs — useless uses of cat(1). People have a bizarre tendency to type
cat file | grep pattern
rather than the more obvious
grep pattern file
Oh, well. The cat(1) above is not useless. What’s it for? Actually, it really does just copy its standard input to standard output: the important bit is the pipe.
Pipes have a special guarantee: if you write(2) no more than PIPE_BUF bytes to a pipe in a single operation, then they won’t be interleaved with anything else, even if other processes are trying to write at the same time. (POSIX requires PIPE_BUF to be at least 512 bytes; on Linux it’s 4096.) Without this guarantee, you will occasionally get bits of one line written in the middle of other lines in various unhelpful ways, which makes the output very hard to read. I tested this on Linux with ordinary disk files: interleaving is rare but it happens. Stick a cat(1) in, and it goes away.
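To find out what PIPE_BUF actually is on a given machine, getconf(1) will tell you. The sketch below also includes a small helper, fits_in_pipe_buf, which is my own invention for illustration: it checks whether a line (plus its newline) is short enough to be covered by the guarantee.

```shell
#!/bin/sh
# Ask the system for the atomic-write limit on pipes. getconf wants a
# pathname because the limit can vary per filesystem; "/" is just an example.
pipe_buf=$(getconf PIPE_BUF /)
echo "PIPE_BUF is $pipe_buf bytes"

# Hypothetical helper: succeed if "$1" plus a trailing newline would fit
# in a single atomic pipe write.
fits_in_pipe_buf() {
    n=$(( $(printf '%s\n' "$1" | wc -c) ))
    [ "$n" -le "$pipe_buf" ]
}

fits_in_pipe_buf "a: short line" && echo "short line is safe"
```

Lines longer than the limit may still come out intact, but nothing promises it.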
The really observant among you will have realised that I’m making an unwarranted assumption that echo(1) issues a single call to write(2) in order to do its business. This, I’m afraid, isn’t guaranteed by POSIX (or any other standard I know of) but was experimentally determined using strace(1).
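If you’d like to repeat the experiment, something along these lines should do. It assumes Linux with strace(1) installed and permission to trace; the function name count_echo_writes is hypothetical, and it prints "unknown" when tracing isn’t possible.

```shell
#!/bin/sh
# Count the write(2) calls made on stdout by a shell running one echo.
count_echo_writes() {
    command -v strace >/dev/null 2>&1 || { echo unknown; return; }
    log=$(mktemp)
    # -f follows child processes; -e trace=write logs only write(2) calls;
    # -o sends the trace to a file so it doesn't mix with the output.
    if strace -f -o "$log" -e trace=write \
        sh -c 'echo "a: some line"' >/dev/null 2>&1
    then
        grep -c 'write(1,' "$log"
    else
        echo unknown
    fi
    rm -f "$log"
}

count_echo_writes
```

A count of 1 would be consistent with the single-write behaviour described above, but it only tells you about the particular shell that sh resolves to on your system.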