Stuck in the Shell: The Limitations of Unix Pipes
Taking away a Unix guru's “|” key is as crippling as taking away a Windows user's mouse. At the Unix shell, piping is the fundamental form of program combination: pipes connect the standard output and standard input of many small tools into a virtual “pipeline” that solves problems far more sophisticated than any of the individual programs could handle alone. In theory, stringing together command-line programs is only one use of the underlying Unix pipe system call, which simply creates one file descriptor to write data to and another to read it back; these descriptors can be shared with subprocesses spawned with fork. One might think that this very generic system call, which was essentially the only form of inter-process communication in early Unix, could be used in many ways, of which the original “connect processes linearly” is just one example. Unfortunately, pipes have several limitations, such as unidirectionality and a common-ancestor requirement, which prevent them from being more generally useful. In practice, the limitations of a “pipe” system call designed for command-line pipelines restrict its use as a general-purpose tool.
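The pipe-plus-fork pattern described above can be sketched in a few lines. This is an illustrative example (the function name and message are invented for the sketch), written in Python because its os module exposes the same pipe and fork calls:

```python
import os

# A minimal sketch of the pipe + fork pattern: pipe() creates one read
# end and one write end, fork() shares them with the child, the parent
# writes, and the child reads. (Unix only, since it uses os.fork.)
def pipe_demo():
    r, w = os.pipe()              # read end, write end
    pid = os.fork()
    if pid == 0:                  # child: reads from the pipe
        os.close(w)               # close the unused write end
        data = os.read(r, 1024)
        os.close(r)
        os._exit(0 if data == b"hello" else 1)
    else:                         # parent: writes into the pipe
        os.close(r)               # close the unused read end
        os.write(w, b"hello")
        os.close(w)
        _, status = os.waitpid(pid, 0)
        return os.WEXITSTATUS(status) == 0
```

Note that the descriptors exist before fork; the child merely inherits them, which is exactly the premeditation the rest of this essay complains about.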

Unix pipes are inherently unidirectional channels from one process to another process, and cannot be easily turned into bidirectional or multicast channels. This restriction is exactly what is needed for a shell pipeline, but it makes pipes useless for more complex inter-process communication. The pipe system call creates a single “read end” and a single “write end”. Bidirectional communication can be simulated by creating a pair of pipes, but inconsistent buffering between the pair of pipes can often lead to deadlock, especially if the programmer only has control of the program on one end of the pipe. Programmers can attempt to use pipes as a multicast channel by sharing one read end among many child processes, but because all of the processes share a single descriptor, each byte is delivered to only one reader; an extra buffering layer is needed for the children to all independently read the message. Pipe-based multicast therefore requires a manually maintained collection of many pipes, and programming it takes much more effort.
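The two-pipe simulation of bidirectional communication looks like this sketch (the echo-style service and function name are invented for illustration). Note the deadlock hazard in the comments: it is avoided here only because the messages are small and the sides take strict turns.

```python
import os

# A sketch of simulating bidirectional communication with a pair of
# pipes: one pipe carries requests to the child, the other carries
# replies back. If both sides wrote large messages before reading,
# each pipe's fixed-size kernel buffer could fill and both processes
# would block in write() forever -- the deadlock described above.
def request_reply():
    to_child_r, to_child_w = os.pipe()      # parent -> child
    to_parent_r, to_parent_w = os.pipe()    # child -> parent
    pid = os.fork()
    if pid == 0:                            # child: a tiny upcasing service
        os.close(to_child_w)
        os.close(to_parent_r)
        msg = os.read(to_child_r, 1024)
        os.write(to_parent_w, msg.upper())
        os._exit(0)
    os.close(to_child_r)                    # parent: send, then receive
    os.close(to_parent_w)
    os.write(to_child_w, b"ping")
    os.close(to_child_w)
    reply = os.read(to_parent_r, 1024)
    os.close(to_parent_r)
    os.waitpid(pid, 0)
    return reply
```

The discipline of who reads and who writes when lives entirely in the programmers' heads, which is why the scheme breaks down when only one end of the pipe is under your control.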

Pipes can only be shared between processes with a common ancestor which anticipated the need for the pipe. This is no problem for a shell, which sees a list of programs and can set up all of their pipes at once. But this restriction prevents many useful forms of inter-process communication from being layered on top of pipes. Essentially, pipes are a form of combination, but not of abstraction - there is no way for a process to name a pipe that it (or an ancestor) did not directly create via pipe. Pipes cannot be used by clients to connect to long-running services. Processes cannot even open additional pipes to other processes that they already have a pipe to.
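A sketch of how a shell plays the role of anticipating common ancestor: it calls pipe() before forking either child, and each child then dup2()s the inherited descriptor onto its stdin or stdout before exec. The commands (echo, tr) are ordinary utilities chosen for illustration, and a second pipe is added here only so the sketch can capture the pipeline's output.

```python
import os

# A sketch of a shell wiring up "echo hello | tr a-z A-Z": the common
# ancestor creates the pipe before forking both children, because
# neither child could obtain it afterwards.
def run_pipeline():
    r, w = os.pipe()                  # the pipeline's pipe
    out_r, out_w = os.pipe()          # extra pipe, to capture the result
    if os.fork() == 0:                # first child: echo hello
        os.dup2(w, 1)                 # stdout -> pipe write end
        for fd in (r, w, out_r, out_w):
            os.close(fd)
        os.execvp("echo", ["echo", "hello"])
    if os.fork() == 0:                # second child: tr a-z A-Z
        os.dup2(r, 0)                 # stdin <- pipe read end
        os.dup2(out_w, 1)             # stdout -> capture pipe
        for fd in (r, w, out_r, out_w):
            os.close(fd)
        os.execvp("tr", ["tr", "a-z", "A-Z"])
    for fd in (r, w, out_w):          # parent keeps only the capture end
        os.close(fd)
    result = os.read(out_r, 1024)
    os.close(out_r)
    os.wait()
    os.wait()
    return result
```

Everything here happens before either program starts running; once echo and tr have exec'd, neither has any way to name, reopen, or extend the pipe between them.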

These limitations are not merely theoretical - they can be seen in practice by the fact that no major form of inter-process communication later developed in Unix is layered on top of pipe. After all, the usual way to respond to the concern that a feature of a system is too simple is to add a higher-level layer on top; for example, the fact that Unix pipes send raw, uninterpreted binary data and not high-level data structures can be fixed by wrapping pipes with functions which marshal your structures before putting them through the pipe. But the restriction of pipes to premeditated unidirectional communication between two processes cannot be fixed in this way. Several forms of inter-process communication, such as sockets, named pipes, and shared memory, have been created for Unix to overcome the drawbacks of pipes. None of them have been implemented as layers over pipes; all of them have required the creation of new primitive operations. In fact, the reverse is true - pipes could theoretically be implemented as a layer around sockets, which have grown up to be the backbone of the internet. But poor old pipes are still limited to solving the same problems in 2006 that they were in 1978.
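The contrast with one of those successors is easy to see in code. A named pipe (FIFO) lives in the filesystem, so a process connects to it by opening a path rather than by inheriting a descriptor; the sketch below (path and function name invented for the example) still uses fork for brevity, but note that the writer opens the channel by name, which a genuinely unrelated process could do just as well.

```python
import os
import tempfile

# A sketch of a named pipe (FIFO): unlike pipe(), mkfifo() puts the
# channel in the filesystem namespace, so any process that knows the
# path can open it -- no common ancestor required.
def fifo_demo():
    path = os.path.join(tempfile.mkdtemp(), "chan")
    os.mkfifo(path)
    pid = os.fork()
    if pid == 0:                        # writer: connects purely by name
        fd = os.open(path, os.O_WRONLY)  # blocks until a reader appears
        os.write(fd, b"by name")
        os.close(fd)
        os._exit(0)
    fd = os.open(path, os.O_RDONLY)      # reader also opens by name
    data = os.read(fd, 1024)
    os.close(fd)
    os.waitpid(pid, 0)
    os.unlink(path)
    return data
```

The one-line difference - a name in a shared namespace instead of an inherited descriptor - is precisely the abstraction facility that plain pipes lack, and it had to be added as a new primitive rather than built on top of pipe.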