Robert Scheifler and James Gettys. The X Window System. ACM Transactions on Graphics 5, 2 (April 1986), pages 79-109.
by J. H. Saltzer, February 28, 1996, with 1997 suggestions from Dawson Engler and Steve Ward. Minor updates February 11, 1999, and February 16, 2002.
This paper is a very nice example of a "systems" paper in that it describes a real, working system and it explains the reasons choices were made from among design alternatives--it even tells when choices were arbitrary. In addition, it is written for the most part in plain English with a minimum of jargon.
Here is the complete citation:
Simson Garfinkel, Daniel Weise, and Steven Strassmann. The UNIX-Haters Handbook. IDG Books Worldwide, 1994. ISBN 1-56884-203-1.
But keep in mind that our choice of this paper as a reading isn't based on whether X is the greatest window system, but rather on the system design issues the paper illuminates.
(This question is more subtle than it may appear: The authors set out design principles and follow them closely (Do it right school), but they compromise where it really helps, and they explicitly released the system to the world with less-than-perfect properties in order to gain wide, rapid acceptance, planning to fix things up in a later release (Worse is better school). This mixture of strategies reveals that the Gabriel paper actually presents a caricature with only the two extremes; real-world designers almost always operate somewhere in the middle.)
- A device-independent display applications programming interface
- A client/server boundary between the application and the display
The point is that either one of these ideas can be done alone and has major benefits by itself. What are those benefits?
(1. Before X was available, Unix provided a platform-independent operating system interface for many things other than the display and keyboard [examples: the file system, i/o streams, virtual teletype, most supervisor services such as fork and clock reading]. But that was a crucial omission, because applications that wanted to use a graphical display had to use a vendor-specific interface, and plan to rewrite the application once for every different display. X provided a display-independent interface, so for the first time it was possible to write portable interactive Unix applications.)
(2. The client/server interface between the application and the display adds significant flexibility. For example, it becomes possible to write a display application that runs on a supercomputer, even though that supercomputer doesn't have any display of its own, or is located in Novosibirsk, or has an inadequate display. Or if I am having a problem with a bug I can tell my application to pop up a window on your workstation so that you can look at it and offer help. Or if you are developing an interactive application on your DEC workstation and you want to see how it performs on a Sun workstation elsewhere on campus you can see the result without leaving your office.)
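To make benefit 2 concrete, here is a minimal sketch (not from the paper) of an Xlib client in C. The identical code draws on a local or a remote display; only the DISPLAY string it connects to (e.g., "otherhost:0") changes. Most error handling is omitted.

    #include <stdio.h>
    #include <X11/Xlib.h>

    int main(void)
    {
        Display *dpy = XOpenDisplay(NULL);   /* connect to whatever $DISPLAY names */
        if (dpy == NULL) {
            fprintf(stderr, "cannot open display\n");
            return 1;
        }
        int scr = DefaultScreen(dpy);
        Window win = XCreateSimpleWindow(dpy, RootWindow(dpy, scr),
                                         10, 10, 300, 200, 1,
                                         BlackPixel(dpy, scr), WhitePixel(dpy, scr));
        XMapWindow(dpy, win);                /* a request on the byte stream */
        XFlush(dpy);
        /* ... event loop would go here (see the Expose sketch later on) ... */
        XCloseDisplay(dpy);
        return 0;
    }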
The two benefits reinforce each other. How?
(When a new display becomes available, the device-independent API means that the only code that needs to be changed is the display driver. But how does an application get to use this new driver? If the operating system doesn't provide dynamic linking of applications to libraries, the application may have to be recompiled and relinked. A client/server interface between XLib and the display driver in effect provides a dynamic linking mechanism for systems that do not have that feature. UNIX--at least at the time--was a prime example.)
Example:
"Do you support text display?" Yes -> send text to screen. No --> beep in Morse code.
(The general model used by X is that all displays have the same feature list; it is the display driver's responsibility to emulate any missing features.)
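A rough sketch of that model in C, with hypothetical names throughout (this is not the actual X server code): the device-dependent layer exports a fixed operation table, and any operation the hardware lacks is filled in with a software fallback.

    #include <stddef.h>

    struct display_ops {
        void (*fill_rect)(int x, int y, int w, int h, unsigned long pix);
        void (*set_pixel)(int x, int y, unsigned long pix);  /* every device has this */
    };

    static struct display_ops *dev;

    /* Emulate a missing rectangle engine one pixel at a time -- correct, but slow. */
    static void fill_rect_emulated(int x, int y, int w, int h, unsigned long pix)
    {
        for (int j = y; j < y + h; j++)
            for (int i = x; i < x + w; i++)
                dev->set_pixel(i, j, pix);
    }

    void bind_driver(struct display_ops *ops)
    {
        dev = ops;
        if (dev->fill_rect == NULL)              /* hardware lacks the feature... */
            dev->fill_rect = fill_rect_emulated; /* ...so the driver emulates it  */
    }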
(X: - emulated features may be very slow
- exotic features of some hardware displays may not be available
to any application because they aren't in the X feature list.
- new display innovation may be discouraged. A manufacturer may
be reluctant to add cost if no one will be able to use the feature
because the software doesn't allow access.)
(GKS - application becomes an impenetrable mass of cases and the benefit of device independence is largely lost. Worse, the application programmer may be encouraged to provide support for combinations of features that don't actually exist on any real display. Or discouraged, and provide support only for the one combination of features on the most readily available display, and the application won't interoperate with anything else.
The IBM Systems Network Architecture is a (very big) example of a design that fell by the wayside partly because of this problem. Although every IBM product that used telecommunications adhered to the SNA architectural standard, there were so many options in the standard that no two products could interoperate unless they had been specifically designed to do so.)
(The round-trip delays across the network would be a killer for highly interactive applications. With a stream, the next data element can begin its trek to the display as soon as the client has it ready, rather than waiting for a response from the display of the previous data element. The important observation is that in this particular application, the client usually has no need for acknowledgement that the previously requested item has actually been displayed. As long as the byte stream is reliable--a feature provided at a lower level--display is inevitable, which is all the client cares about.)
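A minimal sketch of what this means at the Xlib level, assuming dpy, win, and gc were set up as in the earlier example: requests are queued and streamed to the server with no per-request acknowledgement.

    #include <X11/Xlib.h>

    void draw_burst(Display *dpy, Window win, GC gc)
    {
        for (int i = 0; i < 1000; i++)
            XDrawLine(dpy, win, gc, 0, i, 500, i);  /* queued; no reply expected */
        XFlush(dpy);   /* push the queue onto the byte stream; still no round trip */
        /* XSync(dpy, False) would wait for the server to catch up -- one full
           round trip, the thing to avoid doing once per request. */
    }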
What is wrong with having the server remember a log of requests (display list)? (Have to replay from the beginning of time. That can take a long time.)
What is wrong with having the server remember the bit map? (Uses up VM or RAM. N.B., when was this paper written? RAM is cheaper now, and X version 11 provides saveunders.)
Since the client has to be able to redraw anyway, in order to handle resize requests, what is wrong with having the client responsible for refresh? (Generates extra network traffic.)
(Positive: Modularity--the client doesn't ever get involved in keeping track of the state of the actual display; the client is concerned only with the state of its windows. And if saveunders are implemented, when a window is exposed the server can immediately display it, rather than waiting for the window contents to be transmitted over the network.
Negative: An application that is streaming a lot of output to an obscured window is using up network capacity. If there are saveunders, it is also soaking up server cycles keeping the saveunders up to date. And if, as likely, the application repeatedly overwrites the previous contents of the obscured window, the transmission of those previous contents was a complete waste of resources.)
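A sketch of the client's half of that refresh contract in Xlib: the server reports which regions became visible, and the client redraws from its own model. repaint() here is a hypothetical application routine, not part of Xlib.

    #include <X11/Xlib.h>

    void repaint(Display *dpy, Window win);   /* hypothetical application redraw */

    void event_loop(Display *dpy, Window win)
    {
        XEvent ev;

        XSelectInput(dpy, win, ExposureMask | StructureNotifyMask);
        for (;;) {
            XNextEvent(dpy, &ev);
            if (ev.type == Expose && ev.xexpose.count == 0)
                repaint(dpy, win);   /* redraw once per burst of Expose events */
            else if (ev.type == ConfigureNotify)
                repaint(dpy, win);   /* resize: the client must redraw anyway */
        }
    }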
([Modularity, abstraction, hierarchy answers mostly based on suggestions by Dawson Engler]
Modularity (which are industrial-strength/enforced and which are soft?):
- explicitly guards against client failures with a "stateless"
  protocol. Other systems that use this? (e.g., the WWW, NFS). What
  types of servers do not? (e.g., in many situations the OS, which
  controls and knows all and so can avoid this problem)
- prevents clients from modifying display directly
- guards against client crashes (frees resources when connection
terminates)
- server separate address space (or even machine) from client
- communication only through IPC
- resources protected with "hard to guess accidentally" names
- color map guarded by server
- client manages windows, server manages screen (see preceding
question on visibility)
Abstraction (Xlib is a specified API; it has call/return semantics and looks a little like RPC, but it isn't RPC):
- windows
- device independence (including the semantics of the net protocol)
- fast to have 100s of windows: lets them be used as structuring
technique
- drawing utilities
- color map: indirection managed by the server to remove the need to
  hard-code RGB values in the application (a sketch follows this list)
- network transparent
Hierarchy (a bit shallow. sigh.):
- windows can be arranged hierarchically
- server/client form a two-layer hierarchy, sort of.
Layering
- Xlib is a layer that hides the network protocol behind a
procedure call interface.
- X server is a layer that hides the raw display behind a
network protocol interface)
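Here is the color-map sketch promised above, a minimal example using standard Xlib calls: the client asks for a color by name and gets back an opaque pixel value chosen by the server, so the application never hard-codes RGB values or colormap slots.

    #include <X11/Xlib.h>

    unsigned long pixel_for(Display *dpy, const char *name)
    {
        int scr = DefaultScreen(dpy);
        Colormap cmap = DefaultColormap(dpy, scr);
        XColor screen_def, exact_def;

        if (!XAllocNamedColor(dpy, cmap, name, &screen_def, &exact_def))
            return BlackPixel(dpy, scr);      /* unknown name: fall back */
        return screen_def.pixel;              /* a handle, not an RGB triple */
    }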
(Because M. I. T. gave the code away. The project at CMU was sponsored by IBM, which owned rights to all software that the project developed, and IBM considered AWS to be a valuable asset. When the time came to show AWS to the world, CMU did so by inviting interested parties to visit Pittsburgh and sit down in front of a workstation. M. I. T. as a matter of policy doesn't sign contracts that grant exclusive rights to the sponsor, so when the time came to show the X Window System to the world, M. I. T. did so by placing the sources on a public FTP server and inviting interested parties to download and try it for themselves without royalties. Many people did so, liked what they saw, and very soon the X window system dominated.
The irony in this situation is that IBM apparently never realized any revenue from AWS, but M. I. T. became host to the X consortium, which paid funds to the M. I. T. overhead pool and also contributed an endowed faculty chair to M. I. T. By not requiring a royalty, M. I. T. realized a substantial long-term return.
It is sometimes useful to keep this anecdote in mind when engaging in discussions of the protection of intellectual property in computer networks.
* Downside of the mechanism/policy separation: coherence takes some form of legislation. Is Mac window management more coherent simply because it is wired into the OS?
* Application-specificity of division of labor.
* Cost of generality: Why is xclock so big?
* Imaging model tradeoffs: are there reasons why a window system should *not* adopt a device-independent graphics interface? Is it more difficult and/or slower? Are there underlying semantic problems?
(0. Provide two ports as described.
1. Require that all clients send data in little-endian order.
2. Require that all clients send data in server-preferred order.
3. The first few bytes sent over the channel negotiate the byte order.
N.B.: X version 11 does it this third way; see the sketch below.)
A table on the board showing the number of byte-swaps that happen for each of the four combinations of server and client byte-order in each design can help the class figure out what is going on.
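A sketch of alternative 3, roughly the way the X version 11 connection setup works: the first byte of the setup message declares the client's byte order ('B' for most-significant-byte first, 'l' for least-significant first), and the server swaps only when it disagrees. This is illustrative only, not the real server code; server_is_big_endian is supplied by the caller.

    #include <stdint.h>
    #include <unistd.h>

    int must_swap(int fd, int server_is_big_endian)
    {
        uint8_t order;

        if (read(fd, &order, 1) != 1)
            return -1;                              /* setup failed */
        int client_is_big_endian = (order == 'B');  /* 'l' = little-endian */
        return client_is_big_endian != server_is_big_endian;
    }

    /* Applied to each multi-byte field of a request when must_swap() said so. */
    static uint16_t swap16(uint16_t v) { return (uint16_t)((v >> 8) | (v << 8)); }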
Important: this is the first paper to spend pages telling us where it went wrong. (Be careful that the confessed faults do not hide more serious ones.) Lots of citations: pedantically aware of other systems. Interrupts vs. polling issues.

Tension in X between functionality at the clients and in the server. This is essentially the tension between centralization and decentralization, a venerable tension in complex systems:
- planned economies vs. free markets
- totalitarianism vs. anarchy/democracy
- nuance of this: strong federal government vs. federalism

Pull out the computer issues by looking at two extremes: clients attached to a server that performs all the work, and clients that contain the server functionality (perhaps in libraries) and touch the display themselves.

LibWin advantages:
1. Clients can rewrite their libraries at will (important subpoint: can have multiple altered copies running concurrently).
2. No communication with the server: IPC -> procedure call (fast).
3. Address space is the same: this can increase usability (e.g., pointers can be passed between library and application) and be more efficient (e.g., can use the same memory for screen objects: no double buffering).
4. No single point of failure: clients are not killed by a server crash.

Server advantages:
Protection:
1. Single entry and exit point (i.e., no wild jumps).
2. Separate address space (firewall) = no wild writes.
3. Centralized = trivial concurrency control.
4. Well-formed updates: the server can guarantee that state modifications are done in a well-formed manner.
5. Invariants: the server can guarantee that invariants hold.
Ease of implementation:
1. Single point of update: an improvement made in the server -> all clients profit. Fix bugs in one place.
2. Simple, low-tech dynamic linking: IPC/system calls can be used as a poor man's dynamic linker. Allows improvements to be shared immediately without relinking.
3. Consolidation: state is in one place, which allows it to be kept consistent easily (e.g., the colormap).
4. Decoupled birth/death: state in the server can easily persist across client lifetimes.

------

(Note: in about a month we will return to the security issues raised in the next section.)

Downloading code is newly fashionable: Peter Deutsch did it in the early 70's (from Butler's paper on hints); recently made popular in the OS community and, of course, on the Internet (Java).

Why download code? (many reasons; here are two):
1. Eliminate boundary crossings.
2. Move computation closer to the action.

Current example uses of downloaded code: kernels (packet filters), databases (Postgres), window systems (NeWS), networking (active networks), and the Internet (Java).

Potential uses in X:
- Lower latency: move computation to the other side of the network straw.
- Absorb communication so the client never sees it (e.g., mouse tracking).
- Send code that implements cool new functionality to the server so everyone can use it, rather than having to distribute it to each client.

What do you need to download code?
1. Protection: type-safety / preventing wild memory operations and jumps. How to do it? (Provided by a trusted certifier using a safe language, e.g., a trusted compiler compiles the code and signs it. Can provide protection by eliminating the ability to express dangerous things; for instance, to eliminate floating-point exceptions, remove floating point. Or a post-processor: e.g., inspect the code (Deutsch, Java) or even modify it to prevent it from performing wild memory operations and jumps.) Guarding against excessive runtime: (inspection, modification, and/or interrupts).
2. Execution context: points to pass control into and out of the fragment, and the ability to create and store state. Dynamic linking: bind unresolved symbols and determine where to call in.
3. Security: prevent it from doing bad things (e.g., sending email in your name). (Hard to guarantee in a useful system.)

---------------

Examples of performance hacks and ease-of-implementation choices:

Speed hacks:
To get around a slow network:
- Semantic compression: send a description of the scene rather than pixels. Gives 10-100x compression. Useful technique in general: send a constructive formula rather than the object itself (see the sketch after this section).
- Asynchronous communication.
- The server does background refresh.
- Pop-up menus can be overlaid with a bitmap stored on the server.
Slow CPU:
- Ports for big-endian and little-endian clients.
- Shared memory for IPC when client and server are on the same machine.
- Clipping done by the server (would have to be done anyway for protection). (Don't hide power.)
- Client expose. Note that because the interface is carefully specified, the server can still handle exposes.
- Byte alignment of operations in messages.
- Operating system calls minimized.
To guard against the network:
- Ordering, by sending multiple windows over the same pipe.

Implementation simplicity:
Simple:
- Use of a reliable byte stream.
- Client expose.
- Large images -> broken up by clients.
Portability via lowest-common denominator:
- Communication using a simple protocol (reliable byte stream = TCP).
- Simple capabilities for the device (don't assume a powerful device).
- Simple dynamic linking using client/server rather than sophisticated linkers.

----

Security problems (perhaps better delayed till we get to this topic):
1. Authentication via IP addresses.
2. Small integers used as resource capabilities.
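The sketch promised under semantic compression, assuming dpy, win, and gc are set up as in the earlier examples: a filled rectangle is one short protocol request (on the order of 20 bytes on the wire) instead of width x height pixels shipped with XPutImage; the numbers are approximate.

    #include <X11/Xlib.h>

    void paint_background(Display *dpy, Window win, GC gc)
    {
        XFillRectangle(dpy, win, gc, 0, 0, 500, 300);   /* a formula, not a bitmap */
    }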