Robert Scheifler and James Gettys. The X Window System. ACM Transactions on Graphics 5, 2 (April 1986), pages 79-109.
by J. H. Saltzer, February 28, 1996, with 1997 suggestions from Dawson Engler and Steve Ward. Minor updates February 11, 1999, and February 16, 2002.
This paper is a very nice example of a "systems" paper in that it describes a real, working system and it explains the reasons choices were made from among design alternatives--it even tells when choices were arbitrary. In addition, it is written for the most part in plain English with a minimum of jargon.
Here is the complete citation:
Simson Garfinkel, Daniel Weise, and Steven Strassmann. The UNIX-Haters Handbook. IDG Books Worldwide, 1994. ISBN 1-56884-203-1.
But keep in mind that our choice of this paper as a reading isn't based on whether X is the greatest window system, but rather on the system design issues the paper illuminates.
(This question is more subtle than it may appear: The authors set out design principles and follow them closely (Do it right school), but they compromise where it really helps, and they explicitly released the system to the world with less-than-perfect properties in order to gain wide, rapid acceptance, planning to fix things up in a later release (Worse is better school). This mixture of strategies reveals that the Gabriel paper actually presents a caricature with only the two extremes; real-world designers almost always operate somewhere in the middle.)
- A device-independent display applications programming interface
- A client/server boundary between the application and the display
The point is that either one of these ideas can be done alone and has major benefits by itself. What are those benefits?
(1. Before X was available, Unix provided a platform-independent operating system interface for many things other than the display and keyboard [examples: the file system, i/o streams, virtual teletype, most supervisor services such as fork and clock reading]. But that was a crucial omission, because applications that wanted to use a graphical display had to use a vendor-specific interface, and plan to rewrite the application once for every different display. X provided a display-independent interface, so for the first time it was possible to write portable interactive Unix applications.)
(2. The client/server interface between the application and the display adds significant flexibility. For example, it becomes possible to write a display application that runs on a supercomputer, even though that supercomputer doesn't have any display of its own, or is located in Novosibirsk, or has an inadequate display. Or if I am having a problem with a bug I can tell my application to pop up a window on your workstation so that you can look at it and offer help. Or if you are developing an interactive application on your DEC workstation and you want to see how it performs on a Sun workstation elsewhere on campus you can see the result without leaving your office.)
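To make benefit 2 concrete, here is a minimal sketch (not from the paper) of an Xlib client in C. The identical code draws on a local or a remote display; only the DISPLAY string it connects to (e.g., "otherhost:0") changes. Most error handling is omitted.

    #include <stdio.h>
    #include <X11/Xlib.h>

    int main(void)
    {
        Display *dpy = XOpenDisplay(NULL);   /* connect to whatever $DISPLAY names */
        if (dpy == NULL) {
            fprintf(stderr, "cannot open display\n");
            return 1;
        }
        int scr = DefaultScreen(dpy);
        Window win = XCreateSimpleWindow(dpy, RootWindow(dpy, scr),
                                         10, 10, 300, 200, 1,
                                         BlackPixel(dpy, scr), WhitePixel(dpy, scr));
        XMapWindow(dpy, win);                /* a request on the byte stream */
        XFlush(dpy);
        /* ... event loop would go here (see the Expose sketch later on) ... */
        XCloseDisplay(dpy);
        return 0;
    }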
The two benefits reinforce each other. How?
(When a new display becomes available, the device-independent API means that the only code that needs to be changed is the display driver. But how does an application get to use this new driver? If the operating system doesn't provide dynamic linking of applications to libraries, the application may have to be recompiled and relinked. A client/server interface between XLib and the display driver in effect provides a dynamic linking mechanism for systems that do not have that feature. UNIX--at least at the time--was a prime example.)
Example:
"Do you support text display?" Yes -> send text to screen. No --> beep in Morse code.
(The general model used by X is that all displays have the same feature list; it is the display driver's responsibility to emulate any missing features.)
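A rough sketch of that model in C, with hypothetical names throughout (this is not the actual X server code): the device-dependent layer exports a fixed operation table, and any operation the hardware lacks is filled in with a software fallback.

    #include <stddef.h>

    struct display_ops {
        void (*fill_rect)(int x, int y, int w, int h, unsigned long pix);
        void (*set_pixel)(int x, int y, unsigned long pix);  /* every device has this */
    };

    static struct display_ops *dev;

    /* Emulate a missing rectangle engine one pixel at a time -- correct, but slow. */
    static void fill_rect_emulated(int x, int y, int w, int h, unsigned long pix)
    {
        for (int j = y; j < y + h; j++)
            for (int i = x; i < x + w; i++)
                dev->set_pixel(i, j, pix);
    }

    void bind_driver(struct display_ops *ops)
    {
        dev = ops;
        if (dev->fill_rect == NULL)              /* hardware lacks the feature... */
            dev->fill_rect = fill_rect_emulated; /* ...so the driver emulates it  */
    }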
(X: - emulated features may be very slow
- exotic features of some hardware displays may not be available
to any application because they aren't in the X feature list.
- new display innovation may be discouraged. A manufacturer may
be reluctant to add cost if no one will be able to use the feature
because the software doesn't allow access.)
(GKS - application becomes an impenetrable mass of cases and the benefit of device independence is largely lost. Worse, the application programmer may be encouraged to provide support for combinations of features that don't actually exist on any real display. Or discouraged, and provide support only for the one combination of features on the most readily available display, and the application won't interoperate with anything else.
The IBM Systems Network Architecture is a (very big) example of a design that fell by the wayside partly because of this problem. Although every IBM product that used telecommunications adhered to the SNA architectural standard, there were so many options in the standard that no two products could interoperate unless they had been specifically designed to do so.)
(The round-trip delays across the network would be a killer for highly interactive applications. With a stream, the next data element can begin its trek to the display as soon as the client has it ready, rather than waiting for a response from the display of the previous data element. The important observation is that in this particular application, the client usually has no need for acknowledgement that the previously requested item has actually been displayed. As long as the byte stream is reliable--a feature provided at a lower level--display is inevitable, which is all the client cares about.)
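A minimal sketch of what this means at the Xlib level, assuming dpy, win, and gc were set up as in the earlier example: requests are queued and streamed to the server with no per-request acknowledgement.

    #include <X11/Xlib.h>

    void draw_burst(Display *dpy, Window win, GC gc)
    {
        for (int i = 0; i < 1000; i++)
            XDrawLine(dpy, win, gc, 0, i, 500, i);  /* queued; no reply expected */
        XFlush(dpy);   /* push the queue onto the byte stream; still no round trip */
        /* XSync(dpy, False) would wait for the server to catch up -- one full
           round trip, the thing to avoid doing once per request. */
    }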
What is wrong with having the server remember a log of requests (display list)? (Have to replay from the beginning of time. That can take a long time.)
What is wrong with having the server remember the bit map? (Uses up VM or RAM. N.B., when was this paper written? RAM is cheaper now, and X version 11 provides saveunders.)
Since the client has to be able to redraw anyway, in order to handle resize requests, what is wrong with having the client responsible for refresh? (Generates extra network traffic.)
(Positive: Modularity--the client doesn't ever get involved in keeping track of the state of the actual display; the client is concerned only with the state of its windows. And if saveunders are implemented, when a window is exposed the server can immediately display it, rather than waiting for the window contents to be transmitted over the network.
Negative: An application that is streaming a lot of output to an obscured window is using up network capacity. If there are saveunders, it is also soaking up server cycles keeping the saveunders up to date. And if, as likely, the application repeatedly overwrites the previous contents of the obscured window, the transmission of those previous contents was a complete waste of resources.)
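A sketch of the client's half of that refresh contract in Xlib: the server reports which regions became visible, and the client redraws from its own model. repaint() here is a hypothetical application routine, not part of Xlib.

    #include <X11/Xlib.h>

    void repaint(Display *dpy, Window win);   /* hypothetical application redraw */

    void event_loop(Display *dpy, Window win)
    {
        XEvent ev;

        XSelectInput(dpy, win, ExposureMask | StructureNotifyMask);
        for (;;) {
            XNextEvent(dpy, &ev);
            if (ev.type == Expose && ev.xexpose.count == 0)
                repaint(dpy, win);   /* redraw once per burst of Expose events */
            else if (ev.type == ConfigureNotify)
                repaint(dpy, win);   /* resize: the client must redraw anyway */
        }
    }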
([Modularity, abstraction, hierarchy answers mostly based on suggestions by Dawson Engler]
Modularity (which are industrial-strength/enforced and which are soft?):
- explicitly guards against client failures with a "stateless"
  protocol. Other systems that use this? (e.g., the WWW, NFS). What
  types of servers do not? (e.g., in many situations the OS, which
  controls and knows all and so can avoid this problem)
- prevents clients from modifying display directly
- guards against client crashes (frees resources when connection
terminates)
- server separate address space (or even machine) from client
- communication only through IPC
- resources protected with "hard to guess accidentally" names
- color map guarded by server
- client manages windows, server manages screen (see preceding
question on visibility)
Abstraction (Xlib is a specified API; it has call/return semantics and looks a little like RPC, but it isn't RPC):
- windows
- device independence (including the semantics of the net protocol)
- fast to have 100s of windows: lets them be used as structuring
technique
- drawing utilities
- color map: indirection managed by the server to remove the need to
  hard-code RGB values in the application (a sketch follows this list)
- network transparent
Hierarchy (a bit shallow. sigh.):
- windows can be arranged hierarchically
- server/client form a two-layer hierarchy, sort of.
Layering
- Xlib is a layer that hides the network protocol behind a
procedure call interface.
- X server is a layer that hides the raw display behind a
network protocol interface)
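Here is the color-map sketch promised above, a minimal example using standard Xlib calls: the client asks for a color by name and gets back an opaque pixel value chosen by the server, so the application never hard-codes RGB values or colormap slots.

    #include <X11/Xlib.h>

    unsigned long pixel_for(Display *dpy, const char *name)
    {
        int scr = DefaultScreen(dpy);
        Colormap cmap = DefaultColormap(dpy, scr);
        XColor screen_def, exact_def;

        if (!XAllocNamedColor(dpy, cmap, name, &screen_def, &exact_def))
            return BlackPixel(dpy, scr);      /* unknown name: fall back */
        return screen_def.pixel;              /* a handle, not an RGB triple */
    }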
(Because M. I. T. gave the code away. The project at CMU was sponsored by IBM, which owned rights to all software that the project developed, and IBM considered AWS to be a valuable asset. When the time came to show AWS to the world, CMU did so by inviting interested parties to visit Pittsburgh and sit down in front of a workstation. M. I. T. as a matter of policy doesn't sign contracts that grant exclusive rights to the sponsor, so when the time came to show the X Window System to the world, M. I. T. did so by placing the sources on a public FTP server and inviting interested parties to download and try it for themselves without royalties. Many people did so, liked what they saw, and very soon the X window system dominated.
The irony in this situation is that IBM apparently never realized any revenue from AWS, but M. I. T. became host to the X consortium, which paid funds to the M. I. T. overhead pool and also contributed an endowed faculty chair to M. I. T. By not requiring a royalty, M. I. T. realized a substantial long-term return.
It is sometimes useful to keep this anecdote in mind when engaging in discussions of the protection of intellectual property in computer networks.
* Downside of the mechanism/policy separation: coherence takes some form of legislation. Is Mac window management more coherent simply because it is wired into the OS?
* Application-specificity of division of labor.
* Cost of generality: Why is xclock so big?
* Imaging model tradeoffs: are there reasons why a window system should *not* adopt a device-independent graphics interface? Is it more difficult and/or slower? Are there underlying semantic problems?
(0. Provide two ports as described.
1. Require that all clients send data in little-endian order.
2. Require that all clients send data in server-preferred order.
3. The first few bytes sent over the channel negotiate the byte order.
N.B.: X version 11 does it this third way; see the sketch below.)
A table on the board showing the number of byte-swaps that happen for each of the four combinations of server and client byte-order in each design can help the class figure out what is going on.
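A sketch of alternative 3, roughly the way the X version 11 connection setup works: the first byte of the setup message declares the client's byte order ('B' for most-significant-byte first, 'l' for least-significant first), and the server swaps only when it disagrees. This is illustrative only, not the real server code; server_is_big_endian is supplied by the caller.

    #include <stdint.h>
    #include <unistd.h>

    int must_swap(int fd, int server_is_big_endian)
    {
        uint8_t order;

        if (read(fd, &order, 1) != 1)
            return -1;                              /* setup failed */
        int client_is_big_endian = (order == 'B');  /* 'l' = little-endian */
        return client_is_big_endian != server_is_big_endian;
    }

    /* Applied to each multi-byte field of a request when must_swap() said so. */
    static uint16_t swap16(uint16_t v) { return (uint16_t)((v >> 8) | (v << 8)); }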
Important: this is the first paper to spend pages telling us where it went wrong. (Be careful that the confessed faults do not hide more serious ones.) Lots of citations: pedantically aware of other systems. Interrupts vs. polling issues.

Tension in X between functionality at the clients and in the server. This is essentially the tension between centralization and decentralization, a venerable tension in complex systems:
- planned economies vs. free markets
- totalitarianism vs. anarchy/democracy
- nuance of this: strong federal government vs. federalism

Pull out the computer issues by looking at two extremes: clients attached to a server that performs all the work, and clients that contain the server functionality (perhaps in libraries) and touch the display themselves.

LibWin advantages:
1. Clients can rewrite their libraries at will (important subpoint: can have multiple altered copies running concurrently).
2. No communication with the server: IPC -> procedure call (fast).
3. Address space is the same: this can increase usability (e.g., pointers can be passed between library and application) and be more efficient (e.g., can use the same memory for screen objects: no double buffering).
4. No single point of failure: clients are not killed by a server crash.

Server advantages:
Protection:
1. Single entry and exit point (i.e., no wild jumps).
2. Separate address space (firewall) = no wild writes.
3. Centralized = trivial concurrency control.
4. Well-formed updates: the server can guarantee that state modifications are done in a well-formed manner.
5. Invariants: the server can guarantee that invariants hold.
Ease of implementation:
1. Single point of update: an improvement made in the server -> all clients profit. Fix bugs in one place.
2. Simple, low-tech dynamic linking: IPC/system calls can be used as a poor man's dynamic linker. Allows improvements to be shared immediately without relinking.
3. Consolidation: state is in one place, which allows it to be kept consistent easily (e.g., the colormap).
4. Decoupled birth/death: state in the server can easily persist across client lifetimes.

------

(Note: in about a month we will return to the security issues raised in the next section.)

Downloading code is newly fashionable: Peter Deutsch did it in the early 70's (from Butler's paper on hints); recently made popular in the OS community and, of course, on the Internet (Java).

Why download code? (many reasons; here are two):
1. Eliminate boundary crossings.
2. Move computation closer to the action.

Current example uses of downloaded code: kernels (packet filters), databases (Postgres), window systems (NeWS), networking (active networks), and the Internet (Java).

Potential uses in X:
- Lower latency: move computation to the other side of the network straw.
- Absorb communication so the client never sees it (e.g., mouse tracking).
- Send code that implements cool new functionality to the server so everyone can use it, rather than having to distribute it to each client.

What do you need to download code?
1. Protection: type-safety / preventing wild memory operations and jumps. How to do it? (Provided by a trusted certifier using a safe language, e.g., a trusted compiler compiles the code and signs it. Can provide protection by eliminating the ability to express dangerous things; for instance, to eliminate floating-point exceptions, remove floating point. Or a post-processor: e.g., inspect the code (Deutsch, Java) or even modify it to prevent it from performing wild memory operations and jumps.) Guarding against excessive runtime: (inspection, modification, and/or interrupts).
2. Execution context: points to pass control into and out of the fragment, and the ability to create and store state. Dynamic linking: bind unresolved symbols and determine where to call in.
3. Security: prevent it from doing bad things (e.g., sending email in your name). (Hard to guarantee in a useful system.)

---------------

Examples of performance hacks and ease-of-implementation choices:

Speed hacks:
To get around a slow network:
- Semantic compression: send a description of the scene rather than pixels. Gives 10-100x compression. Useful technique in general: send a constructive formula rather than the object itself (see the sketch after this section).
- Asynchronous communication.
- The server does background refresh.
- Pop-up menus can be overlaid with a bitmap stored on the server.
Slow CPU:
- Ports for big-endian and little-endian clients.
- Shared memory for IPC when client and server are on the same machine.
- Clipping done by the server (would have to be done anyway for protection). (Don't hide power.)
- Client expose. Note that because the interface is carefully specified, the server can still handle exposes.
- Byte alignment of operations in messages.
- Operating system calls minimized.
To guard against the network:
- Ordering, by sending multiple windows over the same pipe.

Implementation simplicity:
Simple:
- Use of a reliable byte stream.
- Client expose.
- Large images -> broken up by clients.
Portability via lowest-common denominator:
- Communication using a simple protocol (reliable byte stream = TCP).
- Simple capabilities for the device (don't assume a powerful device).
- Simple dynamic linking using client/server rather than sophisticated linkers.

----

Security problems (perhaps better delayed till we get to this topic):
1. Authentication via IP addresses.
2. Small integers used as resource capabilities.
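The sketch promised under semantic compression, assuming dpy, win, and gc are set up as in the earlier examples: a filled rectangle is one short protocol request (on the order of 20 bytes on the wire) instead of width x height pixels shipped with XPutImage; the numbers are approximate.

    #include <X11/Xlib.h>

    void paint_background(Display *dpy, Window win, GC gc)
    {
        XFillRectangle(dpy, win, gc, 0, 0, 500, 300);   /* a formula, not a bitmap */
    }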