6.033 Discussion Suggestions

6.033--Computer System Engineering

Suggestions for classroom discussion

Topic: David K. Gifford, Pierre Jouvelot, Mark A. Sheldon, and James W. O'Toole, Jr. Semantic File Systems. ACM Thirteenth Symposium on Operating Systems Principles, in Operating Systems Review 25, 5 (December, 1991) pages 16-25.

By J. H. Saltzer, March 18, 1997.

Who are these guys? Does this paper differ in any way from the other papers we have read?

(First, it is rather more scholarly--note the extensive list of citations at the end. Also, it is very smoothly organized: the abstract tells you both the problem and their solution, the introduction tells you what is coming, the conclusions both summarize the work and draw real conclusions, and the paper itself leads the reader systematically through an introduction, relevant prior art, what their system does, how it does it, and some review of what users thought of it. If you write a technical paper about a system design, this is a good one to use as a model. Your design project should look like this, except that you don't need a results section.)

In what way have we shifted gears here? What does this paper have to do with naming?

(We are stretching the borders of naming systems here. They have managed to create a system that plugs into the usual naming mechanics of UNIX. But it is closer to a database retrieval system, in which reference is done by queries, which are more general than names.)

What is the difference between a query and a name? Can't I say "name: alpha.txt"?

(Yes, but you can also say "owner: Gifford" or "text: recording". That is what makes it more general. An object can have many properties, only one of which is its name. A query specifies one or more properties of the objects you are looking for.)
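The distinction can be sketched in a few lines of Python. This is only an illustration, not the SFS implementation: the catalog, the property names, and the query() helper are all hypothetical. It shows that a name is just one property among several, and that a query may constrain any combination of them.

```python
# Hypothetical catalog: each object is a bag of properties,
# and "name" is just one property among several.
FILES = [
    {"name": "alpha.txt", "owner": "Gifford", "text": {"recording", "notes"}},
    {"name": "beta.txt", "owner": "Gifford", "text": {"budget"}},
    {"name": "alpha.txt", "owner": "Smith", "text": {"recording"}},
]


def matches(obj, field, value):
    """True if the object's property `field` equals (or contains) `value`."""
    prop = obj.get(field)
    if isinstance(prop, (set, frozenset)):
        return value in prop
    return prop == value


def query(objects, **constraints):
    """Return every object that satisfies all of the field=value constraints."""
    return [o for o in objects
            if all(matches(o, f, v) for f, v in constraints.items())]


by_name = query(FILES, name="alpha.txt")                   # name used as a query
by_owner = query(FILES, owner="Gifford")                   # a different property
by_both = query(FILES, owner="Gifford", text="recording")  # two properties at once
```

Note that even the "name" query returns two objects here, because nothing forces a property value to be unique.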

How does "ls /sfs/smith/ext:/txt/" differ from "ls /sfs/smith/*.txt/"?

(If all the files with extension ".txt" belonging to smith were stored in the single directory "/sfs/smith" you would get the same result. But if smith is actually the root of a hierarchy of subdirectories, then the result can be quite different: the first query would get you what appears to be a directory listing that contains every file with extension ".txt" located anywhere in Smith's hierarchy. The second query would get you a directory listing that contains just those files with extension ".txt" located directly in /sfs/smith.)
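An ordinary file system can imitate the difference, which may help make it concrete in class. The sketch below builds a small throwaway hierarchy (the file names are made up) and compares a single-directory glob with a recursive search. It only approximates what SFS does: SFS answers the query from an index rather than by walking the tree.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical hierarchy rooted at "smith".
    smith = Path(tmp) / "smith"
    (smith / "papers" / "drafts").mkdir(parents=True)
    for rel in ["todo.txt", "main.c", "papers/sfs.txt", "papers/drafts/notes.txt"]:
        (smith / rel).write_text("")

    # Like "ls /sfs/smith/*.txt": only files directly in smith.
    direct = sorted(p.name for p in smith.glob("*.txt"))

    # Roughly like "ls /sfs/smith/ext:/txt/": every .txt file anywhere below smith.
    anywhere = sorted(p.name for p in smith.rglob("*.txt"))

print(direct)    # ['todo.txt']
print(anywhere)  # ['notes.txt', 'sfs.txt', 'todo.txt']
```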

Is that a "field virtual directory" or a "value virtual directory"? How can you tell the difference?

(We just listed a "value virtual directory". You can tell because it lists the names of real objects in the system; in this system "values" are the names of real files and directories.)

So what is a "field virtual directory"? What does it list if I say "ls /sfs/smith/ext:/"?

(The listing will show you all of the various extensions that are found anywhere in smith's directory hierarchy, e.g.,

     .txt
     .o
     .c
     .h
     .frame
     .mss
     .tex
     .etc

The difference is that a field virtual directory reports the range of values that the field takes on, not the range of file names.)
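One way to picture the two kinds of virtual directory is as two levels of a single index. The following is a minimal sketch under that assumption. The index layout, the sample paths, and the helper names ls_field() and ls_value() are invented for illustration and are not taken from the paper.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical files somewhere under smith's hierarchy.
PATHS = [
    "/smith/todo.txt",
    "/smith/src/main.c",
    "/smith/src/main.h",
    "/smith/papers/sfs.txt",
]

# index[field][value] -> set of paths of files having that field/value pair.
index = defaultdict(lambda: defaultdict(set))
for p in PATHS:
    ext = PurePosixPath(p).suffix.lstrip(".")
    index["ext:"][ext].add(p)


def ls_field(field):
    """Field virtual directory: one entry per value the field takes on."""
    return sorted(index[field])


def ls_value(field, value):
    """Value virtual directory: one entry per file whose field has that value."""
    return sorted(PurePosixPath(p).name for p in index[field][value])


print(ls_field("ext:"))         # ['c', 'h', 'txt']
print(ls_value("ext:", "txt"))  # ['sfs.txt', 'todo.txt']
```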

How did they manage to get this crammed into UNIX without completely rewriting it?

(They took advantage of an already existing RPC interface that stands between the file system user and the part of the file system that resolves names--the NFS interface. By hijacking those RPC calls, they can return something that your UNIX honestly believes is a real directory.)

There is a mention of a "port". On page 20, second bullet in the right-hand column: "The file server process exports its NFS service using the same root file handle on a distinct port number." What is this "port" all about?

(A port is a demultiplexing idea, just like "type" in the Ethernet. When we send a network-level packet we specify not just which IP address to get this packet to, but also which "port" at that address to deliver it to. A port is a name that is bound to a next-level-higher protocol. In this case, there is a port that is conventionally assigned to NFS service. The SFS people have adopted another port number and bound their modified NFS service to this new port number.)
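The demultiplexing idea can be simulated without any real networking. In the sketch below, 2049 is the port conventionally assigned to NFS. The second port number is purely hypothetical, since the paper does not say which port SFS uses, and the handler functions are stand-ins for the two services.

```python
NFS_PORT = 2049  # port conventionally assigned to NFS
SFS_PORT = 3049  # hypothetical: the paper does not name the SFS port


def nfs_service(request):
    return f"plain NFS handled {request}"


def sfs_service(request):
    return f"SFS-modified NFS handled {request}"


# Binding table for one host: port number -> next-level-higher protocol handler.
bindings = {NFS_PORT: nfs_service, SFS_PORT: sfs_service}


def deliver(packet):
    """Demultiplex an arriving packet by its destination port."""
    handler = bindings.get(packet["dst_port"])
    if handler is None:
        return "dropped: no service bound to that port"
    return handler(packet["payload"])


print(deliver({"dst_port": NFS_PORT, "payload": "LOOKUP"}))
print(deliver({"dst_port": SFS_PORT, "payload": "LOOKUP"}))
```

Both packets go to the same host and carry the same request. Only the port number decides which service sees it.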

Does using a distinct port number for this service require that the client know about it?

(Yes. The NFS client is going to have to specify the port number to use for each server. The paper was vague on how that happens. Perhaps this is an argument that can be supplied in a mount command.)

What is a "root file handle"?

(This is a UNIX/NFS concept. Recall that if you are going to look up a path name you need a starting point: where is the root? When you mount a file system the mount procedure ends by returning one value: the name to use for the root. It is called the "root file handle".)

Is the concept of a semantic file system mostly useful when typing commands and doing one-shot shell scripts where the user is available to interactively examine the consequences? Or is there a real use for this feature in writing C programs?

(The issue here is predictability. A programmer might try to embed a statement such as 'open("/sfs/smith/ext:/.txt/foo.txt")' on the theory that this will track down foo.txt no matter where Smith has decided to move it. But what if Smith has two files named foo.txt? The paper doesn't mention this possibility anywhere. The point is that search by property generally can lead to zero, one, or several items in the result set, with no special concept of being "wrong" if it isn't exactly one. Specifying a name is usually intended to identify exactly one thing, with the response "not found" being interpreted as an error. In interactive situations the user can look at the result set and decide what to do about it. But a programmer who has embedded a path name isn't around to answer questions.)
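A program that wants name-like behavior from a query has to impose the "exactly one" rule itself. The sketch below shows one hedged way to do that. The exception classes and resolve_unique() are hypothetical helpers, not anything the paper or UNIX provides, and the result sets are made-up lists of paths.

```python
class NoMatch(LookupError):
    """The query matched nothing (the analog of "not found")."""


class AmbiguousMatch(LookupError):
    """The query matched several items, which a plain name never does."""


def resolve_unique(result_set):
    """Treat a query result like a name: demand exactly one item."""
    if not result_set:
        raise NoMatch("query matched nothing")
    if len(result_set) > 1:
        raise AmbiguousMatch(f"query matched {len(result_set)} items")
    return result_set[0]


# Hypothetical result sets for a query that looks for foo.txt.
one = ["/smith/notes/foo.txt"]
two = ["/smith/notes/foo.txt", "/smith/old/foo.txt"]

print(resolve_unique(one))  # /smith/notes/foo.txt
try:
    resolve_unique(two)
except AmbiguousMatch as err:
    print("cannot decide:", err)
```

An interactive user plays the role of resolve_unique() by eye. An unattended program needs the rule written down, along with a decision about what to do in each failure case.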

But do programs ever embed path names?

(Only very rarely; most programs don't. They take in an argument (that may have come from a typed command) and they pass it along to the file system. Thus "lpr /sfs/smith/ext:/.txt/foo.txt" is probably supplied interactively, and a response of "not found" or getting two things printed probably won't surprise anyone.)

Comments and suggestions: Saltzer@mit.edu