Tim Berners-Lee, Robert Cailliau, Ari Luotonen, Henrik Frystyk Nielsen, and Arthur Secret. The world-wide web. Communications of the ACM 37, 8 (August 1994), pages 76-82.
by J. H. Saltzer, March 17, 1996
1. Who wrote this paper? What can we learn by studying the citations, acknowledgments, and general environment surrounding this paper?
(Authors' affiliations: CERN, a high-energy physics outfit, not a computer research or development organization. Citations: none? Actually, there are some citations buried in the glossary. Acknowledgments: none!)

2. Given this background and environment, what might we expect when compared with papers written, say, by authors from DEC SRC?
(a. An unexpected viewpoint, orthogonal thinking, and novel ideas that might not have occurred to people who have spent years learning about computer systems from people who have spent years working with computer systems.
b. Very little contribution from the world of computer systems and computer science; reinvention of well-known mistakes.)

3. So where are the examples of novel ideas? (See item 5 below.)
(The integrating and encompassing strategy of having a protocol specified as part of the URL [http, ftp, telnet, gopher, etc.] is not only novel, but one of the things that gives the WWW its power. One browser can integrate all of the previous mechanisms.)

4. And how about some examples of mistake reinvention? (See item 8, below.)

5. There have been several web-like ideas developed over the last few years, for example, gopher. How does WWW compare with gopher?
- Gopher has architected directories, like a file system. Information-storing objects can't contain links. It looks like it was designed by someone who was too familiar with file systems and couldn't break out of that way of thinking. Gopher is what you get if you ask a collection of Unix gurus to design a world-wide web.
- WWW allows any document to serve both as an information repository and as a directory with links. This isn't the model of the usual file system, so a computer systems person probably wouldn't have thought of it.

6. Two ideas, one mechanism:
- organize a single document as a set of linked pages.
- link independent documents.
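The strategy noted in item 3, naming the protocol inside the URL so that one browser can front for every retrieval mechanism, amounts to a dispatch on the URL scheme. The following is a minimal sketch in Python, not a description of how any actual browser is organized; the handler functions and the HANDLERS table are hypothetical stubs that only report which protocol would be used. urllib.parse.urlsplit is the standard-library function that extracts the scheme.

```python
from urllib.parse import urlsplit


# Hypothetical per-protocol fetchers. A real browser would speak each
# protocol here; these stubs only report which handler was chosen.
def fetch_http(url):
    return f"http handler for {url}"


def fetch_ftp(url):
    return f"ftp handler for {url}"


def fetch_gopher(url):
    return f"gopher handler for {url}"


# The scheme (the text before the first colon) selects the protocol.
HANDLERS = {
    "http": fetch_http,
    "ftp": fetch_ftp,
    "gopher": fetch_gopher,
}


def retrieve(url):
    """Pick a retrieval mechanism based solely on the URL's scheme."""
    scheme = urlsplit(url).scheme
    handler = HANDLERS.get(scheme)
    if handler is None:
        raise ValueError(f"no handler for scheme {scheme!r}")
    return handler(url)
```

Under this arrangement, supporting another protocol, such as telnet, means adding one entry to the table; existing documents and the links inside them do not change.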
7. What is a *stateless* protocol? Why does it matter?

8. Closures in the World-Wide Web. Review design project 1, Spring 1995 (handout 9), and the solutions to design project 1 (handout 22). The following paragraphs are mildly revised versions of paragraphs from the solution handout.

Technical description of the problem, in 6.033 terms: The Web browser supplies an implicit closure for relative names (also called "partial URL's") found in Web pages. The implicit closure it supplies is simply the URL that the browser used to retrieve the page that contained the relative name, truncated back to the last slash character. This closure is the name of a directory at the server that should be used to resolve (the first component of) the relative name.

Some servers provide a URL namespace by simply using the local (for example, UNIX) file system namespace. When the local file system namespace allows synonyms for directory names (symbolic links and NFS mounts are two examples), the mapping of the local file system namespace to the URL namespace is not unique: there can be several different URL's, with different path names, for the same object. Trouble can arise when the object that has multiple URL's is a directory whose name is used as a closure.

Example: suppose that file B.html contains the web link {A HREF = "C.html"}. Both B.html and C.html are stored in the directory /real/. Suppose further that the browser obtained B.html by requesting the URL

   http://somewhere.mit.edu/pages/B.html

where "pages" is a directory that doesn't actually contain B.html but instead has a UNIX soft link named B.html that points to /real/B.html.
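The implicit-closure rule from the technical description can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the page URL is assumed to contain a path (a slash after the host name), and the relative name is assumed to contain neither ".." nor a leading slash.

```python
def implicit_closure(url):
    """Truncate the retrieval URL back to (and including) its last slash.

    Assumes the URL has a path, e.g. http://host/dir/page.html; for a bare
    http://host the last slash is part of '//' and the result is wrong.
    """
    return url[: url.rfind("/") + 1]


def resolve_simple(page_url, relative_name):
    """Resolve a simple relative name by concatenating it onto the
    implicit closure of the page that contained it."""
    return implicit_closure(page_url) + relative_name
```

For simple names like these, Python's standard urllib.parse.urljoin applies the same truncate-and-concatenate rule. Note that the computation never consults the server: the closure comes purely from the text of whichever URL the browser happened to use.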
Since the file B.html does not arrive at the browser with an accompanying closure, the browser provides an implicit closure by truncating the original URL to obtain:

   http://somewhere.mit.edu/pages/

and then uses this as a context to resolve the web link by concatenation:

   http://somewhere.mit.edu/pages/C.html

This URL will probably produce a "not found" response. (Or worse, it may return a different file that happens to be named C.html. The confusion may be compounded if the different file with the same name turns out to be an out-of-date copy of the current C.html.)

Another problem can arise when interpreting the relative name "..". This name is, conventionally, the name for the parent directory of the current directory. UNIX provides a semantic interpretation: look up the name ".." in the current directory, where it evaluates (in the inode namespace) to the parent directory. The Web, in contrast, specifies that ".." is not a name to be looked up in some context, but rather a syntactic signal to modify the implicit closure by discarding the least significant component of the directory name. Despite these drastically different interpretations of "..", the result is usually the same, because the parent of an object is usually the thing named by the next-earlier component of that object's path name. The exception (and the problem) arises when the syntactic modification is applied to a URL that contains a synonym for a directory name. If the path name of the synonym does not go through the directory's parent, syntactic interpretation provides an implicit closure different from the one that semantic interpretation would supply.

The problem can be fixed in at least three fundamentally different ways:
1. Arrange things so that the current implicit closure always works.
- forbid use of UNIX links, or require use of complete link farms.
- forbid use of ".." in web links.
2. Do a better job of choosing an implicit closure.
- client sends the original URL plus the link to the server and lets the server figure out how to deal with it.
3. Provide an explicit closure.
- Server fills in the "location" field of the header with an absolute URL.
- Client uses that URL as the closure.

A misleading characterization of the problem: One might suggest that the implementor of the server (or the writer of the pages containing the relative links) failed to heed the warning in the Web URL specifications for path names that "The similarity to unix and other disk operating system filename conventions should be taken as purely coincidental, and should not be taken to indicate that URIs should be interpreted as file names." (Tim Berners-Lee, Universal Resource Identifiers: Recommendations.) That suggestion, however, is misleading. Unfortunately, the problem is built into the Web naming specifications. Those specifications require that relative names be interpreted syntactically, yet they do not require that every object have a unique URL. Unambiguous syntactic interpretation of relative names requires that the closure consist of a unique path name. Since the browser derives the closure from the path name of the object that contained the relative name, and that object's path name does not have to be unique, it follows that syntactic interpretation of relative names will intrinsically be ambiguous. Servers that map URL path names to UNIX path names, which are not unique, are better characterized as exposing, rather than causing, the problem.

That analysis suggests that one way to conquer the problem is to change the way in which the browser acquires the closure. If the browser could somehow obtain a canonical path name for the closure, the same canonical path name that the UNIX system uses to reach the directory from the root, the problem would vanish.

If this description seems mysterious, check the design project handouts and the encounter page that demonstrates the problem.
The second handout also has four pages of solutions collected from some 50 student design projects, some quite novel.
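To make the ".." problem and the canonical-path remedy concrete, here is a minimal Python sketch, assuming a POSIX system on which Python can create symbolic links. The directory layout, the file names, and the helpers (build_tree, to_fs, syntactic, semantic, canonical, demo) are hypothetical, and URL paths stand in for full URLs. It contrasts the Web's syntactic treatment of ".." with the kernel's semantic lookup, then has the server expand symbolic links with os.path.realpath to obtain the unique path from the root.

```python
import os
import posixpath
import tempfile


# Hypothetical layout, built in a scratch directory (POSIX only):
#   proj/index.html      the object the relative link is meant to reach
#   proj/docs/page.html  the page containing the relative link "../index.html"
#   docs -> proj/docs    a synonym for the directory proj/docs (symbolic link)
def build_tree(root):
    os.makedirs(os.path.join(root, "proj", "docs"))
    for name in ("proj/index.html", "proj/docs/page.html"):
        with open(os.path.join(root, name), "w") as f:
            f.write(name)
    os.symlink(os.path.join("proj", "docs"), os.path.join(root, "docs"))


def to_fs(root, url_path):
    """Map a URL path onto the server's file system, as a naive server does."""
    return os.path.join(root, url_path.lstrip("/"))


def syntactic(url_path, relative_name):
    """Web rule: truncate at the last slash, then let '..' discard text."""
    closure = url_path[: url_path.rfind("/") + 1]
    return posixpath.normpath(closure + relative_name)


def semantic(url_path, relative_name):
    """UNIX rule: leave '..' in the path so that the kernel looks it up in
    the directory the page actually lives in."""
    closure = url_path[: url_path.rfind("/") + 1]
    return closure + relative_name


def canonical(root, url_path):
    """Expand every symbolic link, yielding the unique path the file
    system uses to reach the object from the root."""
    real = os.path.realpath(to_fs(root, url_path))
    return "/" + os.path.relpath(real, os.path.realpath(root))


def demo():
    with tempfile.TemporaryDirectory() as root:
        build_tree(root)
        # The browser reached the page through the synonym "docs".
        page, link = "/docs/page.html", "../index.html"
        web = syntactic(page, link)
        unix = semantic(page, link)
        fixed_page = canonical(root, page)
        fixed = syntactic(fixed_page, link)
        return {
            "syntactic": web,
            "syntactic_found": os.path.exists(to_fs(root, web)),
            "semantic_found": os.path.exists(to_fs(root, unix)),
            "canonical_page": fixed_page,
            "canonical_result": fixed,
            "canonical_found": os.path.exists(to_fs(root, fixed)),
        }


if __name__ == "__main__":
    for key, value in demo().items():
        print(key, value)
```

In this sketch the syntactic rule turns "/docs/page.html" plus "../index.html" into "/index.html", which does not exist, while the kernel's semantic lookup of the same unnormalized path finds proj/index.html. Once the server has computed the canonical name "/proj/docs/page.html", the syntactic rule gives the right answer; delivering that canonical URL to the browser, for example in the "location" header field, would be one form of the explicit-closure fix.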