Lab Assignment 1: A Web Server

Assigned: February 6
Due date: February 18

Introduction

The first programming project in 6.033 is meant to prepare you for the next two projects by acquainting you with the tools of UNIX programming and especially network programming. Unlike the other labs, we will start you off with some code for setting up network connections, since the code for doing this is very mechanical. From this point, you will build a fully functional web server and client.

Running a simple client/server program

You will be modifying a trivial client/server program. The C sources for the program can be found in the 6033 locker in the subdirectory lab/src/un.

That directory contains three files: client.c, server.c, and Makefile. Copy the files to your own locker and type make (type add cygnus first to add gcc to your PATH). Make will compile client.c and server.c and create the binaries client and server. Start the server by just typing server. In another window, type client name, where name is the name of the machine on which the server is running (i.e., in this case, the machine on which you created the window with the server). You can find out the name of the machine you are logged into by typing hostname. Every time you run the client, the server will print the notorious C welcome message.

Understanding the client and the server

Once you get the programs running, you should study the code and try to comprehend what is going on. You might find it helpful to read chapter 10 (``Communication in distributed systems'') of the Tanenbaum book, one of the 6.033 textbooks, to obtain a general understanding of client/server programming. The specifics of each UNIX networking call can be looked up in the man pages (e.g., type man socket). If you cannot make sense of the program using the sources, Tanenbaum, and the man pages, you might want to borrow ``Unix Network Programming,'' written by Stevens, or get in touch with your TA, Costa Sapuntzakis.

Modifying the server

The programs client and server give you a skeleton for writing any client/server program. As the last part of this exercise, you will write a web server. This section details the interface of the server program. The next section will introduce the HTTP web protocol used to serve the files.

The web server takes one argument, which is the root of the web page hierarchy. For example, if I want to serve up my web pages, I would invoke the server as follows: server /afs/athena/user/c/s/csapuntz/www/. The server will use the HTTP protocol so that it is compatible with the wide variety of web browsers available.

The HTTP Protocol

The Hypertext Transfer Protocol (HTTP) is the most commonly used protocol on the Web today. For this lab, you will be implementing the simplest version of this protocol, called HTTP version 0.9 or HTTP/0.9.

The HTTP protocol assumes a reliable connection, and in current practice, the TCP protocol is used to provide this reliable connection. The TCP protocol provides the reliable transport of bytes between programs on two separate machines, even over an unreliable network. Luckily for us, the TCP protocol is built into the operating system.

To support multiple services on a single machine, TCP has an abstraction known as a port. For example, if you connect to port 13 on many UNIX computers, the computer will return the time of day (try it! telnet athena.mit.edu 13). On most servers, the HTTP protocol lives on port 80. However, port 80 is protected on most UNIX systems (binding to a port below 1024 normally requires root privileges), so we'll have to run our web server on a higher port (like 8080). To use the server, we need to modify our URLs a bit, adding the port number after the machine name. For example, entering http://playdoh.mit.edu:8080/origami.html into your favorite web browser tells it to connect to the machine playdoh.mit.edu on port 8080 using the HTTP protocol.
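The code you were given already sets up the network connection, so the following is only a sketch of where the port number ends up. It assumes an IPv4 socket and uses a hypothetical helper name, make_listen_address, that does not come from server.c. The address it builds is the kind of value a server would hand to bind().

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper (not part of the supplied server.c): builds the
 * address a server would pass to bind() in order to accept connections
 * on `port` on any local interface. */
struct sockaddr_in make_listen_address(unsigned short port)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* any local interface */
    addr.sin_port = htons(port);               /* port in network byte order */
    return addr;
}
```

Calling make_listen_address(8080) gives an address for the unprivileged port used in the example URL above.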

The HTTP protocol is a request/response protocol. When a client opens a connection, it immediately sends its request for a file. The web server then responds with the file or an error message.

The format of the request for HTTP is quite simple. A request consists of a verb followed by arguments, each separated by a space, and terminated by a carriage return/linefeed pair. The server should be tolerant of implementations that only send linefeeds.

HTTP/0.9 supports only one verb: ``GET''. This verb takes one argument, the file to be retrieved. All other arguments are to be ignored if they are not understood.
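One possible way to pick apart a request line is sketched below. It assumes the whole line is already in a writable buffer. The helper name parse_request_line and its return convention are illustrative choices, not part of the supplied code.

```c
#include <string.h>

/* Hypothetical helper: splits a request line in place.
 * On success, stores a pointer to the requested path in *path and
 * returns 0. Returns -1 if the line is not a usable GET request. */
int parse_request_line(char *line, char **path)
{
    char *verb;

    /* Cut the line at the first CR or LF, so that both "\r\n" and a
     * bare "\n" terminator are accepted. */
    line[strcspn(line, "\r\n")] = '\0';

    verb = strtok(line, " ");
    if (verb == NULL || strcmp(verb, "GET") != 0)
        return -1;

    *path = strtok(NULL, " ");
    if (*path == NULL)
        return -1;

    return 0;   /* any further arguments are simply ignored */
}
```

Because strtok keeps internal state, this sketch is only suitable for a server that handles one request at a time.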

Once the request line is received, the HTTP/0.9 server should continue reading the input from the client until a blank line is found. It should then send back its response (usually the file contents) and close the connection.
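The "read until a blank line" step might look like the sketch below. It assumes the connection has been wrapped in a stdio stream (for example with fdopen on the socket descriptor); the function name skip_until_blank_line is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads and discards lines from `in` until a blank
 * line (just "\r\n" or "\n") or end of input is reached.
 * Returns the number of non-blank lines that were skipped. */
int skip_until_blank_line(FILE *in)
{
    char buf[1024];
    int skipped = 0;
    int at_line_start = 1;   /* false while in the middle of a long line */

    while (fgets(buf, sizeof buf, in) != NULL) {
        size_t len = strlen(buf);

        /* A line that begins with its terminator is blank. */
        if (at_line_start && strcspn(buf, "\r\n") == 0)
            break;
        if (at_line_start)
            skipped++;

        /* fgets may return only part of a very long line; the next
         * chunk starts a new line only if this one ended in '\n'. */
        at_line_start = (len > 0 && buf[len - 1] == '\n');
    }
    return skipped;
}
```

After this returns, the server can send its response and close the connection.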

To form the path sent to the server in the GET request, the client takes everything after the machine name and port number in the URL. For example, http://playdoh.mit.edu:8080/home/origami.html means we should ask for /home/origami.html. If you see a URL with nothing after the machine name and port, then / is assumed.
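On the client side, splitting a URL into machine name, port, and path could be sketched as follows. This is a minimal illustration that assumes URLs of the form http://host[:port][/path] and does no further validation; split_url is a hypothetical name.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits "http://host[:port][/path]" into its parts.
 * The port defaults to 80 and the path to "/". Returns 0 on success and
 * -1 if the URL has the wrong prefix or a part does not fit its buffer. */
int split_url(const char *url, char *host, size_t host_size,
              int *port, char *path, size_t path_size)
{
    const char *prefix = "http://";
    const char *rest, *slash, *colon;
    size_t host_len;
    int n;

    if (strncmp(url, prefix, strlen(prefix)) != 0)
        return -1;
    rest = url + strlen(prefix);

    /* Everything from the first '/' onward is the path. */
    slash = strchr(rest, '/');
    if (slash == NULL)
        slash = rest + strlen(rest);   /* points at the terminating NUL */
    n = snprintf(path, path_size, "%s", *slash ? slash : "/");
    if (n < 0 || (size_t)n >= path_size)
        return -1;

    /* A ':' before the path introduces the port number. */
    colon = memchr(rest, ':', (size_t)(slash - rest));
    *port = colon ? atoi(colon + 1) : 80;

    host_len = (size_t)((colon ? colon : slash) - rest);
    if (host_len == 0 || host_len >= host_size)
        return -1;
    memcpy(host, rest, host_len);
    host[host_len] = '\0';
    return 0;
}
```

For the URL in the example above, this yields the host playdoh.mit.edu, the port 8080, and the path /home/origami.html.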

You can try out the protocol yourself. For example, try

	% telnet web.mit.edu 80

	GET /6.033/www/home.html

and see what you get.
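In a client program, the bytes typed in the telnet session above could be produced by something like the sketch below. It sends the request line followed by a blank line, matching what the server is described as waiting for; format_get_request is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: writes "GET <path>" followed by a blank line
 * into buf. Returns the number of bytes to send, or -1 if the request
 * does not fit in the buffer. */
int format_get_request(char *buf, size_t size, const char *path)
{
    int n = snprintf(buf, size, "GET %s\r\n\r\n", path);

    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

The resulting buffer can then be written to the connected socket in one call.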

Web Server Details

When a client attempts to fetch a directory (e.g. / or /6.033/www/) from the server, your web server should fetch the index document, index.html, located in that directory. If the document is not found, your server should return an error.
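Combining the root argument given to the server with the path from the GET request might be done as in the sketch below. It makes a simplifying assumption: a request is treated as a directory only when its path ends in '/'. A fuller version could call stat() and test S_ISDIR to catch directories named without the trailing slash. The name build_file_path is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: joins the server root and the requested path into
 * a file name, appending "index.html" when the request ends in '/'.
 * Returns 0 on success, -1 on a malformed path or too small a buffer. */
int build_file_path(const char *root, const char *request_path,
                    char *out, size_t out_size)
{
    size_t root_len = strlen(root);
    size_t req_len = strlen(request_path);
    int n;

    if (req_len == 0 || request_path[0] != '/')
        return -1;

    /* Drop a trailing '/' on the root so the join has one separator. */
    if (root_len > 0 && root[root_len - 1] == '/')
        root_len--;

    n = snprintf(out, out_size, "%.*s%s%s", (int)root_len, root,
                 request_path,
                 request_path[req_len - 1] == '/' ? "index.html" : "");
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

If opening the resulting file fails, the server falls through to its error handling.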

For HTTP/0.9, there is no explicit way of signalling errors. Rather, you should send back a small web page spelling out what went wrong.
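Since an HTTP/0.9 response is just the document body, an error reply can be an ordinary HTML page. The sketch below formats one into a buffer. The wording of the page and the name format_error_page are illustrative assumptions, and the reason string is expected to be a fixed message chosen by the server.

```c
#include <stdio.h>

/* Hypothetical helper: writes a small HTML page describing the error
 * into buf. Returns the page length in bytes, or -1 if it does not fit. */
int format_error_page(char *buf, size_t size, const char *reason)
{
    int n = snprintf(buf, size,
                     "<html><head><title>Error</title></head>\n"
                     "<body><h1>Error</h1><p>%s</p></body></html>\n",
                     reason);

    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

A server could call this with a message such as "File not found" and write the returned number of bytes to the client before closing the connection.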

The easiest way of testing your web server is to run Netscape or NCSA Mosaic against it. We found that Mosaic handled the older HTTP/0.9 protocol much better than Netscape did.

What to hand in

When you're ready to hand in the lab, just send the TA the path of the Athena directory in which it is stored. The TA will copy over the directory and type make to build your client and server. If you have made any improvements to the client or server specifications, please describe them in a README file. Also, make sure that the AFS permissions on that directory are set correctly to allow the TA read access.

Extra Fun

Here are some extra projects, in case you get done early and want to keep hacking:

References

Berners-Lee, T., and others. RFC 1945: Hypertext Transfer Protocol -- HTTP/1.0. http://ds.internic.net/rfc/rfc1945.txt

Berners-Lee, T., and others. RFC 2068: Hypertext Transfer Protocol -- HTTP/1.1. http://ds.internic.net/rfc/rfc2068.txt

Stevens, W. Richard. Advanced Programming in the UNIX Environment. Addison-Wesley, 1992.

Stevens, W. Richard. Unix Network Programming. Addison-Wesley, 1990.



Constantine Sapuntzakis Thu Feb 6 19:22:57 EST 1997