Computer Systems Design Project:

Updating Web Sites on Read Only Media








Lukasz A. Weber

March 20, 1997

Instructor J. H. Saltzer

T.A. Costa Sapuntzakis


Updating Web Sites on Read Only Media
  1. Introduction
  2. Problem
    1. Design Criteria
    2. Background
  3. Solution
    1. Directory Structure
    2. Web Site Access
  4. Discussion
    1. Simplicity
    2. Taking Snapshots
    3. Add, Change, Delete
    4. Updates
    5. Organization
    6. Maintenance
    7. Gifts
    8. Other Issues
  5. Conclusion


Abstract

This paper describes an approach to storing and maintaining a multimedia web site on read-only media. A duplicate directory of symbolic links allows transparent edits to the HTML documents and their resources while preserving the web site's other references. The design criteria are considered and the approach is compared with other possible solutions. The duplicate directory system allows snapshots of the web site to be taken at any time, edits to the data to be made transparently, and updates containing the most recent information to be distributed to a central server or other computer systems. The solution can be implemented quickly and requires only simple maintenance. It makes a large amount of data portable yet modifiable, which allows a web site to be maintained on a CD-ROM.

ACKNOWLEDGEMENTS: I would like to recognize Peter Finch and Costa Sapuntzakis as eager listeners to, and commentators on, the ideas developed in this paper.


1. Introduction

As the World Wide Web gains in popularity, the medium will be used in new ways that may differ drastically from its original design. One particular application of the World Wide Web and its hypertext markup language, or HTML, is in creating reports that include multimedia, such as pictures, movies and sounds. For example, a trip report from Egypt might include pictures of the pyramids or recordings of traditional folk songs, incorporated into a document describing where these were recorded and what each one means. However, the addition of multimedia to a document can increase its size by a factor of two to ten, or even one hundred, making such documents difficult to move and copy. A recordable CD-ROM can provide the large storage space that this additional data requires. Additionally, laptops and many other computers come with a built-in CD-ROM player, and CD-ROM writers are becoming readily available. This allows the documents to be used on any computer, including those in the field or those used while traveling.

This paper discusses a solution to managing documents with plain text and multimedia that involves both the hypertext markup language and CD-ROM technology. The system enables the author to travel with the CD-ROM and make changes to the information it contains. This information can later be uploaded to a server or to other computers, such as those of the author's friends. Issues concerning installation and maintenance are also discussed, including suggestions for future use and portability across differing computer systems.

2. Problem

The problem that this paper aims to solve is that of a traveling scientist collecting information about Egypt and storing it on his (or her) laptop computer. Although the exact subject of the research is arbitrary, the details involved in managing a web site of documents on a CD-ROM are quite complex. The web site is assumed to contain about 1000 web pages and to measure 400 megabytes in size. The documents within are highly interconnected and cross-reference other pages. However, the web site is self-contained and thus doesn't rely on a network for additional data. The author has decided to place his work on a CD-ROM. The CD-ROM appears to be a good choice since the discs are, at about $10 per disc, relatively inexpensive for a 600 megabyte storage medium. They can be recorded once and are quite durable. He will therefore write the CD-ROM only at home and then take the most up-to-date copy with him when he travels. The problem arises when he tries to edit his work, since the CD-ROM is a write-once and then read-only medium. The system proposed here will allow the author to add, change or delete any portion of his present work, save the result and, eventually, write a new CD-ROM with the updated information.

2.1. Design Criteria

The solution to the problem above aims to satisfy certain criteria deemed important and useful to the users of the system. These criteria have been divided into two main groups: one deals with updates of the system on any computer, and the other with maintenance of the system on a laptop or a server.

One of the most important aspects of a system that allows web authoring from a CD-ROM is its ability to manage changes. The best solution would require little or no involvement from the user and provide for easy manipulation of the web site structure. Additionally, a snapshot, or a copy, of all the information contained on the web site should be equally easy to make. This snapshot will be written on the CD-ROM and carried with the laptop. Any new additions to the system must be saved elsewhere, and the information must later be transferable to a server. The same information should be available to other computers, since the author is planning to make gifts of the latest versions of his work. Finally, in case the information on a gift CD-ROM becomes outdated, the updated information should be made available through the network for all older versions of the web site.

The new system should be easily installed and maintained by the author on both the laptop and the server. If possible, no new software should be used and the organization on the laptop and server should be kept similar so that administration of one does not differ much from the other.

2.2. Background

The design of the system relies on a few key assumptions about the computer system that the scientist uses and, specifically, the file system he stores his work on. The first assumption is that the laptop, and any servers, are running UNIX or another operating system whose file system allows symbolic links to be used to alias path names. The laptop must also have a hard drive on which any changes can be saved, and a CD-ROM player. Finally, the author is assumed to be using HTML, the hypertext markup language, and a standard browser that uses the uniform resource locator, or URL, naming scheme. With these assumptions and the desired properties listed above in place, the description of the solution follows.

Figure 1: Symbolic Link

3. Solution

The solution to managing a changing web site from a CD-ROM requires a duplicate directory tree that resides on the hard drive. Every document on the CD-ROM is represented in this structure by a "symbolic link" (see Figure 1) that refers back to the document on the CD-ROM, and all references between documents resolve through this structure. This solution takes advantage of the difference between the UNIX file system and the URL naming system. In particular, the ability to establish a symbolic link in UNIX between a path name and a file at a different location in the directory structure allows UNIX to present data to the browser from a directory other than the one from which it was requested.

3.1. Directory Structure

For this system to work, the duplicate directory structure must exactly mirror the organization of the CD-ROM. Then, when a browser requests a document, the UNIX file system searches for it in the duplicate structure and will find either a link or a document. The link will refer to an old document on the CD-ROM, while the document will be a new or a modified document. The browser will receive a document regardless of whether it is located on the CD-ROM or on the hard drive. Furthermore, the fact that the UNIX file system followed a symbolic link will be completely transparent to the browser, so that the browser's naming context will remain as the path to the link.
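The lookup described above can be tried out with a minimal sketch. The temporary directories, file names and contents below are hypothetical stand-ins for the mounted CD-ROM and the hard drive tree, not part of the original design.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-ins: one temp directory plays the mounted CD-ROM,
# another plays the duplicate tree on the hard drive.
work=$(mktemp -d)
cdrom="$work/cdrom"
site="$work/site"
mkdir -p "$cdrom/egypt" "$site/egypt"

# Two documents as they were written on the (read-only) CD-ROM.
echo "pyramids, original" > "$cdrom/egypt/pyramids.html"
echo "songs, original"    > "$cdrom/egypt/songs.html"

# Unchanged document: the duplicate tree holds only a link to the CD-ROM copy.
ln -s "$cdrom/egypt/pyramids.html" "$site/egypt/pyramids.html"

# Changed document: the duplicate tree holds a real file instead of a link.
echo "songs, revised" > "$site/egypt/songs.html"

# A reader that only knows the hard drive path gets a document either way.
old_doc=$(cat "$site/egypt/pyramids.html")
new_doc=$(cat "$site/egypt/songs.html")
echo "$old_doc"
echo "$new_doc"
```

The program reading the file, here 'cat' in place of a browser, is given the same path form in both cases and cannot tell which copy it received.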

3.2. Web Site Access

To access the web site, the CD-ROM must first be mounted, or made accessible to the UNIX file system. Then, on the hard drive, at the top of the web site's directory tree, a symbolic link should be made to the CD-ROM mount point. The link should be named so that the symbolic links in the duplicate directory tree are valid. This is illustrated in Figure 2. At this point any web page can be loaded by the browser from the hard drive. Whether the path on the hard drive names a document or a link, the browser is served either the new hard drive document or a CD-ROM document. Since the browser has loaded the page through a path in the duplicate directory tree, all the relative links in the document lead back into that tree and therefore either to symbolic links or to new pages located on the hard drive. One restriction on web page access exists, however. All documents of the web site must always be accessed through the hard disk, to set up the correct URL context. If a page is loaded directly from the CD-ROM, only CD-ROM pages will be reached from it. Additionally, all new and old documents must use relative links to each other, so that the tops of the directory trees can differ between installations.

Figure 2: Paths accessed by UNIX and a browser.
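One possible arrangement of the top-level link is sketched below. The link name "CD", the two mount points and the file contents are assumptions made for illustration; the paper requires only that the name agree with the links inside the duplicate tree.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
# Hypothetical mount points standing in for two different installations.
mount_a="$work/mnt/cdrom"
mount_b="$work/media/cd0"
site="$work/site"
mkdir -p "$mount_a/egypt" "$mount_b/egypt" "$site/egypt"
# Different contents are used only so the two mounts can be told apart.
echo "served from mount A" > "$mount_a/egypt/pyramids.html"
echo "served from mount B" > "$mount_b/egypt/pyramids.html"

# One link at the top of the hard drive tree names the mount point.
# The name "CD" is an assumption; any fixed name works if the links agree.
ln -s "$mount_a" "$site/CD"

# Links inside the tree are relative and pass through the top-level link,
# so they never mention the real mount point.
ln -s ../CD/egypt/pyramids.html "$site/egypt/pyramids.html"
first=$(cat "$site/egypt/pyramids.html")

# On another installation only the top-level link has to change.
rm "$site/CD"
ln -s "$mount_b" "$site/CD"
second=$(cat "$site/egypt/pyramids.html")

echo "$first"
echo "$second"
```

Keeping the mount point in a single link means the duplicate tree can be copied between machines without rewriting the links inside it.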

The organization of the server web site can either mirror the CD-ROM and hard drive setup of the laptop, or simply contain the CD-ROM directory structure with the new files copied into it. In the latter case every reference leads directly to a current document. However, the presence of the duplicate directory structure facilitates the management of updates for users of older versions of the web site. The implementation issues and decisions made are discussed further below.

4. Discussion

The duplicate directory system for a CD-ROM based web site is simple to implement and to manage. It allows snapshots to be taken and newer releases of the CD-ROM to be made by typing a single UNIX command. Using standard commands, the system allows changes, additions and deletions to be made to the web site while it is on a laptop, and provides for easy synchronization with the server. The organization of the laptop and server copies of the web site is unconstrained except for consistent naming at the top of the directory tree and between the CD-ROM and hard drive directory structures (see Figure 2). The maintenance of the web site is simple and does not require custom software or hardware. When CD-ROMs are given out to other users, the newest information can be obtained either from the laptop or from the server. Finally, the system can be expanded to include more than one CD-ROM or alternate media that may become available in the future. The system is also not limited to UNIX, although implementations for other operating systems require further restrictions on directory locations.

4.1. Simplicity

The use of symbolic links makes redirection of file access very simple. The scheme doesn't require any special software, only standard UNIX commands. By contrast, another solution to this problem might involve writing a local web server customized to serve only the most recent files, whether these are on the CD-ROM or on the hard drive. Although the idea of a server nicely abstracts away the complexity of checking for newer documents and managing the whole web site, the use of symbolic links reduces the programming to a few lines of UNIX commands. The complexity of the implementation is therefore drastically reduced, making the solution available now and installable in only a few minutes.

4.2. Taking Snapshots

Writing a CD-ROM, or taking a snapshot, can be done quickly and easily with the 'cp' command in UNIX. When the command line option '-r' is used, the command recursively traverses the directory structure and follows all symbolic links, replacing the links with genuine files. In this fashion, all the updated documents and other resources, along with the unchanged data from the previous CD-ROM, are copied onto the new CD-ROM. The new CD-ROM contains the updated version of the web site, and can be added to or changed as before.
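A snapshot of this kind can be sketched as follows. Whether a recursive 'cp' follows symbolic links by default varies between implementations, so the sketch adds the '-L' option, which requests that behaviour explicitly. The directory names and contents are hypothetical, and a plain directory stands in for the image that would be written to the new CD-ROM.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cdrom="$work/cdrom"
site="$work/site"
snapshot="$work/snapshot"   # stand-in for the image of the new CD-ROM
mkdir -p "$cdrom/egypt" "$site/egypt"

echo "pyramids, original" > "$cdrom/egypt/pyramids.html"
ln -s "$cdrom/egypt/pyramids.html" "$site/egypt/pyramids.html"  # unchanged page
echo "songs, new" > "$site/egypt/songs.html"                    # added on the laptop

# -R recurses through the tree; -L follows every symbolic link and
# copies the file it names instead of the link itself.
cp -R -L "$site" "$snapshot"

# The snapshot should contain regular files only.
links_left=$(find "$snapshot" -type l | wc -l | tr -d ' ')
echo "symbolic links in snapshot: $links_left"
```

Because the snapshot holds no links, it no longer depends on the old disc and can serve as the source for the next CD-ROM.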

4.3. Add, Change, Delete

When the first CD-ROM is written, the duplicate directory structure can be created by the command:

cp -r --symbolic-link [CD-ROM path] [destination path]

This command recursively descends the directory tree starting at 'CD-ROM path', recreates each directory under 'destination path,' and fills it with a symbolic link to every file on the CD-ROM. This establishes the duplicate directory structure, and any file or resource on the CD-ROM can now be accessed through these links. All new documents are put into this duplicate directory structure. To change a document, its symbolic link should be removed and replaced with the modified document. New documents must use relative path names for their links and must not include the "BASE" HTML tag, since the exact placement of the main web site directory can vary between installations.
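The three operations might look as follows in practice. This is a sketch assuming GNU 'cp', where '-s' is the short form of '--symbolic-link'. The paths and file names are hypothetical, and treating deletion as removal of the link is one reading of the design rather than a step the paper spells out.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cdrom="$work/cdrom"
site="$work/site"
mkdir -p "$cdrom/egypt"
echo "pyramids, original" > "$cdrom/egypt/pyramids.html"
echo "songs, original"    > "$cdrom/egypt/songs.html"

# Build the duplicate tree (GNU cp assumed). The source is given as an
# absolute path so the links resolve from inside the new tree.
cp -r -s "$cdrom" "$site"

# Change: remove the link first, then put a real file in its place.
# Writing through the link would target the read-only disc.
page="$site/egypt/pyramids.html"
rm "$page"
cp "$cdrom/egypt/pyramids.html" "$page"
echo "new paragraph" >> "$page"

# Add: a new document is simply a new file in the duplicate tree.
echo "nile, new" > "$site/egypt/nile.html"

# Delete (assumed reading): removing the link hides the old page from the
# duplicate tree, so it is left out of the next snapshot.
rm "$site/egypt/songs.html"

ls -l "$site/egypt"
```

The copy on the CD-ROM is never touched; every edit lands on the hard drive.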

4.4. Updates

Updates of the home server can be done by FTP or by any other file exchange protocol. Either the entire web site can be updated by copying all the files over, or only the symbolic links and the changed files can be copied over. In the latter case, the data should first be copied to a new directory without following the symbolic links, that is, by using the '-d' option with the 'cp' command. A regular copy of these files will then produce errors, since the symbolic link references are no longer valid, but the failed copies will not overwrite the links already present on the server. The links that fail point only to old documents on the CD-ROM, for which links are already present, so only the new files are copied over to the server (see Table I). The first approach can therefore be used to write CD-ROMs or to update the web site if the site is held on a server's hard drive. The second approach can serve to update a system that uses the same CD-ROM for reference. It also allows incremental updates to be made for users of older editions of the CD-ROM.

Full Server* Update                           | Incremental Server* Update
cp -r [laptop's new file path] [server path]  | cp -d -r [laptop's new file path] [temporary]
                                              | cp -r [temporary] [server's new file path]

*Server here can be another computer as well.

Table I: Updating the server.
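The incremental update can also be sketched without relying on how 'cp' treats broken links, which differs between implementations. The sketch below swaps in 'find -type f' to pick out only the new and changed files; this is a substitute technique, not the commands of Table I. All paths and contents are hypothetical, and a local directory stands in for the server.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cdrom="$work/cdrom"; site="$work/site"
update="$work/update"; server="$work/server"
mkdir -p "$cdrom/egypt" "$site/egypt" "$server/egypt" "$update"
echo "pyramids, original" > "$cdrom/egypt/pyramids.html"
echo "songs, original"    > "$cdrom/egypt/songs.html"

# Laptop: one unchanged page (a link) and one changed page (a real file).
ln -s "$cdrom/egypt/pyramids.html" "$site/egypt/pyramids.html"
echo "songs, revised" > "$site/egypt/songs.html"

# Server: still holds a link for every page on the disc.
ln -s "$cdrom/egypt/pyramids.html" "$server/egypt/pyramids.html"
ln -s "$cdrom/egypt/songs.html"    "$server/egypt/songs.html"

# Collect only regular files; -type f does not match symbolic links.
(cd "$site" && find . -type f) | while IFS= read -r f; do
    mkdir -p "$update/$(dirname "$f")"
    cp "$site/$f" "$update/$f"
done

# Apply the update: drop the server's stale link first, so the copy
# does not write through it, then install the new file.
(cd "$update" && find . -type f) | while IFS= read -r f; do
    mkdir -p "$server/$(dirname "$f")"
    rm -f "$server/$f"
    cp "$update/$f" "$server/$f"
done

ls -l "$server/egypt"
```

The 'update' directory holds exactly the files that differ from the disc, so it is also the unit that could be transferred by FTP or archived for users of older editions.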

4.5. Organization

The organization of the server, or any other computer, and the laptop can differ, as stated before. However, since the symbolic links are relative rather than absolute, the laptop directory tree must match the server directory tree within the web site. This is achieved when the directory is copied symbolically, that is, with the '--symbolic-link' option. Outside the web site, the organization of the laptop and the server can differ.

4.6. Maintenance

The web site's maintenance will encompass everything from taking snapshots to updating the server or the laptop. All of the steps involved in maintenance use standard UNIX commands, which can be turned into short shell scripts for convenience. The duplicate directory doesn't need any setup files, as might a web server, nor as much room as an exact copy of the web site.

4.7. Gifts

The author of the web site is likely to hand out extra copies of the CD-ROM as gifts. The CD-ROMs can be used alone or, if the web site has been updated since the CD-ROM was written, the recipient's hard drive can hold a copy of the duplicate directory structure already present on the laptop. Future updates can be obtained from the server, which should maintain incremental updates between CD-ROM writes. Thus, if the current version of the web site is 5.0, a user with version 3.0 should first upgrade to version 4.0 and then to the current version. These version upgrades can be generated by retaining the hard drive directory structure and all its resources before a new CD-ROM is written and the old directory structure is replaced. The updates can be compressed and stored for access by users of older versions of the web site.
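Applying the increments in order might be scripted as in the sketch below. The layout of one update directory per version, the VERSION file and the 'apply_update' function are all hypothetical; the paper specifies only that the upgrades be applied in sequence.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
site="$work/site"         # a gift recipient's duplicate tree, at version 3.0
updates="$work/updates"   # hypothetical layout: one directory per version
mkdir -p "$site" "$updates/4.0" "$updates/5.0"
echo "3.0" > "$site/VERSION"          # hypothetical version marker
echo "intro v3" > "$site/index.html"

# Each increment holds only the files that changed in that release.
echo "intro v4" > "$updates/4.0/index.html"
echo "nile v4"  > "$updates/4.0/nile.html"
echo "nile v5"  > "$updates/5.0/nile.html"

# apply_update UPDATE_DIR SITE_DIR: install every file of one increment,
# removing any existing link or file at the destination first.
apply_update() {
    (cd "$1" && find . -type f) | while IFS= read -r f; do
        mkdir -p "$2/$(dirname "$f")"
        rm -f "$2/$f"
        cp "$1/$f" "$2/$f"
    done
}

# Oldest increment first, so later releases overwrite earlier ones.
for v in 4.0 5.0; do
    apply_update "$updates/$v" "$site"
    echo "$v" > "$site/VERSION"
done

cat "$site/VERSION"
```

Applying the increments in order matters because a file changed in both releases, such as the hypothetical nile.html here, must end up in its latest form.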

4.8. Other Issues

This scheme works very well with the UNIX file system, whose symbolic link feature is critical to this problem's solution. If the system continues to be managed by the author and perhaps one other person, it can be quickly implemented and easily maintained between updates. However, UNIX is not as popular among the growing number of home Internet users, who connect to web sites with personal computers using operating systems that do not support symbolic links. This will become an issue if the web site is published and the CD-ROMs become either shareware or commercially available. At this time, such users would be confined to using the web site over the network, which limits the flexibility of this solution. For this type of use, a custom web server application that runs alongside the browser on the local computer is a better solution.

5. Conclusion

One of the most important aspects of web authoring is the ability to change the content of the site and the resources it uses. A CD-ROM provides a large storage space that is well suited to storing large items such as pictures, movies, or sound. A successful combination of these two technologies results in a flexible system providing a large, durable, and portable web site that allows changes. Such a system can be implemented by the use of symbolic links supported by the UNIX file system. Creating a duplicate directory structure from such symbolic links requires little or no involvement from the user and provides for easy manipulation of the web site structure. Furthermore, snapshots of the most recent state of the web site can easily be written to a CD-ROM. This CD-ROM can be carried with the laptop, and any new additions to the system are saved on the local hard drive. The information can be transferred to a server or other computers, in case the author decides to make gifts of the latest versions of his work. Finally, in case the information on the CD-ROMs becomes grossly outdated, the updated information can be made available through the network not only for the latest CD-ROM made, but also for any older CD-ROM version of the web site. The symbolic link directory structure is easy to install and to maintain, making this solution quicker to develop and deploy than other ideas, such as those involving custom web servers.


Bibliography

Saltzer, J. H., Name Binding in Computer Systems. MIT EECS Dept.: Cambridge, MA, 1978.

Tanenbaum, Andrew S., Modern Operating Systems. Prentice Hall: Upper Saddle River, NJ, 1992.