THE PORTABLE WEBSITE

By Danai Kuangparichat

ABSTRACT

This paper describes an implementation of a portable web site: a site carried around on and updated through a laptop computer. The site is supposed to mirror a site on the World Wide Web; when the laptop is connected to the Internet, the public site is updated to match the site currently on the laptop. It must also be possible to create mirror sites on other computers. Before embarking on a trip, the updater of the site takes a snapshot of the site onto a CD-ROM. The approach taken in this paper is to set up a directory of symbolic links on the laptop's hard drive, and to replace a link with a copy of the file when the user wishes to modify a file on the CD. This approach keeps track of changes to the site via update files, which are kept both on the laptop and at the public site.


INTRODUCTION

With today's explosive growth of the Internet, web servers are popping up everywhere. Likewise, more and more people are learning how to get around the Net every day. Although the Internet has spread throughout all of America, as well as many other countries, there still exist many places in the world that have very limited access to the Internet, and some that have no access at all. When one travels to an area with limited Internet access, he may desire to update a remote web site frequently; however, due to the unavailability of the Internet, this updating is not always possible.

The solution is to design a portable web site, a system to handle the organization of the web site changes between Internet sessions so that during these sessions, the remote web site can be updated quickly and easily. This system can be stored on a laptop which is carried around by the web site's owner. Whenever the owner desires to make a change to the site, he changes it locally; then the next time he connects to the Internet, he uploads all of the changes he has made to the remote web site.

Of course, the owner is not the only person who will want to access his web site. Although the remote web site can be made publicly accessible, there may be interested users who only have limited access to the Internet; thus, they may also want to keep local copies of the web site. Therefore, the system also needs to allow easy copying of the site from the laptop and easy downloading of updates on the web site.

Several assumptions are made when designing this system. The web site on the Internet is stored on a UNIX file system. The laptop onto which the site is copied also uses a UNIX file system and includes a CD-ROM drive, a floppy disk drive, and a hard drive. Before each trip to an Internet-deprived area, the owner of the web site can copy its current content (and maybe other files) onto any number of CD-ROMs; however, the CD-ROMs cannot be rewritten during the trip. Therefore, local updates can only be made to the laptop's hard drive (and any floppy disks the owner may want to carry). The other computers onto which the web site is copied and downloaded, however, may use any modern file system, and might not even include a CD-ROM drive. The pages of the site may contain links to other pages in the same site, but none to other sites. It is also assumed that the person who keeps the local web site on the laptop is the only person who has write access to the remote web site.

To prevent confusion, I will define here the local web site as the site stored on the owner's laptop; the clone sites as the copies of the site on other users' computers; and the home site as the site on the World Wide Web. Owner refers to the person who manages the home site and carries the local site.

In the system presented here, the organization of files on the local site and clone sites uses symbolic links to the CD-ROM extensively in order to save hard drive space. However, the home site (and any clone site on a system that does not allow links or does not have a CD-ROM drive) does not use links at all because all of the pages are stored on the same device; so saving space is not an issue. This system also uses several files to keep track of which pages have been modified, and when they were modified. These files are used to facilitate uploading, downloading, and copying of the site so that copying every file is not always necessary.


DESIGN CRITERIA

The portable web site must support the following features:

  • Local site updating. Any time the owner desires to modify the site in any way, he can do so immediately on his laptop. The local site should contain the most up-to-date version at all times when the owner is travelling.

  • Home site updating. When the owner connects to the Internet, he can update the home site to match the current local site.

  • Clone site creation from local site. The owner can create a new clone site on another computer from his laptop. This new site would match the current local site. The other computer might or might not have a CD-ROM drive, and it may be using any modern file system.

    Other features that should be added if possible include the following:

  • Clone site creation from home site or another clone site. Any user should be able to create a new clone site that matches the site from which it was created.

  • Clone site updating. Naturally, one way to update a clone site would be to erase it, and then copy or download the newest version to recreate the clone. But having a system that can update clone sites quickly, either from the home site or the local site, is much more desirable.

    There are a number of ways to implement these features. However, some ways are more desirable than others. Here are the criteria to be used to determine the usefulness of the system:

  • Ease of use. All the operations (downloading, uploading, copying) should not only be easy to perform, but also easy to explain. The fewer steps an operation requires, the less room there is for user error. Additionally, the owner will need to explain to users of clone sites how to update their clones; if the process is too complicated, they may misunderstand or forget the owner's explanation.

  • Simplicity in design. Complex systems leave lots of room for error. They are also difficult to implement and debug.

  • Efficient use of disk space. Portable computers can only carry so much. Thus, we need to manage the disk space well at the local site, but still include the files necessary to allow easy updating and creation of web sites. Because we do not know how powerful the computers are on which clone sites will be created, we should prepare for the worst and assume that their disk space will be limited also. Although the home site should also provide efficient disk space management, it is not as important as for the other two types of sites. If the home site runs out of disk space, the owner can simply buy a new hard drive. New hard drives may not be as practical on laptops and the computers carrying the clone sites, though.

  • Speed. Most people desire not to wait a long time when downloading, uploading, or copying files. Naturally, if the web site is large, such waiting may be unavoidable, but the system should minimize the time it takes to update and create web sites.

  • Flexibility for the owner. The system should place as few constraints as possible on the updating of the local web site. Ideally, the owner should be able to modify the local site to the same extent that he can modify the home site. A simple example would be adding a new page: naturally, the local site is virtually required to allow such an operation. A more complicated example would be reorganizing directories: moving files between directories, creating new directories, or removing directories may not be easy to update on the home site or clone sites; thus, some designs of the system may restrict such actions.

  • Flexibility in creating clones. It should be possible to clone the site on a wide variety of computers. The more people there are that can clone the site, the more useful the web site will be.

  • Reliability. The system should not leave a lot of room for failure. Failure is less likely if the processes are made simpler, and if the user is not given a lot of restrictions. The system should also do redundant checks to see if everything is operating as expected.


    DESIGN DESCRIPTION

    FILE ORGANIZATION

    This system requires that web pages use relative URLs for links to other pages. Because the pages are going to be copied to other sites, absolute URLs would be illogical: once the page is copied over, an absolute URL would become invalid. Although this requirement decreases the flexibility on what the owner can do, it is much easier than allowing absolute URLs, and then changing all those URLs in each page copied to another machine.

    Before the owner embarks on a trip, the CD-ROM should contain all of the web site files as they are organized on the home site; in other words, all files should be contained in the same directories as the ones at the home site. The directory structure should also be created on the laptop's hard drive. Within these directories, symbolic links should be created for each file in the web site (but not for the directories). These links should have the same names as the files to which they are linked and should be in the same directories (but on the hard drive instead of the CD-ROM). Whenever a browser is opened on the laptop, it should only read directly from the hard drive. Although the recreation of the directory structure on the hard drive takes extra work, it makes updating the site much easier. Because the CD-ROM is read-only, reading directly from the CD-ROM would be rather impractical; any time a page is changed, there would be no way to let the browser know not to read from the CD-ROM for that page.
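
    The layout described above could be prepared with a short script. The following is only a minimal sketch under stated assumptions: the function name build_link_tree and the two directory arguments are hypothetical, and the CD-ROM is assumed to be mounted at a known path.

```python
import os

def build_link_tree(cd_root, hd_root):
    """Recreate cd_root's directories under hd_root and link every file."""
    cd_root = os.path.abspath(cd_root)      # links need absolute targets
    for dirpath, _dirnames, filenames in os.walk(cd_root):
        rel_dir = os.path.relpath(dirpath, cd_root)
        hd_dir = os.path.normpath(os.path.join(hd_root, rel_dir))
        os.makedirs(hd_dir, exist_ok=True)  # a real directory, never a link
        for name in filenames:
            link = os.path.join(hd_dir, name)
            if not os.path.lexists(link):   # keep any existing hard-drive copy
                os.symlink(os.path.join(dirpath, name), link)
```

    Directories are created as real directories and only files are linked, which matches the rule that links are made for files but not for directories.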

    If the owner decides to modify a page, he needs to erase the symbolic link to that page and replace it with a copy of the unmodified page. He can then modify this hard drive copy. All future modifications to this page should also be made in the same hard drive copy. With this system, only modified pages are stored on the hard drive, thus minimizing the hard drive space used by the web pages.
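
    The link-to-copy step could be automated so that the owner never has to do it by hand. This is a sketch only; make_editable and the ".tmp" scratch suffix are hypothetical names chosen for illustration.

```python
import os
import shutil

def make_editable(path):
    """Swap the symbolic link at `path` for a private hard-drive copy.

    Returns True if a link was replaced, False if `path` was already a copy.
    """
    if not os.path.islink(path):
        return False                  # already copied; edit it in place
    source = os.path.realpath(path)   # the read-only original on the CD
    temp = path + ".tmp"              # hypothetical scratch name
    shutil.copy2(source, temp)        # copy contents and timestamps
    os.replace(temp, path)            # overwrites the link, not its target
    return True
```

    Copying to a scratch file first means the link is only removed once the copy has succeeded.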

    With so many symbolic links and copies of pages on the hard drive, one may think that there is a possibility of reading the incorrect version of a page (in other words, reading from the CD-ROM when the browser should be reading from the hard drive). However, because the pages are required to use relative URLs in all of their web page links, the working directory of the browser will always be on the hard drive. So the only time the browser would read from the CD would be when the file on the hard drive is a link to a file on the CD. And the only time this file would be a link is when the owner has not modified it from its original state (on the CD). Therefore, reading the incorrect version is actually not an issue.

    On computers that allow symbolic links and have a CD-ROM drive, clone sites should have the files organized in the same way as the local site. A copy of the CD can be given to the users of these clone sites. If a computer does not have a CD-ROM drive, the only feasible way to organize the files of the clone is to copy all of the files into a directory structure identical to that of the local site, and instead of the links, use the actual files from the CD. This copying process is elaborated in the Clone Site Creation section.

    On computers that contain a CD-ROM drive, but use a file system that doesn't allow links, the problem gets a little more complicated. Naturally, we could copy the whole site onto the hard drive and not use a CD-ROM. But a lot of space could be saved on the hard drive by using artificial links for HTML files: if an HTML page with one frame is created on the hard drive, and a page on the CD-ROM is embedded in this frame, the file on the hard drive will appear just like the page on the CD-ROM when viewed through a browser. So essentially, this file on the hard drive (which will hereafter be referred to as an artificial linking file) is an artificial link to the CD-ROM file. (For an explanation of frames, please refer to an HTML handbook.) Naturally, this technique only works for HTML files; so other files, such as images, will need to be stored directly on the hard drive.
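
    As an illustration of what an artificial linking file might contain, the sketch below generates a one-frame page. The function name and the example CD-ROM URL are assumptions made for this example; the exact markup an implementation uses may differ.

```python
import html

def artificial_link_page(cd_url, title=""):
    """Return HTML for a one-frame page that displays the page at cd_url."""
    return (
        "<html>\n"
        "<head><title>" + html.escape(title) + "</title></head>\n"
        '<frameset rows="100%">\n'
        '  <frame src="' + html.escape(cd_url, quote=True) + '">\n'
        "</frameset>\n"
        "</html>\n"
    )

# Hypothetical CD-ROM location; the real drive name depends on the machine.
page = artificial_link_page("file:///cdrom/www/pyramids.html", "Pyramids")
```

    Note that the frame source is an absolute URL to the CD-ROM, so the generated file is only valid on a machine where the drive has that name.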

    An alternative to creating these artificial links when creating the clone site would be to have the local site use these artificial links instead of the symbolic links. Thus, creating new clone sites would be much easier. However, the artificial linking file contains an absolute URL that may not be valid on other machines (specifically, if the name by which the CD-ROM drive is referred to differs from that on the local machine). Thus, when the clone site is being created, the artificial linking files would have to be recreated anyway. Because symbolic links provided by the file system are much easier to handle than these artificial linking files, the local system should use the symbolic links.

    At the home site, there should be no linking in any way. Because everything will be stored on a hard drive, the site only needs one name for each particular file.


    LOCAL SITE UPDATING

    All updates of the local site are made within the hard drive. In order to facilitate updating the home site, an update file is kept to record all of the changes in the site since the last update of the home site. Each time the home site is updated, a new update file is started; all changes in the local site between this home site update and the next one will be recorded in this new update file. The content of the update file is organized into a list of entries, with each entry occupying one line of the file. Each entry contains two or more fields: an update type, followed by one or more file names or directory names. The update type indicates what was done to the given file or directory name(s). The following table lists the update types and the files or directories they take as arguments:

    Table 1: Entry types within update files

    Update type        Argument 1                                 Argument 2
    create_file        file that has been created or modified     (none)
    delete_file        file that has been deleted                 (none)
    move_file          file(s) that have been moved               directory to which they were moved
    copy_file          file(s) that have been copied              directory to which they were copied
    create_link        link that has been created or modified     file to which it is linked
    create_directory   directory that has been created            (none)
    remove_directory   directory that has been removed            (none)

    The arguments should always be relative to the base directory of the web site. Any changes made in update files or in other files that are not actually part of the web site should not be recorded in the update files. When an argument is supposed to represent multiple files (for move_file and copy_file), the UNIX convention for wildcards is used. Also, the update types are not limited to the seven presented here; if other update types are thought of, they may be added to this list. Each field of the entry should be separated by a tab character.
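
    To make the entry format concrete, here is a minimal sketch of writing and reading one entry. The names ARG_COUNTS, format_entry, and parse_entry are hypothetical; the argument counts are taken from Table 1 and would need extending if further update types were added.

```python
# Number of arguments each update type takes, per Table 1.
ARG_COUNTS = {
    "create_file": 1, "delete_file": 1,
    "move_file": 2, "copy_file": 2, "create_link": 2,
    "create_directory": 1, "remove_directory": 1,
}

def format_entry(update_type, *args):
    """Build one tab-separated, newline-terminated update-file line."""
    if ARG_COUNTS.get(update_type) != len(args):
        raise ValueError("bad entry: %s %r" % (update_type, args))
    return "\t".join((update_type,) + args) + "\n"

def parse_entry(line):
    """Split one update-file line into (update_type, [arguments])."""
    update_type, *args = line.rstrip("\n").split("\t")
    if ARG_COUNTS.get(update_type) != len(args):
        raise ValueError("bad entry: %r" % line)
    return update_type, args
```

    Rejecting entries with an unknown type or the wrong number of fields is one simple form of the redundant checking called for in the design criteria.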

    There are two ways to create the update file: one way is to change the update file each time the owner makes a change to the contents of the site; the other is to create the update file at the time of the updating of the home site.

    First, let's examine the case where the update file is changed incrementally (i.e., each time the owner makes a change). As an example, let's say the local site has all of its files stored in the directory /www and its subdirectories. (In other words, /www is the base directory.) Then, let's say, the owner modifies a file in /www called pyramids.html. Because Table 1 uses create_file for both created and modified files, the line "create_file [tab] pyramids.html [newline]" would be added to the update file. Now let's say the owner moved all of the .jpg files from /www/Cairo to /www/images. The following line should then be appended to the update file: "move_file [tab] Cairo/*.jpg [tab] images [newline]". Note that it is very important that the entries of the update file are listed in the same order that the modifications were made.

    When modifying the update file incrementally, relying on the owner to update the update file could cause many problems. First of all, he may forget to do it sometimes; also, he may type something incorrectly and not notice. Therefore, having the machine make the updates might be more reliable. This could be achieved by having the machine automatically call a routine to write to the update file every time the relevant commands are called (e.g., mv, cp, and rm, the UNIX commands for moving, copying, and deleting files, respectively). Of course, if certain applications on the laptop allowed files or directories to be moved, copied, or deleted without using the UNIX prompt, we would need to ensure that the update files are updated in those cases, too. Additionally, detecting modifications of files would be difficult using this method; perhaps the machine could update the update file any time a file is saved. With all of these complications, implementing routines to have the update files updated by the machine is very difficult, if not impossible; so updating incrementally might not be the best solution.
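
    One way to picture the machine-maintained log is a pair of wrapper routines that perform the operation and then append the matching entry. This is a sketch under assumptions: the function names are hypothetical, the update file is kept outside the site's base directory, and only moves and deletions made through these wrappers are recorded.

```python
import os
import shutil

def log_entry(log_path, *fields):
    """Append one tab-separated entry to the update file."""
    with open(log_path, "a") as log:
        log.write("\t".join(fields) + "\n")

def logged_move(base, src, dst_dir, log_path):
    """Move base/src into the existing directory base/dst_dir and record it."""
    shutil.move(os.path.join(base, src), os.path.join(base, dst_dir))
    log_entry(log_path, "move_file", src, dst_dir)

def logged_delete(base, path, log_path):
    """Delete base/path and record it."""
    os.remove(os.path.join(base, path))
    log_entry(log_path, "delete_file", path)
```

    Because each entry is appended only after its operation succeeds, the entries stay in the order in which the modifications were made. Changes made by other programs would still go unrecorded, which is the weakness discussed above.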

    For a system that creates the update file only upon updating the home site, a program could be written that does a long-listing recursive ls (using the -R and -l switches) to list all the files on the hard drive that are part of the web site, and then a long-listing recursive ls on the CD-ROM, and writes the results of these listings to files. Then the program could compare the two files to check for modifications to the local site. Upon subsequent updates to the home site, the program could create an ls output file from the current content of the hard drive and compare it to the most recent listing file. Naturally, this method cannot detect when a file has been moved or copied, but instead sees only that a file is no longer in a given directory or that a file has been created in a given directory. Because the end result is the same, we do not need to worry about this issue.
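
    The comparison step can be sketched as follows. Note the substitution: instead of parsing ls -lR output, this example records each file's size and modification time directly with os.walk and os.stat, which captures the same information. The function names are hypothetical, and directory creation and removal are left out for brevity.

```python
import os

def snapshot(root):
    """Map each file's path relative to root to its (size, mtime)."""
    snap = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            info = os.stat(full)            # follows symbolic links to the CD
            snap[os.path.relpath(full, root)] = (info.st_size, int(info.st_mtime))
    return snap

def diff_snapshots(old, new):
    """Return update entries that turn the old listing into the new one."""
    entries = []
    for path in sorted(new):
        if old.get(path) != new[path]:      # new file, or size/time changed
            entries.append(("create_file", path))
    for path in sorted(set(old) - set(new)):
        entries.append(("delete_file", path))
    return entries
```

    As the text notes, a moved file shows up here as one delete_file entry and one create_file entry rather than as a move_file entry.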

    Deciding between incrementally modifying the update file (the incremental method) and creating the update file all at once (the aggregate method) is an issue that has yet to be resolved. The incremental method would be much less prone to bugs because each change to the update file is small; however, the feasibility of its implementation is questionable. On the other hand, the aggregate method is probably much more feasible to implement, but due to its complexity, is also a lot more prone to bugs.

    Regardless of which method is used to create the update file, the user needs a way to modify files. Because the file's original copy is on the CD-ROM, it cannot be directly changed. Therefore, it needs to be copied onto the hard disk before it can be changed. One way to do this is to copy the file over its equivalent link on the hard disk, and then allow the user to modify this copy. Another way would be to copy the file into an update directory and change the appropriate link. Although the second approach keeps the link structure from being disturbed, it has problems when two files of the same name are updated. The easiest way to fix this problem would be to create a whole directory tree that mimics the directory tree of the web site. Of course, doing this may be a waste of space and effort; so the first approach seems more logical.


    HOME SITE UPDATING

    Updating the home site depends heavily on the update files created. When the home site is being updated, the local site first sends the update file. It then reads the update file until it comes to a create_file entry. At this point, it sends the appropriate file, then waits for a reply. If no reply comes after a while, it sends the file again. Once it gets a reply, it finds the next create_file entry in the update file and sends the appropriate file. It then waits for a reply again. This process continues until the update file is entirely read.

    Upon receiving the update file, the home site puts it in the same directory as the other update files and starts reading it, line by line. It executes the appropriate command for each entry in it until it comes to a create_file entry. At this point, it checks whether a file has come in from the local site. If not, it waits. When the file comes in, it checks that it is the correct file. If it is, it sends a reply. If not, it assumes that the local site did not get the reply for the last file; so it sends the reply again, then waits again. Once the home site has the correct file, it places it in the correct directory. It then continues to execute the commands in the update file until it encounters another create_file. This process continues until the update file is read entirely.
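
    The send-and-reply exchange can be simulated in a few lines to show how a lost reply is handled. This is an in-process sketch, not a network implementation: the Receiver class, send_files, the lossy helper, and the retry limit are all hypothetical, and a dropped reply is modelled by the delivery function returning False.

```python
class Receiver:
    """Home-site side: accepts files in the order the update file lists them."""

    def __init__(self, expected):
        self.expected = list(expected)   # create_file names, in order
        self.stored = []

    def receive(self, name):
        if self.expected and name == self.expected[0]:
            self.stored.append(self.expected.pop(0))
            return True                  # correct file: store it and reply
        if self.stored and name == self.stored[-1]:
            return True                  # resent file: last reply was lost
        return False

def send_files(names, deliver, max_tries=5):
    """Local-site side: resend each file until a reply arrives."""
    attempts = {}
    for name in names:
        for attempt in range(1, max_tries + 1):
            if deliver(name):
                attempts[name] = attempt
                break
        else:
            raise RuntimeError("no reply for " + name)
    return attempts

def lossy(receiver, dropped_calls):
    """Wrap a receiver so that replies to the given call numbers are lost."""
    count = {"n": 0}
    def deliver(name):
        count["n"] += 1
        replied = receiver.receive(name)
        return replied and count["n"] not in dropped_calls
    return deliver
```

    Because the sender never moves on without a reply, it always knows exactly which file was in flight when something went wrong.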

    An alternative to sending each file one by one would be to send everything at once. This approach would work fine under normal circumstances. However, if some files get lost in the middle of an update, the home site would need a way to tell the local site to send those files again. Also, if anything else goes wrong during the update, and the local site never gets notified that the server has completed the update, it will be difficult to figure out exactly how far the updating got. With the one-by-one method, the local site can pick out exactly at which file the problem occurred.


    CLONE SITE CREATION

    Clone site creation is very complicated because we want to be able to create clone sites on any computer. Naturally, there are some computers on which there is no way to create an efficient clone site. As an extreme example, computers with only floppy disk drives (no hard disks or CD-ROMs) would have a lot of trouble supporting a clone site. Although the design presented here does not formulate a scheme for such a primitive computer, it does support clone sites on a moderately wide range of computers.

    The simplest clone creation would be on a computer that has a hard drive, a CD-ROM drive, and a UNIX file system that allows links. In this case, creating a clone from the local site would simply involve copying the appropriate directory tree from the hard drive of the laptop with the local site to the other computer, and issuing a copy of the CD-ROM to the user of the clone site. If the name of the CD-ROM drive is not the same on the computer with the new clone site, the links need to be changed, too.
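
    Changing the links for a differently named CD-ROM drive could look like the sketch below. The function name and the example mount points are assumptions; the links are assumed to hold absolute paths that begin with the old CD-ROM location.

```python
import os

def retarget_links(site_root, old_cd, new_cd):
    """Repoint every link under site_root from old_cd to new_cd.

    Returns the number of links that were changed.
    """
    old_prefix = old_cd.rstrip("/") + "/"
    new_prefix = new_cd.rstrip("/") + "/"
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(site_root):
        for name in filenames:
            link = os.path.join(dirpath, name)
            if not os.path.islink(link):
                continue                     # a modified page: leave it alone
            target = os.readlink(link)
            if target.startswith(old_prefix):
                os.remove(link)              # removes only the link itself
                os.symlink(new_prefix + target[len(old_prefix):], link)
                changed += 1
    return changed
```

    For example, retarget_links("/www", "/cdrom", "/mnt/cd0") would rewrite a link to /cdrom/www/pyramids.html so that it points to /mnt/cd0/www/pyramids.html.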

    For a computer with a hard drive and CD-ROM drive, but a file system that does not support the symbolic links created by UNIX, artificial links can be used, as explained previously in the File Organization section. To create the clone site, the appropriate hard drive directory tree could first be copied to the computer of the new clone site. Then a program could be run to replace every symbolic link with an artificial link. Because the computer with the clone site will probably not be able to interpret the symbolic links, a file first needs to be generated on the laptop with the local site; this file should list all the symbolic links in the web site. This file could be copied over to the other computer and then interpreted by a program to create all the artificial links. Because artificial links operate through frames, if the computer with the clone site does not have a browser that supports frames, the laptop's browser should be copied also.
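
    Generating the list of symbolic links on the laptop might be done as in the following sketch. The function name is hypothetical, and the output is simply one site-relative path per link; a real listing file could carry more information.

```python
import os

def list_links(site_root):
    """Return the site-relative paths of all symbolic links, sorted."""
    links = []
    for dirpath, _dirnames, filenames in os.walk(site_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                links.append(os.path.relpath(path, site_root))
    return sorted(links)
```

    Since each link has the same name and directory as its file on the CD-ROM, the relative path alone is enough for the program on the clone machine to locate the CD-ROM file for each artificial link.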

    On a computer that does not have a CD-ROM drive, everything must be stored without links. So when creating a clone on this type of computer, all of the non-link files should be copied from the hard drive of the laptop with the local site. Instead of copying the links, though, the corresponding files on the CD-ROM should be copied.
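
    On the laptop itself, a link-free copy of the site can be produced with Python's standard shutil.copytree, which copies the contents of linked files when symlinks is False. The wrapper name is hypothetical, and the destination is assumed to be a directory that does not yet exist (for instance on a disk that will be handed to the clone's user).

```python
import shutil

def clone_without_links(local_root, clone_root):
    """Copy the local site, replacing each link with the file it points to."""
    # symlinks=False copies the linked CD-ROM file's contents, not the link.
    shutil.copytree(local_root, clone_root, symlinks=False)
```

    The CD-ROM must be mounted while this runs, since every unmodified page is read through its link.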

    The clone site creations mentioned so far have all been from the laptop. Because CDs cannot be transferred from the home site, any clone created from the home site will not use links. Thus, the file organization of these clone sites will be the same as the one mentioned in the previous paragraph. When a user wants to create a clone site from the home site, he first downloads a program that will serve as a sort of starter kit for the clone site. When he runs this program on his machine, it contacts the home site, and the home site sends out instructions on how to recreate its directory structure on the machine with the new clone site. Additionally, of course, the appropriate files are transmitted. This communication uses replies just like the communication for updating the home site. The reasons for using this method are the same as before.


    CLONE SITE UPDATING

    To update a clone site from the home site, the user of the clone site needs to download a program. When the program is run, it figures out the date of the most recent update file and sends this date to the home site. (Again, this communication uses replies.) The home site then finds all of the update files that are dated after the given date and sends them to the clone site. The process that ensues is very similar to the process of updating the home site from the local site: both sides go through the update files, in order. When a create_file entry is found, the clone site waits for the home site to send the file. All other creations and deletions are made independently of the home site. There are several differences between this updating and home site updating. First of all, multiple update files might be used. Both sides can figure out in which order to execute these update files by using the timestamps. Secondly, there may be some cases where the home site cannot find the file in a create_file entry. In this case, a file was created during one update to the home site, but then deleted or moved during a later update. Because the file will be deleted by a later update file anyway, the home site in this case sends a dummy file: a file with the correct name, but no contents.
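
    The two home-site decisions described above (which update files to send, and what to send for a file that no longer exists) can be sketched as follows. The function names are hypothetical, and update files are represented as (timestamp, name) pairs purely for illustration.

```python
import os

def updates_after(updates, last_seen):
    """Return names of update files stamped after last_seen, oldest first.

    `updates` is a list of (timestamp, name) pairs; any comparable
    timestamp type works.
    """
    return [name for stamp, name in sorted(updates) if stamp > last_seen]

def file_to_send(site_root, rel_path):
    """Return the file's bytes, or empty bytes as the dummy file."""
    full = os.path.join(site_root, rel_path)
    if os.path.isfile(full):
        with open(full, "rb") as handle:
            return handle.read()
    return b""    # file was deleted or moved by a later update
```

    Sorting by timestamp gives both sides the same execution order without any further coordination.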

    Sometimes, a user of a clone site may run into the owner again. In this case, he may want an update from the local site. If the owner is on a different trip from the last time he updated or created this clone site, he will have a different CD. This CD will now contain all of the updates as of the end of his last trip. Thus, all of the information recorded on the clone site's hard drive is now on CD; so essentially, to update the clone site, all of the files from the hard drive of the laptop with the local site need to be copied over. In other words, the same procedure will be used as is used to create a brand new clone site. Of course, before copying anything, the portion of the clone site on the hard drive needs to be erased.


    RECOMMENDATIONS

    An alternative to the design presented here would be to create a new browser that finds the appropriate file and displays it. Such a browser would require some file to keep track of where to look for each file. Essentially, this file would be a file of links. This design was avoided for several reasons. First of all, creating a new browser is quite complex, and with complexity comes a greater possibility of bugs. The option of using smaller programs to interpret human-readable update files is much more attractive. Secondly, because we do not know what types of computers the clone sites will be created on, we would need to write a different type of browser for each environment. Although the scheme presented in this paper also uses different processes to create clone sites in different environments, these processes are much simpler and easier to test for bugs. If multiple versions of a relatively complex program need to be created, lots of time would be spent on debugging. Additionally, all of the problems existent in the design presented would also be problems if a new browser were written: keeping track of new files, deleted files, etc.; finding a way to update the directory structure of the home site efficiently; finding a way to efficiently update clone sites, etc. Creating a new browser would just be adding another problem on top of this list.

    Naturally, creating a new browser does have some merits, though. With the browser, the only way to modify the directory structure would be through the browser editor, because if one tried to make modifications directly to the file, he could not find it on the hard drive. He would have to rely on the browser to copy it over, update its link so that it knows this file is now on the hard drive, and then let the user modify the file. This characteristic provides the browser with some modularity; so the user does not have to deal with low-level commands, and the program is easier for him to use. Additionally, the browser saves some space on the hard drive because it does not create a whole directory tree of symbolic links. However, these benefits are far outweighed by its cost in complexity.

    The design presented in this paper, however, fits the design criteria quite well: the processes have been kept as simple as possible by using small programs and human-readable text files to manage the processes. All of the features, both primary and secondary, are supported by this design. The design is easy to use for both the owner of the web site and users of the clone sites; uploading and downloading require no more than two steps by the user. Hard copies of the files are only made to the hard drive when necessary; otherwise, symbolic links are used; so the disk space is managed fairly efficiently. The user is given a lot of freedom in what he does with his files; he can basically manipulate links, files, and directories as he pleases. The design here also takes into account a variety of different environments in which a user may want to create a clone site. Although it may not cover every possible environment in the world, this design handles the most common ones. By performing checks and using a send-and-reply tactic for transmitting files over an unreliable network, this system increases its reliability.


    CONCLUSION

    This design is not yet complete, though. As mentioned previously, a choice still needs to be made on the method for creating update files: incremental or aggregate. This choice may not be clear until one starts to think more deeply about implementation. Another item not discussed in this paper is redundant checks in the system. Currently, there is no way to check whether a home site or clone site was successfully updated without bringing up the site in the browser and checking every link. It may also be desirable to perform a check to see if the update files are correct.

    There are other issues that are not quite as vital to the system. For example, when the user creates a hard link to another file in the web site, the current scheme copies that particular file more than once when updating home sites or clone sites: once for the original file, and once for the hard link. This phenomenon occurs because hard links appear just like files when ls is called. This does not sound like a big problem, but if the file is large, it could unnecessarily take up a lot of space on a clone site, which could be a problem if the clone site has a limited amount of space to begin with.
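
    Although hard links look like ordinary files by name, they share an inode number, so a program could detect them before copying. The sketch below is one possible check, not part of the design above; the function name is hypothetical and it relies on the device and inode numbers reported by a UNIX file system.

```python
import os
import stat

def hard_link_groups(root):
    """Return sorted groups of site-relative paths that share one inode."""
    groups = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)            # do not follow symbolic links
            if stat.S_ISREG(info.st_mode) and info.st_nlink > 1:
                key = (info.st_dev, info.st_ino)
                groups.setdefault(key, []).append(os.path.relpath(path, root))
    return sorted(sorted(paths) for paths in groups.values() if len(paths) > 1)
```

    Each group could then be sent as one file plus a note that the other names refer to it, rather than as several full copies.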

    Another issue is increasing the number of computers on which clone sites can be created. It is difficult to blindly predict what types of computers to expect. We of course want to support the common case, as was explained in this paper, but what if the owner meets a fair number of people who want to create clone sites on computers that are not supported by the current design? We could try to think of ways to clone sites on every single type of computer, but this would probably end up being a waste of time. The best thing to do would be to prepare for the common case; find out what types of computers are popular for creating clones; then formulate methods for creating clone sites on computers that were not originally considered. An even better alternative would be to first find out what types of computers are popular for creating the clone sites, formulate methods to create the clone sites, and then let the owner travel.

    Portable web sites are very versatile and useful. Perhaps in the near future, they will be used by scientists, ambassadors, politicians, and maybe even the common tourist, to keep the public aware of any news that has occurred during one's travels.