The solution is to design a portable web site: a system that organizes changes to the web site between Internet sessions so that, during these sessions, the remote web site can be updated quickly and easily. This system can be stored on a laptop that is carried around by the web site's owner. Whenever the owner wants to make a change to the site, he changes it locally; then, the next time he connects to the Internet, he uploads all of the changes he has made to the remote web site.
Of course, the owner is not the only person who will want to access his web site. Although the remote web site can be made publicly accessible, there may be interested users who only have limited access to the Internet; thus, they may also want to keep local copies of the web site. Therefore, the system also needs to allow easy copying of the site from the laptop and easy downloading of updates on the web site.
Several assumptions are made in designing this system. The web site on the Internet is stored on a UNIX file system. The laptop onto which the site is copied also uses a UNIX file system and includes a CD-ROM drive, a floppy disk drive, and a hard drive. Before each trip to an Internet-deprived area, the owner of the web site can copy its current content (and maybe other files) onto any number of CD-ROMs; however, the CD-ROMs cannot be rewritten during the trip. Therefore, local updates can only be made to the laptop's hard drive (and any floppy disks the owner may want to carry). The other computers onto which the web site is copied and downloaded, however, may use any modern file system, and might not even include a CD-ROM drive. The pages of the site may contain links to other pages in the same site, but none to other sites. It is also assumed that the person who keeps the local web site on the laptop is the only person who has write access to the remote web site.
To prevent confusion, I will define here the local web site as the site stored on the owner's laptop; the clone sites as the copies of the site on other users' computers; and the home site as the site on the World Wide Web. Owner refers to the person who manages the home site and carries the local site.
In the system presented here, the organization of files on the local site and clone sites uses symbolic links to the CD-ROM extensively in order to save hard drive space. However, the home site (and any clone site on a system that does not allow links or does not have a CD-ROM drive) does not use links at all because all of the pages are stored on the same device; so saving space is not an issue. This system also uses several files to keep track of which pages have been modified, and when they were modified. These files are used to facilitate uploading, downloading, and copying of the site so that copying every file is not always necessary.
Other features that should be added if possible include the following:
There are a number of ways to implement these features. However, some ways are more desirable than others. Here are the criteria to be used to determine the usefulness of the system:
Before the owner embarks on a trip, the CD-ROM should contain all of the web site files as they are organized on the home site; in other words, all files should be contained in the same directories as the ones at the home site. The directory structure should also be created on the laptop's hard drive. Within these directories, symbolic links should be created for each file in the web site (but not for the directories). These links should have the same names as the files to which they are linked and should be in the same directories (but on the hard drive instead of the CD-ROM). Whenever a browser is opened on the laptop, it should only read directly from the hard drive. Although the recreation of the directory structure on the hard drive takes extra work, it makes updating the site much easier. Because the CD-ROM is read-only, reading directly from the CD-ROM would be rather impractical; any time a page is changed, there would be no way to let the browser know not to read from the CD-ROM for that page.
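The preparation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_link_tree` is hypothetical, and the CD-ROM is assumed to be mounted at some directory (for example `/cdrom/www`) while the hard drive copy lives under a base directory such as `/www`.

```python
import os

def build_link_tree(cd_root, hd_root):
    """Mirror cd_root's directories under hd_root and symlink every file.

    Directories are created as real directories on the hard drive;
    only files become symbolic links back to the CD-ROM.
    """
    cd_root = os.path.abspath(cd_root)
    for dirpath, dirnames, filenames in os.walk(cd_root):
        rel = os.path.relpath(dirpath, cd_root)
        target_dir = os.path.normpath(os.path.join(hd_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            link_path = os.path.join(target_dir, name)
            # Leave anything that already exists (e.g. an edited copy) alone.
            if not os.path.lexists(link_path):
                os.symlink(os.path.join(dirpath, name), link_path)
```

Calling `build_link_tree("/cdrom/www", "/www")` would leave `/www` with the same directory layout as the CD-ROM, with each file name resolving to the CD-ROM copy until it is replaced.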
If the owner decides to modify a page, he needs to erase the symbolic link to that page and replace it with a copy of the unmodified page. He can then modify this hard drive copy. All future modifications to this page should also be made in the same hard drive copy. With this system, only modified pages are stored on the hard drive, thus minimizing the hard drive space used by the web pages.
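A minimal sketch of this "replace the link with a copy" step follows. The helper name `prepare_for_editing` and the temporary `.tmp` suffix are assumptions made for illustration; the sketch copies the CD-ROM content first and only then swaps it in over the link, so the page is never missing in between.

```python
import os
import shutil

def prepare_for_editing(path):
    """Replace a symbolic link with a private hard-drive copy of its target.

    Returns True if a link was replaced, False if path was already a
    regular file (in which case it can simply be edited in place).
    """
    if not os.path.islink(path):
        return False
    source = os.path.realpath(path)   # the original file on the CD-ROM
    temp = path + ".tmp"              # hypothetical scratch name
    shutil.copyfile(source, temp)     # copy the unmodified page
    os.replace(temp, path)            # swap the copy in over the link
    return True
```

After `prepare_for_editing("/www/pyramids.html")` returns, the owner edits `/www/pyramids.html` as an ordinary file; the CD-ROM copy is untouched.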
With so many symbolic links and copies of pages on the hard drive, one may think that there is a possibility of reading the incorrect version of a page (in other words, reading from the CD-ROM when the browser should be reading from the hard drive). However, because the pages are required to use relative URLs in all of their links, the working directory of the browser will always be on the hard drive. So the only time the browser would read from the CD would be when the file on the hard drive is a link to a file on the CD. And the only time this file would be a link is when the owner has not modified it from its original state (on the CD). Therefore, reading the incorrect version is actually not an issue.
On computers that allow symbolic links and have a CD-ROM drive, clone sites should have the files organized in the same way as the local site. A copy of the CD can be given to the users of these clone sites. If a computer does not have a CD-ROM drive, the only feasible way to organize the files of the clone is to copy all of the files into a directory structure identical to that of the local site, and instead of the links, use the actual files from the CD. This copying process is elaborated in the Creating Clones section.
On computers that contain a CD-ROM drive but use a file system that doesn't allow links, the problem gets a little more complicated. Naturally, we could copy the whole site onto the hard drive and not use a CD-ROM. But a lot of space could be saved on the hard drive by using artificial links for HTML files: if an HTML page with one frame is created on the hard drive, and a page on the CD-ROM is embedded in this frame, the file on the hard drive will appear just like the page on the CD-ROM when viewed through a browser. So essentially, this file on the hard drive (hereafter referred to as an artificial linking file) is an artificial link to the CD-ROM file. (For an explanation of frames, please refer to an HTML handbook.) Naturally, this technique only works for HTML files; so other files, such as images, will need to be stored directly on the hard drive.
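To make the idea of an artificial linking file concrete, here is a small sketch that generates one. The function name `artificial_link_html` and the example URL are hypothetical, and the exact markup is an assumption: it uses the frameset elements of HTML 4, which is the mechanism the text describes, whereas a present-day page would more likely use an `iframe`.

```python
from html import escape

def artificial_link_html(cd_url):
    """Return an HTML page whose single frame displays the page at cd_url."""
    url = escape(cd_url, quote=True)   # keep the attribute value well-formed
    return (
        "<html>\n"
        '<frameset rows="100%">\n'
        f'  <frame src="{url}">\n'
        "</frameset>\n"
        "</html>\n"
    )
```

For example, writing the result of `artificial_link_html("file:///cdrom/www/Cairo/pyramids.html")` to `Cairo/pyramids.html` on the hard drive would make that file display the CD-ROM page.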
An alternative to creating these artificial links when creating the clone site would be to have the local site use these artificial links instead of the symbolic links. Thus, creating new clone sites would be much easier. However, the artificial linking file contains an absolute URL that may not be valid on other machines (specifically, if the name by which the CD-ROM drive is referenced is different from the one on the local machine). Thus, when the clone site is being created, the artificial linking files would have to be recreated anyway. Because symbolic links provided by the file system are much easier to handle than these artificial linking files, the local system should use the symbolic links.
At the home site, there should be no linking in any way. Because everything will be stored on a hard drive, the site only needs one name for each particular file.
Update type | Argument 1 | Argument 2 |
---|---|---|
create_file | file that has been created or modified | (None) |
delete_file | file that has been deleted | (None) |
move_file | file(s) that have been moved | directory to which they were moved |
copy_file | file(s) that have been copied | directory to which they were copied |
create_link | link that has been created or modified | file to which it is linked |
create_directory | directory that has been created | (None) |
remove_directory | directory that has been removed | (None) |
There are two ways to create the update file: one way is to change the update file each time the owner makes a change to the contents of the site; the other is to create the update file at the time of the updating of the home site.
First, let's examine the case where the update file is changed incrementally (i.e., each time the owner makes a change). As an example, let's say the local site has all of its files stored in the directory /www and its subdirectories. (In other words, /www is the base directory.) Then, let's say, the owner modifies a file in /www called pyramids.html. Because the create_file update type covers both created and modified files, the line "create_file [tab] pyramids.html [newline]" would be added to the update file. Now let's say the owner moves all of the .jpg files from /www/Cairo to /www/images. The following line should then be appended to the update file: "move_file [tab] Cairo/*.jpg [tab] images [newline]". Note that it is very important that the entries of the update file are listed in the same order in which the modifications were made.
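The entry format just described (an update type, then its arguments, separated by tabs and ended by a newline) can be sketched as follows. The function names are hypothetical, and the rule that fields may not themselves contain tabs or newlines is an assumption added so that every line can be parsed back unambiguously.

```python
def format_entry(update_type, *args):
    """Build one update-file line: type and arguments separated by tabs."""
    for field in (update_type, *args):
        if "\t" in field or "\n" in field:
            raise ValueError("fields may not contain tabs or newlines")
    return "\t".join((update_type, *args)) + "\n"

def append_entry(update_path, update_type, *args):
    """Append an entry, so the file keeps the order in which changes were made."""
    with open(update_path, "a") as f:
        f.write(format_entry(update_type, *args))

def read_entries(update_path):
    """Return the entries in file order as (type, [arguments]) tuples."""
    entries = []
    with open(update_path) as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            entries.append((fields[0], fields[1:]))
    return entries
```

With this sketch, the two changes in the example would be recorded by `append_entry(path, "create_file", "pyramids.html")` followed by `append_entry(path, "move_file", "Cairo/*.jpg", "images")`.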
When modifying the update file incrementally, relying on the owner to update the update file could cause many problems. First of all, he may forget to do it sometimes; also, he may type something incorrectly and not notice. Therefore, having the machine make the updates might be more reliable. This could be achieved by having the machine automatically call a routine to write to the update file every time the relevant commands are called (e.g., mv, cp, and rm, the UNIX commands for moving, copying, and deleting files, respectively). Of course, if certain applications on the laptop allowed files or directories to be moved, copied, or deleted without using the UNIX prompt, we would need to ensure that the update file is updated in those cases, too. Additionally, detecting modifications of files would be difficult using this method; perhaps the machine could update the update file any time a file is saved. With all of these complications, implementing routines to have the update file updated by the machine is very difficult, if not impossible; so updating incrementally might not be the best solution.
For a system that creates the update file only upon updating the home site, a program could be written that does a long-listing recursive ls (using the -R and -l switches) to list all the files on the hard drive that are part of the web site, then does a long-listing recursive ls on the CD-ROM, and writes the results of these listings to files. Then the program could compare the two files to check for modifications to the local site. Upon subsequent updates to the home site, the program could create an ls output file from the current content of the hard drive and compare it to the most recent listing file. Naturally, this method cannot detect when a file has been moved or copied, but instead sees only that a file is no longer in a given directory or that a file has been created in a given directory. Because the end result is the same, we do not need to worry about this issue.
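The comparison at the heart of this approach can be sketched as below. Note the substitution: instead of parsing the text output of `ls -lR`, this sketch walks the directory tree directly and records each entry's kind, size, and modification time, which carries the same information. The function names and the signature format are assumptions, and the sketch does not try to tell links into the CD-ROM apart from links the owner created himself.

```python
import os

def snapshot(root):
    """Map each relative path under root to a small signature of its state."""
    state = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            if os.path.islink(full):
                state[rel] = ("link", os.readlink(full))
            elif os.path.isdir(full):
                state[rel] = ("dir",)
            else:
                st = os.stat(full)
                state[rel] = ("file", st.st_size, st.st_mtime_ns)
    return state

def _is_dir(signature):
    return signature == ("dir",)

def diff_snapshots(old, new):
    """Return update entries that turn the old state into the new one."""
    entries = []
    # Removals first, deepest paths first so directories are empty when removed.
    for rel in sorted(old, reverse=True):
        if rel not in new or _is_dir(old[rel]) != _is_dir(new[rel]):
            kind = "remove_directory" if _is_dir(old[rel]) else "delete_file"
            entries.append((kind, rel))
    # New directories next, parents before children.
    for rel in sorted(new):
        if _is_dir(new[rel]) and not _is_dir(old.get(rel)):
            entries.append(("create_directory", rel))
    # Finally, every file or link that is new or whose signature changed.
    for rel in sorted(new):
        if _is_dir(new[rel]) or old.get(rel) == new[rel]:
            continue
        if new[rel][0] == "link":
            entries.append(("create_link", rel, new[rel][1]))
        else:
            entries.append(("create_file", rel))
    return entries
```

As the text notes, a moved file shows up here as a `delete_file` entry for the old location and a `create_file` entry for the new one, never as a `move_file` entry.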
Deciding between incrementally modifying the update file (the incremental method) and creating the update file all at once (the aggregate method) is an issue that has yet to be resolved. The incremental method would be much less prone to bugs because each change to the update file is small; however, the feasibility of its implementation is questionable. On the other hand, the aggregate method is probably much more feasible to implement, but due to its complexity, is also a lot more prone to bugs.
Regardless of which method is used to create the update file, the user needs a way to modify files. Because the file's original copy is on the CD-ROM, it cannot be directly changed. Therefore, it needs to be copied onto the hard disk before it can be changed. One way to do this is to copy the file over its equivalent link on the hard disk, and then allow the user to modify this copy. Another way would be to copy the file into an update directory and change the appropriate link. Although the second approach keeps the link structure from being disturbed, it has problems when two files of the same name are updated. The easiest way to fix this problem would be to create a whole directory tree that mimics the directory tree of the web site. Of course, doing this may be a waste of space and effort; so the first approach seems more logical.
Upon receiving the update file, the home site puts it in the same directory as the other update files and starts reading it, line by line. It executes the appropriate command for each entry until it comes to a create_file entry. At this point, it checks whether a file has come in from the local site. If not, it waits. When the file comes in, it checks that it is the correct file. If it is, it sends a reply. If not, it assumes that the local site did not get the reply for the last file; so it sends that reply again, then waits again. Once the home site has the correct file, it places it in the correct directory. It then continues to execute the commands in the update file until it encounters another create_file entry. This process continues until the update file has been read entirely.
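The receiving side of this exchange can be sketched as a small state machine. Everything here is a simplified model for illustration: the class name `HomeSiteReceiver` is hypothetical, the "file system" is a dictionary, commands other than create_file are merely logged rather than executed, and a file is judged "correct" by its name alone, which is an assumption the text does not spell out.

```python
class HomeSiteReceiver:
    """Executes update entries in order, pausing at each create_file entry."""

    def __init__(self, entries):
        self.entries = list(entries)
        self.position = 0
        self.files = {}        # stands in for the home site's file system
        self.log = []          # non-file commands that were "executed"
        self.last_reply = None
        self._advance()

    def _advance(self):
        """Run entries until the next create_file entry or the end."""
        while self.position < len(self.entries):
            kind, *args = self.entries[self.position]
            if kind == "create_file":
                return
            self.log.append((kind, *args))
            self.position += 1

    def expected_file(self):
        """Name of the file the home site is waiting for, or None when done."""
        if self.position < len(self.entries):
            return self.entries[self.position][1]
        return None

    def receive(self, name, data):
        """Handle one incoming file and return the reply for the local site."""
        if name == self.expected_file():
            self.files[name] = data
            self.last_reply = ("ok", name)
            self.position += 1
            self._advance()
        # Otherwise the sender presumably missed the previous reply: repeat it.
        return self.last_reply
```

If the local site resends a file because a reply was lost, `receive` simply returns the earlier reply again without storing the file twice or skipping ahead in the update file.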
An alternative to sending each file one by one would be to send everything at once. This approach would work fine under normal circumstances. However, if some files get lost in the middle of an update, the home site would need a way to tell the local site to send those files again. Also, if anything else goes wrong during the update, and the local site never gets notified that the server has completed the update, it will be difficult to figure out exactly how far the update got. When files are sent one at a time and each is acknowledged, the local site can pick out exactly at which file the problem occurred.
The simplest clone creation would be on a computer that has a hard drive, a CD-ROM drive, and a UNIX file system that allows links. In this case, creating a clone from the local site would simply involve copying the appropriate directory tree from the hard drive of the laptop with the local site to the other computer, and issuing a copy of the CD-ROM to the user of the clone site. If the name of the CD-ROM drive is not the same on the computer with the new clone site, the links need to be changed, too.
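Changing the links when the CD-ROM drive has a different name on the clone's computer could be automated along these lines. This is a sketch under assumptions: the function name `retarget_links` is hypothetical, and the links are assumed to hold absolute paths that begin with the old CD-ROM mount point.

```python
import os

def retarget_links(site_root, old_cd_root, new_cd_root):
    """Repoint every symlink under site_root from old_cd_root to new_cd_root.

    Returns the number of links that were changed.
    """
    old_prefix = old_cd_root.rstrip(os.sep) + os.sep
    changed = 0
    for dirpath, dirnames, filenames in os.walk(site_root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue                      # a modified page: leave it alone
            target = os.readlink(path)
            if target.startswith(old_prefix):
                new_target = os.path.join(new_cd_root, target[len(old_prefix):])
                os.remove(path)               # removes the link, not its target
                os.symlink(new_target, path)
                changed += 1
    return changed
```

For instance, `retarget_links("/www", "/cdrom", "/mnt/cd")` would turn a link to `/cdrom/www/index.html` into a link to `/mnt/cd/www/index.html`.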
For a computer with a hard drive and CD-ROM drive, but a file system that does not support the symbolic links created by UNIX, artificial links can be used, as explained previously in the File Organization section. To create the clone site, the appropriate hard drive directory tree could first be copied to the computer of the new clone site. Then a program could be run to replace every symbolic link with an artificial link. Because the symbolic links probably cannot be interpreted on the computer with the clone site, a file first needs to be generated on the laptop with the local site; this file should list all the symbolic links in the web site. This file could be copied over to the other computer and then interpreted by a program to create all the artificial links. Because artificial links operate through frames, if the computer with the clone site does not have a browser that supports frames, the laptop's browser should be copied also.
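The file that lists all the symbolic links could be produced on the laptop with something like the following sketch. The function name `write_link_manifest` and the one-line-per-link, tab-separated format are assumptions chosen to match the style of the update file; the text does not prescribe a format.

```python
import os

def write_link_manifest(site_root, manifest_path):
    """Write one 'link path [tab] link target' line per symbolic link.

    Link paths are recorded relative to site_root so that the manifest
    stays meaningful on the clone's computer. Returns the number of links.
    """
    count = 0
    with open(manifest_path, "w") as out:
        for dirpath, dirnames, filenames in os.walk(site_root):
            for name in sorted(dirnames + filenames):
                path = os.path.join(dirpath, name)
                if os.path.islink(path):
                    rel = os.path.relpath(path, site_root)
                    out.write(f"{rel}\t{os.readlink(path)}\n")
                    count += 1
    return count
```

A program on the clone's computer could then read this manifest line by line and create one artificial linking file per entry.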
On a computer that does not use a CD-ROM drive, everything must be stored without links. So when creating a clone on this type of computer, all of the non-link files should be copied from the hard drive of the laptop with the local site. Instead of copying the links, though, the corresponding files on the CD-ROM should be copied.
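In Python, this kind of link-free copy is what `shutil.copytree` does when its `symlinks` argument is false: regular files are copied as they are, and each symbolic link is replaced by the contents of the file it points to. The wrapper name `create_linkless_clone` is hypothetical, and the sketch assumes that the CD-ROM is mounted during the copy and that the destination directory does not exist yet.

```python
import shutil

def create_linkless_clone(local_root, clone_root):
    """Copy the local site, storing real files wherever it has symlinks.

    clone_root must not exist beforehand; a link whose CD-ROM target is
    unavailable makes the copy fail rather than produce a partial clone.
    """
    shutil.copytree(local_root, clone_root, symlinks=False)
```

The destination could be, for example, a directory on a removable disk or a network mount that is later moved to the clone's computer.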
The clone site creations mentioned so far have all been from the laptop. Because CDs cannot be transferred from the home site, any clone created from the home site will not use links. Thus, the file organization of these clone sites will be the same as the one mentioned in the previous paragraph. When a user wants to create a clone site from the home site, he first downloads a program that will serve as a sort of starter kit for the clone site. When he runs this program on his machine, it contacts the home site, and the home site sends out instructions on how to recreate its directory structure on the machine with the new clone site. Additionally, of course, the appropriate files are transmitted. This communication uses replies just like the communication for updating the home site. The reasons for using this method are the same as before.
Sometimes, a user of a clone site may run into the owner again. In this case, he may want an update from the local site. If the owner is on a different trip from the last time he updated or created this clone site, he will have a different CD. This CD will now contain all of the updates made up to the end of his last trip. Thus, all of the information recorded on the clone site's hard drive is now on CD; so essentially, to update the clone site, all of the files from the hard drive of the local site's laptop need to be copied over. In other words, the same procedure will be used as is used to create a brand new clone site. Of course, before copying anything, the portion of the clone site on the hard drive needs to be erased.
Naturally, creating a new browser does have some merits, though. With the browser, the only way to modify the directory structure would be through the browser editor, because if one tried to make modifications directly to the file, he could not find it on the hard drive. He would have to rely on the browser to copy it over, update its link so that it knows this file is now on the hard drive, and then let the user modify the file. This characteristic provides the browser with some modularity; so the user does not have to deal with low-level commands, and the program is easier for him to use. Additionally, the browser saves some space on the hard drive because it does not create a whole directory tree of symbolic links. However, these benefits are far outweighed by its cost in complexity.
The design presented in this paper, however, fits the design criteria quite well. The processes have been kept as simple as possible by using small programs and human-readable text files to manage them. All of the features, both primary and secondary, are supported by this design. The design is easy to use for both the owner of the web site and users of the clone sites; uploading and downloading require no more than two steps by the user. Hard copies of the files are only made to the hard drive when necessary; otherwise, symbolic links are used; so the disk space is managed fairly efficiently. The user is given a lot of freedom in what he does with his files; he can basically manipulate links, files, and directories as he pleases. The design also takes into account a variety of different environments in which a user may want to create a clone site. Although it may not cover every possible environment in the world, this design handles the most common ones. By performing checks and using a send-and-reply tactic for transmitting files over an unreliable network, this system increases its reliability.
There are other issues that are not quite as vital to the system. For example, when the user creates a hard link to another file in the web site, the current scheme copies that particular file more than once when updating home sites or clone sites: once for the original file, and once for the hard link. This phenomenon occurs because hard links appear just like files when ls is called. This does not sound like a big problem, but if the file is large, it could unnecessarily take up a lot of space on a clone site, which could be a problem if the clone site has a limited amount of space to begin with.
Another issue is increasing the number of computers on which clone sites can be created. It is difficult to blindly predict what types of computers to expect. We of course want to support the common case, as was explained in this paper, but what if the owner meets a fair number of people who want to create clone sites on computers that are not supported by the current design? We could try to think of ways to clone sites on every single type of computer, but this would probably end up being a waste of time. The best thing to do would be to prepare for the common case; find out what types of computers are popular for creating clones; then formulate methods for creating clone sites on computers that were not originally considered. An even better alternative would be to first find out what types of computers are popular for creating clone sites, formulate methods to create clone sites on them, and only then let the owner travel.
Portable web sites are very versatile and useful. Perhaps in the near future, they will be used by scientists, ambassadors, politicians, and maybe even the common tourist, to keep the public aware of any news that has occurred during one's travels.