Persistent Naming Discovery -- Pre-planning Notes

The Handle System at MIT
a.k.a. Persistent Network Resource Locators

Discovery Project Pre-Planning Notes

I. Project Motivation

The World Wide Web, the "killer app" of the Internet, is a boon to scholarship and the dissemination of information. MIT has a strong presence on the Web that includes rich stores of documents and data from labs and departments, MIT Libraries, and individuals. This is entirely appropriate -- except that putting information on the Web fails to fulfill one important function of a major research university, the preservation of knowledge.

Preservation and persistence are important properties of scholarly information, but they are commonly ignored for information stored on computers. For example, to a researcher the bibliography can be the most useful part of a scientific paper, since it leads to more relevant or complete works; but the references in that bibliography must remain valid for years or decades. With HTTP URLs on the Web this is unlikely: a URL is a specific pointer to a computer and file, and it becomes obsolete as soon as the file is moved or the computer is renamed.

The CNRI Handle System was created to solve this problem. It offers a single unchanging, "persistent" name for a network resource and resolves that name, upon request, to the resource's current URL. It is both a layer of indirection and an archival-quality naming system. When a document references a "handle", the reference may still take the form of a URL (to a proxy server), but the contract of the Handle System is that this URL will never change. Whenever the resource referenced by the handle moves, its author has one place to update -- the Handle System entry -- and all of the links made to it will continue to work.
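
The indirection described above can be illustrated with a minimal sketch in Python. This is not the CNRI implementation or its API: the HandleTable class, the handle "0018/example-thesis", and both URLs are hypothetical, chosen only to show that a citation records the handle while the table alone tracks where the resource currently lives.

```python
# Minimal sketch of handle indirection. NOT the CNRI implementation:
# HandleTable, the handle "0018/example-thesis", and all URLs are hypothetical.


class HandleTable:
    """Maps persistent handles to the current URL of each resource."""

    def __init__(self):
        self._entries = {}

    def register(self, handle, url):
        """Create a new handle; refuse to silently overwrite an existing one."""
        if handle in self._entries:
            raise ValueError(f"handle already registered: {handle}")
        self._entries[handle] = url

    def move(self, handle, new_url):
        """Record that the resource moved. The handle itself never changes."""
        if handle not in self._entries:
            raise KeyError(handle)
        self._entries[handle] = new_url

    def resolve(self, handle):
        """Return the current URL for a handle."""
        return self._entries[handle]


table = HandleTable()
table.register("0018/example-thesis", "http://old-host.example.edu/theses/42.ps")

# A bibliography cites only the handle, never the underlying URL.
citation = "0018/example-thesis"
before = table.resolve(citation)

# The file moves to a new server: one update, and the citation is untouched.
table.move("0018/example-thesis", "http://new-host.example.edu/archive/42.ps")
after = table.resolve(citation)
```

The citation string is identical before and after the move; only the table entry changed, which is the single point of update the author is responsible for.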

MIT is already an online publisher of scholarly resources, both formally (via the Libraries and the MIT Press) and informally. The commercial publishing community is starting to adopt Handle System technology in the form of Digital Object Identifiers. It makes sense for MIT to enjoy the benefits of persistent reference names as well, and to encourage their use throughout the world academic community to produce more useful online documents.

The NCSTRL project and the Digital Library of MIT Theses are already using handles on an experimental basis, and would benefit from a formal commitment to some persistent resource naming system. The MIT Libraries online catalog (Barton) needs persistent names to be able to include references to online resources, since URLs are too volatile to keep in the catalog. The Electronic Media Creation Center (EMCC) has shown interest in using persistent names for the content they are creating.

To succeed, a persistent naming service must be pervasive and permanent. Every publisher of a document worth preserving indefinitely at MIT and its associated laboratories is a potential customer, but they must be willing to adopt the persistent name as the canonical reference to their work. They must be convinced that it will always be available.

The MIT Libraries can help by lending their reputation and expertise in preservation to the project, and thus legitimizing the service. They can even help attract "customers" by using persistent names to list the online version of a document as a holding in Barton, and providing a live link to it in the Web catalog interface.

There is little technological risk, since working, production-quality technology to implement a persistent naming service already exists. The main risk is that the service will be deployed but not used widely enough, leaving us to maintain it indefinitely without realizing all of the benefits it can offer.

One of the biggest challenges of this project is to decide how to integrate it throughout the Institute. The proposed technology offers a hierarchical namespace with decentralized administration; should it be allocated to bodies within the Institute and given over to their control? Guidelines for allocating names and namespaces must be established.
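
To make the allocation question concrete, here is a small sketch of how delegated sub-namespaces might be looked up. Everything in it is an assumption for illustration: the dotted sub-authorities ("0018.1", "0018.2"), the bodies they are assigned to, and the function name are hypothetical and do not reflect any decided policy. The rule shown is "most specific delegation wins", falling back to the top-level administrator.

```python
# Hypothetical delegation table: naming authority -> administering body.
# The sub-authorities and assignments below are invented for illustration.
DELEGATIONS = {
    "0018": "Initial Namespace Administrator",
    "0018.1": "MIT Libraries",
    "0018.2": "EMCC",
}


def administrator_for(handle, delegations=DELEGATIONS):
    """Return the body responsible for a handle of the form 'authority/name'.

    Walks from the most specific naming authority (e.g. '0018.1.5') up to
    the least specific ('0018') and returns the first delegation found.
    """
    authority, sep, local_name = handle.partition("/")
    if not sep or not authority or not local_name:
        raise ValueError(f"expected 'authority/name', got: {handle!r}")
    parts = authority.split(".")
    while parts:
        candidate = ".".join(parts)
        if candidate in delegations:
            return delegations[candidate]
        parts.pop()  # drop the last segment and try the parent authority
    raise LookupError(f"no administrator for naming authority {authority!r}")
```

Under these assumptions, a handle in "0018.1" or any authority beneath it belongs to the Libraries, while a handle in an undelegated authority such as "0018.9" falls back to the top-level administrator.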

II. Resource plan

The Discovery project should take between two and four months, mainly because of the large number of people who must be solicited for advice and/or feedback.

Team membership:

Project leader: (Larry Stone, IS)
Administrative liaison and Evangelist: Someone with a good understanding of the administrative and academic structure of the whole Institute is needed. This role could be distributed among several people, ideally from Academic Computing, Libraries, or central administration.
Developer: Familiar with Web technology, Python, C, Solaris admin, etc.
Initial Namespace Administrator: The equivalent of a Postmaster for the Handle System, the chief executive responsible for the content of the service.
Initial Handle System Maintainer: Responsible for keeping service (h/w and s/w) alive and up-to-date. Small part of an FTE. Some system administration experience helpful.

Budget:

The required software is free and a license agreement has already been negotiated. The hardware required amounts to two minimal Athena Sun workstations -- Sparc 5s or even retired Sparc Classics.

If an Institute-wide mailing is needed to spread the word about the service, that also has to be funded.

III. List of project deliverables

1. Required.

Handle system infrastructure; 2 servers, our own "0018" namespace.
Rules & procedures/policy for admin of handle space. Includes the declaration of the mapping of handle namespace onto MIT administrative topology.
Adequate user-interface to support namespace administration.
Documentation and support, including guidelines for end users and sub-namespace administrators.

2. Desired.

Improvements to CNRI's administration UI.
Automated testing and probing of URL contents of handles.
Unix Netscape plugin to interpret Handle URLs (CNRI already has one for PC and Mac platforms).

IV. Preliminary work breakdown structure or task list

Research and brainstorm how the namespace should be structured and administered.
Recruit the initial user base.
(In parallel) Code improvements to Handle System administrative UI & security if needed, fix bugs.
Deploy a fully supported server machine.
Write a guidelines and procedures document describing how the Handle System and its namespace are to be administered.
Possibly develop Unix Netscape plugin to resolve handle "hdl:" URLs.

V. Communication plan

There are two primary tasks of communication:

Solicit advice and participation in the process of designing guidelines and procedures to administer the namespace. This is likely to be a typically iterative design process: develop a proposal, take it to a meeting and have it criticized. Perhaps develop a committee of key stakeholders and representatives from the Institute community.
Widespread publicity (advertising) when the service becomes available to invite content providers to use it. Administrative procedures must already be in place before this happens.

VI. Commitment estimates

The service requires a knowledgeable developer to keep tabs on it, but only a small fraction of an FTE devoted to the maintenance of the software and servers. The servers can be Athena workstations to minimize administrative duties.

The chief executive of namespace administration may have more extensive duties, calling for a larger fraction of an FTE -- but it is still probably nothing close to a full-time occupation.

Since the entire value of this service is that it is persistent and unchanging, by providing it the Institute is promising to support it indefinitely into the future. The Handle System was carefully designed to be independent of specific network protocols and services so it can be ported to future incarnations of the Internet; this means MIT must commit to support it for as long as it is relevant.

VII. Discovery Charter

Everyone who has explored much of the Internet knows about "dead links": resource locators that no longer work because a computer or file has disappeared, moved, or changed its name. This is a particularly acute problem for Web-based scholarly publications that need to reference other resources, and remain usable for years. This project is about providing a simple and effective technology that implements permanent, archive-quality resource locators to help solve the problem of dead links for networked resources housed at MIT.

As a Discovery project, the most important goal of this effort is to decide whether MIT is to address this problem now and with this technology.
