Removed from online archives on an undetermined date, and remounted on mit.edu/erm/ on 1/2/2002.
TECHNOLOGY REVIEW ON-LINE COPYRIGHT NOTICE
Technology Review (ISSN 0040-1692), Reg. U.S. Patent Office Copyright 1995, Technology Review, all rights reserved. Published eight times each year by the Association of Alumni and Alumnae of the Massachusetts Institute of Technology. The editors seek diverse views, and authors' opinions do not represent the official policies of their institutions or those of MIT. Articles may not under any circumstances be resold or redistributed for compensation of any kind without prior written permission from Technology Review.
The records manager of the huge provincial utility blamed the lost records on the recently installed computer network and worker unfamiliarity with the company's new practices for storing documents. But as she and the staff searched frantically for the stray document, she discovered that the problem was now a chronic one. Despite management directives that all employees print out paper versions of electronic documents and place them on file, the volume of paper records arriving at the central storage office had dropped by 50 percent within six months of the network's installation. The actual work was still being done, but workers were now deciding that they were their own best records managers and archivists.
In the company's former paper-only system, information about a defective part could easily have been found by retrieving a carbon copy of the purchase order, which a secretary would have typed and given to a clerk for filing in the records office. But in the electronic world Ontario Hydro had adopted, few secretaries and records clerks remained. The only ``filed'' copy of a purchase order, if it existed, would reside on an individual's hard disk drive, but then only if the employee remembered to save it in the first place, and if he or she had not deleted the years-old document while freeing up disk space.
Moreover, given that employees were allowed to name files according to personal whimsy, and that files were password-protected, access to the electronic version of the purchase order would be possible only if the employee had not left the company and was available when needed.
Ontario Hydro is not alone. Reports of lost or scrambled electronic data are coming in from a variety of sources, including major governmental organizations. For example, the United Nations recently discovered that methods for identifying, storing, and retrieving vital electronic data, such as field reports on social and economic issues in developing countries, had been completely ignored since the widespread introduction of office-automation technology. Similarly, the National Archives of Canada recently investigated the electronic files of one of the country's cabinet ministers and found not only that 30 of 100 randomly chosen policy documents could not be found in the government's paper records, but also that no system was in place to safeguard the contents of the electronic system. The National Archives and Records Administration in Washington, D.C., a repository for all government records, reports that older magnetic tapes containing data received from various government departments were suddenly unreadable after just 15 years. Because the temperature and humidity of storage areas in several departments were uncontrolled and the tapes were not rewound regularly or copied every few years, as recommended by archival conservation standards, the tapes became so brittle that they melted or caught fire when run on new drives that spun the tape some 10 times faster than earlier models.
Perhaps the most spectacular example of a government agency losing its electronic memory recently occurred at the National Aeronautics and Space Administration when space scientists were eager to access some 1.2 million magnetic tapes of observations that NASA created during three decades of space flight. The researchers were hoping to reveal ``long-term trends like global climate change, tropical deforestation, and the thinning of the atmospheric ozone layer,'' according to NASA, as well as new nuggets of information about the moon and planets. But the information could not be read or sometimes even found. Tapes were uncataloged. Some had been damaged by heat or floods. Many were unlabeled as to which mission or spacecraft or computer system created them. Because no proper archival controls for these records were in place, NASA officials estimate that it will take millions of dollars and years of detective work to link each of the files to their spacecraft or mission and then decode the information so that it can be read by hardware and software now in use.
Underlying each case of lost data is a fundamental change in the way institutions now store information. For the first time in 3,500 years of archival activity we produce records that do not exist to the human eye - unlike Babylonian clay tablets, Egyptian papyrus, Roman and medieval parchment, modern paper, even microfilm. For the first time, business and professional people with no training and usually no aptitude for managing records are responsible for creating and storing them. Perhaps most significant, for the first time, we are not producing, managing, and saving physical artifacts, but rather trying to understand and preserve virtual patterns that give the electronic information its content, structure, context, and thus its meaning. Yet these patterns are completely controlled by software, which over the years will be modified, updated, and replaced countless times.
Unless organizations adopt a means to control key records and continually migrate them to current software and new storage media, the long-term memory of our modern institutions will be in jeopardy, as will their ethical, legal, and economic health.
Disaster can occur not only because electronic information is hard to preserve but also because it is hard to control. Imagine that a chief executive officer sends a crucial policy-related e-mail message to her corporate administrators on November 23, 1994, and attaches a report containing graphs generated from spreadsheets linked to a database whose values change daily. The message and attachments detail investment strategies for the company and key clients.
Imagine also that one of the managers is later fired for failing to carry out the CEO's directives, thus having cost the company several important clients, and that he sues the company for wrongful dismissal, claiming he never received the CEO's e-mail message. If that same message had been sent in 1984 or 1974, or even 1904, it would have been a typed paper memorandum, addressed to the group, copied to others, and signed by the CEO, with a hand-drawn chart in the body of the text and figures and statistical tables in an appendix that would be physically stapled or paper-clipped to the CEO's memo. Any legal dispute could thus be settled by recourse to the paper file where the whole package sent by the CEO would reside, with evidence of signatures, routing-slip initials, or acknowledgment-of-receipt stamps.
Not so with the electronic version. Even if the computer system's backup tapes survive, which is no guarantee in many workplaces, could the corporation retrieve and, more important, reconstruct the CEO's compound electronic document two years, or maybe even ten years, after the fact? Could it prove that the offending administrator had actually been on the original e-mail distribution list and had been sent the document? Could it prove that he had received the document and either filed or deleted it? Could it recreate the attachment as it actually existed on November 23, 1994, from the ever-changing spreadsheet tables? Could it prove that no subsequent alteration or unauthorized access to the data or system had occurred?
The key to maintaining critical electronic information lies in being able to determine, sometimes long after the fact, not only the content but also the context of a record in question. Such a contextual view of information is the purview of archivists, who, unlike librarians, want to know not just what was communicated but when, by whom, to whom, where, how, why, using what media, and connected to what broader programs and activities, both now and over time. Using skills honed in managing the voluminous paper records of the modern state, archivists are now obviously obliged to develop similar approaches to stop the memory loss of the electronic age.
At the center of all archival thinking is the ``record.'' Whether a parchment court roll, a frontier-land patent, a business report on a paper file, or an electronic message, all records have three properties: content, structure, and context.
For paper records, all three elements are represented on the same physical medium. Content is most obvious: it is the words, phrases, numbers, and symbols composing the actual text. The structure of paper documents is also readily evident from the form used for special kinds of transactions: a business tax return is different from a land-grant certificate. The context for paper records is derived from the signature lines, the signature itself, the address and salutation, the letterhead, the date, the carbon copies or ``cc'' line on the bottom of the page, perhaps the surviving envelope, various stamp impressions or annotations of date of receipt or filing, the position of the document within a larger paper file of related documents, the file heading or title, the file's own place within a larger records classification system, charge-out cards recording who has read the file on what date, and cross references to related documents in other media, including photographs, maps, and so forth.
Archivists consider this contextual information essential to the comprehension of any ``record'' as a reflection of acts and transactions, and thus of institutional accountability. Without context, one is left with information but not a record, and no memory on which to base future decisions or defend earlier ones.
For electronic media, the content, structure, and context of the record differ significantly from those of the traditional paper world. The only approximate match with paper is the content element, where the letters and numbers look much the same on the computer screen as on paper. But the structure and especially the context of electronic records are not apparent from the retrieved text alone.
Think of the CEO who sent out her message on investment strategies electronically. The interconnections of her compound document are not part of what the user sees on the screen, as they would be in a paper world, but rather are links in software or in the operating system. These instruct the computer to query the database, drop the relevant values found there into the spreadsheet, build a graph using spreadsheet formulas, and place the resulting graph in the appropriate spot in the word-processed report that is attached to the e-mail.
No such product is actually stored anywhere in the computer. Rather, at a particular moment in time, the software and operating system must stitch together information that is scattered in many places to form that virtual document. Upgrade or change that software and system, alter any of the data values, and those relationships among the e-mail, report, graphic, spreadsheet, and database are lost, as they are in the vast majority of systems operating in businesses and governments today. The virtual document vanishes. Corporate memory is wiped clean.
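The fragility of such a virtual document can be shown with a minimal sketch in Python. Everything in it is an assumption made for illustration: the `database` dictionary stands in for the live database, and `render_report` and `capture_record` are hypothetical names, not features of any actual office system. The sketch contrasts a report that is rebuilt from current data each time it is viewed with a copy frozen at the moment of sending.

```python
from datetime import date

# Hypothetical stand-in for the live database whose values change daily.
database = {"client_a_holdings": 120, "client_b_holdings": 80}


def render_report(db):
    """Stitch the 'virtual document' together from whatever the data holds now."""
    total = sum(db.values())
    rows = [f"{name}: {value}" for name, value in sorted(db.items())]
    return "Investment summary\n" + "\n".join(rows) + f"\nTotal: {total}"


def capture_record(db, sent_on):
    """Freeze the rendered report, with its send date, as a fixed record."""
    return {"sent_on": sent_on.isoformat(), "body": render_report(db)}


# The record is captured on the day the message is sent.
record = capture_record(database, date(1994, 11, 23))

# The database changes afterward, so the live view drifts away from what was sent.
database["client_a_holdings"] = 45
live_view = render_report(database)

print(record["body"] == live_view)  # the frozen copy and the live view now differ
```

Only the frozen copy can still answer what the recipients were actually shown on the day in question; the live view reflects nothing but the present state of the data.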
What can be done to stop the erosion of institutional memory in an electronic world? How do we protect the content, structure, and context of electronic records over time? Before addressing what will work, it may be useful to look at three options that have been proposed or tried, each of which has serious flaws.
The first option proposed and rejected by numerous corporations and government agencies is to impose a single hardware and software standard on all records creators - everyone within the organization must use WordPerfect 6.0 with Windows, for example. Such policy fiats would be virtually impossible to implement and enforce outside rigidly hierarchical organizations like the military and the police. Nor are they desirable, for they undermine end-user creativity, lead to unhealthy monopoly situations for the makers of the hardware or software of choice, and curtail levels of comfort with preferred technology. A related approach would be to preserve only generic data, such as ASCII text, which is not hardware or software dependent and thus could be read using ``off-the-shelf'' standard software. This in fact was the archival preservation option used in the 1970s and 1980s. But it is no longer feasible for today's software-dependent records, which are too complex to translate into ASCII format.
A second option proposed by several entrepreneurs in the past few years is to create a cybernetic museum with working models of every known piece of obsolete computer hardware and software, so that institutions and archives may gain access to old files and convert them to whatever may be the present-day standard. Unfortunately, the likelihood of keeping any piece of machinery running for many decades is simply not very high, since replacement parts, chips, and software could not be easily reproduced. A computer system is far more complex than a steam locomotive or shuttle loom.
A third option increasingly favored by information technology professionals is to dump all electronic information in no particular order on CD-ROMs or high-density diskettes, and then to search them for the required subjects using ever more powerful artificial-intelligence text-retrieval programs. But while related material can be retrieved with this approach, so too is a great mass of extraneous information containing the same search strings.
For example, one researcher at the National Archives of Canada recently used such a strategy to try to find information from the defunct Trade Negotiations Office concerning plans to expand sales of Canadian freshwater to the United States. He searched the agency's electronic files for references containing the word ``water.'' Even though the trade office was in operation for only a couple of years and employed only a few people, the researcher found more than 600 items containing the word ``water.'' Yet while some related to the subject, many did not, especially since archivists, faced with a save-all or delete-all situation, chose to preserve all the backup tapes of the system. Thus the researcher found many items like ``Meet me at the water cooler,'' and ``My report was sure watered down by the boss.'' Moreover, references that might have detailed crucial policy decisions but did not contain the word ``water'' were missed entirely: ``About that matter we discussed this morning, the Prime Minister instructs me to tell you that under no circumstances shall we bargain it away unless the United States makes major concessions in agricultural products.'' Free-text searching, while better than nothing, does not uncover all the relevant records related to a particular function, activity, or transaction, nor does it preserve the context of or reason why a record was created.
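Both failures of the ``water'' search, the irrelevant hits and the missed memo, can be reproduced in a few lines of Python. This is a minimal sketch rather than a model of any real retrieval program: the document identifiers, the wording of the first memo, and the set of ``relevant'' items are assumptions invented for illustration, while the other three texts are the examples quoted above.

```python
# Tiny corpus. memo-1 is a hypothetical on-topic item; the rest quote the article.
documents = {
    "memo-1": "Proposal to expand sales of Canadian freshwater to the United States.",
    "memo-2": "Meet me at the water cooler.",
    "memo-3": "My report was sure watered down by the boss.",
    "memo-4": (
        "About that matter we discussed this morning, the Prime Minister "
        "instructs me to tell you that under no circumstances shall we "
        "bargain it away."
    ),
}

# Assumed ground truth: the items that actually concern the water-export file.
relevant = {"memo-1", "memo-4"}


def free_text_search(docs, term):
    """Return the ids of documents whose text contains the term, ignoring case."""
    term = term.lower()
    return {doc_id for doc_id, text in docs.items() if term in text.lower()}


hits = free_text_search(documents, "water")
false_positives = hits - relevant  # retrieved but off-topic
missed = relevant - hits           # on-topic but never retrieved

print(sorted(false_positives))
print(sorted(missed))
```

The search cannot tell the water cooler from the water trade, and it has no way to connect the Prime Minister's instruction to the file it belongs to, because that connection lies in the context of the record and not in its words.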
A fourth option, now being explored by a team of archivists led by Richard Cox and David Bearman in a project at the University of Pittsburgh, and by John McDonald at the National Archives of Canada, is to determine the functional requirements of defining and safeguarding a record in a world of virtual rather than physical documents. The Pittsburgh project team has thus far determined the following set of needs for capturing, maintaining, and using electronic records:
The Pittsburgh team believes that each organization should assign to a chief information officer or other senior staff formal responsibility for implementing these and other guidelines for generating and protecting records in new system designs and system reengineering plans. In fact, the next steps of the Pittsburgh project, which is funded by the U.S. National Historical Publications and Records Commission and scheduled to be completed in 1995, will help information officers meet these goals.
The team plans to translate the guidelines into technical specifications that programmers can use to instruct computers to automatically create appropriate records. In the terminology of the field, programmers would be creating metadata, which are additional data that encapsulate or surround the original data and tell them how to act, place them in the context of the business transactions to which they relate, and maintain their integrity and authenticity.
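What such encapsulating metadata might look like can be sketched in Python. This is an illustration only and not the Pittsburgh project's specification: the field names, the `wrap_record` and `is_intact` helpers, and the use of a SHA-256 digest as an integrity check are all assumptions chosen for the example.

```python
import hashlib
import json


def _digest(content, context):
    """Compute a SHA-256 digest over the content and its context together."""
    payload = json.dumps({"content": content, "context": context}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def wrap_record(content, context):
    """Encapsulate content with its context metadata and an integrity digest."""
    return {
        "content": content,
        "context": context,
        "sha256": _digest(content, context),
    }


def is_intact(record):
    """Check that neither the content nor the context changed since wrapping."""
    return _digest(record["content"], record["context"]) == record["sha256"]


# Hypothetical record for the CEO's message described earlier in the article.
record = wrap_record(
    "Investment strategy for key clients.",
    {
        "sender": "CEO",
        "recipients": ["manager_a", "manager_b"],
        "sent_on": "1994-11-23",
        "activity": "investment policy",
    },
)

# A copy whose content was altered after the fact fails the check.
tampered = dict(record, content="Revised strategy.")

print(is_intact(record), is_intact(tampered))
```

A digest stored alongside the record detects accidental or careless change, but it would not stop someone who could rewrite both the record and the digest; a working system would also need to keep such values somewhere the record's users cannot alter.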
Software companies would design these record-keeping capabilities into their new products, especially integrated business applications such as word processing, spreadsheet, graphics, and database programs. End users will create the markets for such new software products either by recognizing their intrinsic value for safeguarding corporate memory or by responding to the growing number of data disasters and lost records.
As society shifted a millennium ago from the oral to the written record, the focus of archivists changed from remembering an action to caring for the written artifacts that gave evidence of the action. As society now moves from written records to virtual documents, archivists are offering their traditional understanding of the structure and context of recorded evidence as protection against the widespread amnesia now threatening our electronic world.