MIT Communications Forum

Thursday, April 20, 2000
5:00 - 7:00 p.m.

Summary

[The text below is an edited summary, not a complete transcript.]

Marlene Manoff, MIT Humanities Library: As we were planning this event, I looked up the summary of the last communications forum devoted to library issues, entitled "The University, the Library, and Information Technology." It was held precisely five years ago, but I was struck by how little the issues had changed. I particularly noted that Charles Vest mentioned Project TULIP, an attempt by the publisher Elsevier to partner with a number of universities to provide full text access to about 40 journals in materials science. Although it turned out to have been misguided on a number of counts, it was also one of many steps in our growing understanding of what users want and the kind of infrastructure we can support. Right now, many projects later and under the stewardship of Ann Wolpert, the MIT Libraries are embarking on a new 1.8 million dollar project with Hewlett-Packard called D-Space. It will provide access to the work of MIT faculty authors. Given the complexity of that project and digital issues in general, we are fortunate to have with us two experts on these subjects. Both of them are leaders of organizations that are helping us contend with the issues raised by electronic technology.

Clifford Lynch, Coalition for Networked Information (CNI): Let me start with just the title of the session, "Digital Libraries." This phrase is both a blessing and a curse. Much like books, libraries are cultural totems. They wrap together all sorts of feelings, social contracts, and roles. Talking about the electronic book or the digital library has an ability to simultaneously get people upset, inspired and concerned. Even if the phrase is profoundly oxymoronic, we are reluctant to discard it because it is so evocative. Let's look at various views of what the term means as a way to explore some of the controversy that is packed in it.

What is a digital library?

Perhaps we can know digital libraries by example. Over the past five years, the National Science Foundation, in conjunction with ARPA, NASA and other government agencies, spent something on the order of 30 million dollars funding Digital Libraries. Presumably, if we look to the products of those grants, we will learn something about digital libraries. They were discipline-oriented information environments that had collections of resources and tools for researchers and students in specific fields. Very few of them involved the libraries at the institutions to which the grants were given. Perhaps this is a functioning definition of digital libraries. In the commercial space, there are examples that line up well with this definition. For example, services like Lexis/Nexis or Westlaw in the legal world are reasonably comprehensive information spaces for attorneys and law students. While it is hard to get as inspired about Westlaw as we can by some of our amorphous ideas about digital libraries, these places are where knowledge breeds, interbreeds, and grows. These are some of the best examples we have of functioning digital libraries.

Does this view of the digital library as information system incorporate active tools of knowledge creation and refinement? In the late 1980s, Bill Wulf coined the term "collaboratory."

A collaboratory is a "... 'center without walls,' in which the nation's researchers can perform their research without regard to geographical location, interacting with colleagues, accessing instrumentation, sharing data and computational resources, [and] accessing information in digital libraries."

That has a lot of elements that resonate with the definition of digital library as information system, but at the same time it connects to the active pursuit of science rather than the passive storage and review of scholarship. There are systems at various places on the spectrum between a purely passive information repository and a fully realized collaboratory. As we move towards the active end of the scale, a whole new set of skills and issues begins to show up around structuring and refining knowledge.

What do digital libraries have to do with "traditional" or "brick and mortar" libraries?

The term library is traditionally used for three things:

a "collection of stuff"
a building or other facility that houses a "collection of stuff"
an organization that typically encompasses the functions of selecting and acquiring, organizing, making available, and ensuring the long term preservation of and access to a "collection of stuff"

Another way of thinking about digital libraries is seeing digital collections as a marker in the evolutionary growth of existing libraries. Research libraries are increasingly encompassing larger and larger digital collections, which sometimes supplant and sometimes supplement parts of the historical collections of print and other media.

There are well over five thousand established scholarly and scientific journals available in electronic form which research libraries are licensing. In due course, they will let go of the print versions. Libraries also have special collections of manuscripts, photographs, and other one-of-a-kind materials that are fragile. Historically, access to them has been very privileged and place bound, but now much of this material is being digitized and placed on the Internet. This opens up a new complement to traditional books and journals for a much wider audience, and it is changing the way that research in some areas is conducted. On the other hand, there is a widespread misimpression that almost everything is in electronic form. This is not true. Very few books are available in digital form. This is not a conspiracy of librarians, paper lovers, architects and other groups to prevent everybody from getting access to this wealth of electronic books. There are some real problems about how we use books in digital form. With journals, people skim online, decide what they need to print, and then use paper as a user interface to the actual articles that they want to study. That works a lot better with a short journal article than it does with a book.

What issues arise as scholarly communication moves into the digital environment?

As journals have moved into a digital format, there has been a tremendous loss of cohesion in the scholarly literature. The material is scattered across the publishers' websites, so you have to know who publishes what. This is a huge challenge that no single library can take on by itself. It calls for a collective approach by scholarly publishers and research libraries to rebuild journals as a system so that it is possible to do things like search and use active citation across journals from different publishers.

We are in the worst of both worlds at the moment, with electronic journals being published in both print and electronic form, thereby maximizing expense for everyone involved. If you ask publishers why, the reason they give is that there isn't a viable archiving strategy to explain to readers, authors and the libraries that buy materials. Since the process of preserving paper is known, they continue to produce the paper, assuming that it will deal with the archiving problem in the short run. This is a bad long-run solution. There are visions of scholarly communication that fully exploit the electronic environment, but this dual publishing model presents an enormous barrier to including interactive components such as multimedia, databases, and simulation models. Some digital material is growing up in new renegade genres of scholarly communication outside the normal system of publication. We're going to have a terrible time integrating them into the system of publication until we resolve the issues of archiving digital information.

What issues arise in archiving digital information?

The preservation and archiving role that libraries have historically filled is going to be more important in the digital world. We have learned a fair amount about preserving digital information, but there is still hard intellectual work left to do. We realize that preserving media is a losing battle, so we preserve and manage bits across ever evolving infrastructures. We know how to do that, but it is expensive. Computer centers that manage large amounts of data don't just give up on all the data and start over because a new generation of storage device has come out. While we understand archiving digital materials that share a lot of the qualities of books or journal articles, we do not understand what it means to archive a database or web site that constantly changes. We don't know if we need to record the full evolutionary development or when snapshots are adequate. This is all part of the technical agenda of archiving.
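[Editorial illustration, not part of the forum remarks.] The idea of preserving bits rather than media can be sketched as a checksum-verified copy from an old storage location to a new one. This is a minimal sketch under stated assumptions: the function names `sha256_of` and `migrate_with_fixity` and the directory names are hypothetical, and a real archive would also record the digests, keep multiple copies, and re-verify them on a schedule.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_with_fixity(source: Path, destination: Path) -> str:
    """Copy a file to new storage and confirm the bits are unchanged.

    Hypothetical helper: raises OSError if the copy's digest differs
    from the original's; otherwise returns the verified digest.
    """
    before = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise OSError(f"fixity check failed for {destination}")
    return after


if __name__ == "__main__":
    # Demonstration with throwaway files standing in for two storage generations.
    workdir = Path(tempfile.mkdtemp())
    old_copy = workdir / "old_storage" / "article.txt"
    old_copy.parent.mkdir(parents=True)
    old_copy.write_bytes(b"an archived journal article")
    new_copy = workdir / "new_storage" / "article.txt"
    print(migrate_with_fixity(old_copy, new_copy))
```

The sketch covers only the simple case of a static file. It does not address the open question raised above of what to capture for a database or web site that keeps changing.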

We also face difficult economic and legal challenges in archiving. Our system of copyright law is inhospitable to our ability to manage our heritage in digital form. Instead of thriving under benign neglect, digital material vanishes within a few years. A printed work can sit on a shelf until it goes out of copyright as long as the physical environment isn't too hostile. On the other hand, digital information needs constant care during the 90 years when it is under copyright and highly constrained legally. The legal constraints prevent the maintenance of this material by groups like libraries, unless they specifically negotiate for the rights to do this.

Again, there is tension between centralization and distribution. It is not practical for every library to preserve digital information. There are great economies of scale to centralization. On the other hand, I don't have the hubris to say we can get all the engineering right to protect ourselves against all of the stupid things that can happen, so I am not comfortable betting our intellectual heritage on a single central archive. There are going to be unforeseen physical disasters, political meltdowns, funding foul-ups, and personal lapses in judgment. We need more than one archive, and fewer than a thousand. Those are some of the challenges that we face in building libraries that truly can not just acquire and offer access, but indeed preserve digital information that is increasingly coming to represent our intellectual and cultural heritage.

Deanna Marcum, Council on Library and Information Resources (CLIR): Three years ago, a group of research libraries (then 17, now numbering 24) decided to invest their own resources in a loose organization called the Digital Library Federation (DLF) to work on the "big problems" of digital library development. They asked the Council on Library and Information Resources (CLIR) to host the activity.

Defining Digital Libraries

We worked with the institutions to develop the following broad working definition.

"Digital libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities."

It is hard to imagine what doesn't fit in that definition. Our goal was to create digital libraries that worked as well or better for the scholarly community than the traditional libraries we know. Here is a more elaborate description of what we meant by these terms.

Organizations that Provide Resources: We recognized that digital libraries would employ many different kinds of resources, and that these resources need not be organized like a conventional library. In fact, these resources need not even be in the context of a conventional library. We recognize that there will be many different resources. For example, there are many cottage industries growing up that serve the need of scholars in specific disciplines. The Los Alamos Physics Preprint Service is a good example of a kind of digital library effort, and there are going to be others based on disciplines over time.

Preserve the Integrity of Collections, and Ensure the Persistence Over Time: In 1996, CLIR and the Research Libraries Group published the work of a task force on the whole subject of preserving digital information. That task force was made up of librarians, scholars, archivists, lawyers, government officials, museum curators and preservation specialists. In that report they laid out what would be required to ensure the integrity of digital objects and make sure that they persisted over time. We took that report as the basis for our work in the preservation of digital materials. We do recognize that the integrity of digital objects and preservation are different. They are very much connected, but they have to be treated separately.

Collections of Digital Works: The 24 institutions involved in the DLF are large research libraries that have taken collection development seriously and manage huge stores of print materials. They wanted to make sure that the same principles that guided their selection of those print works would be equally important in the digital environment. They also wanted to make sure that the users of the materials were involved in making the selections for what goes into the digital library. Over and over again, they came back to the same question: how do we make sure that we have integrated the digital resources into the traditional library in a way that is most useful for the scholars, researchers and students?

Readily and Economically Available: Here's the part that is the hardest to talk about. What does all of this cost? We wanted to make sure that we paid close attention to the economics. Even though much of the digital library work that has been done so far has been grant funded, we know that won't last forever.

Use by a Defined Community or Set of Communities: Finally, and we stress this in all of our projects, digital libraries as we define them are service organizations. They pay attention to the needs of the user community. We realize that as we develop larger and larger digital library projects, it is going to be very important to look at the use made of these materials and then tie what we put into the digital libraries to that use.

Agenda of the Digital Library Federation

We are now working in five areas: digital library architectures, metadata, digital collections development, users and user services, and digital preservation. The agenda includes the following five key points:

Achieving Greater Consistency, Predictability and Interoperability within Library Science Communities: Each of the institutions in the DLF will develop its own digital library, but the hope is that by working on projects collaboratively, they will learn from one another and accelerate development in each of them. Some of the most important work being done concerns how to get greater consistency, predictability, and interoperability in library service environments so that each institution's resources will be available to the others. For example, the architecture group is looking at approaches to metadata harvesting, so that materials can be made available to search services. They are working with the Open Archives Initiative, looking at ways that materials created by different groups can be searched seamlessly by researchers, scholars and students.

Collections Development: The institutions in the DLF need to think about business models for digital surrogates. Many of them have fabulous special collections in their traditional libraries that they want to put into electronic form. These are one-of-a-kind things, and so there won't be an economy of scale. What are the business models for digitizing those collections and making them available? Are there certain processes we can adopt that will make the work cheaper or more cost effective? They will also be reviewing the guidelines for obtaining access to commercial and other third party data sources.

Users, Use and User Support Services: We are starting to work with several groups who understand more about how to quantify use of digital collections, so that we know whether these collections are effective for the purposes they were intended to serve. At the same time, we are looking at how the availability of digital resources changes functions within the library. What does it do to reference service? What does it do to public services generally?

Digital Preservation: CLIR and DLF are considering the criteria for archivability of electronic journals as a place to start. We have already met with a group of librarians to consider some of the criteria for archivability. We have a meeting scheduled with a group of publishers to look at these criteria and help the publishers think through what their interests are in this question of archiving. A third group, made up of licensing experts, will begin to work out the language that needs to appear in licenses to ensure that these electronic journals are preserved and made accessible over time.

Roles and Responsibilities of the 21st Century Research University Library: We mostly work with research university libraries, and a number of the directors are beginning to look at possible organizational models of funding, then sharing what they learn about these models with their colleagues.

It is important to note that the 24 institutions are investing their own money in this project. They realize that while there is some grant money now for some of these digital library projects, the long term development will depend upon the creativity and innovation of libraries today, so they are working together to try to accelerate the pace of learning because they think it is so necessary.
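[Editorial illustration, not part of the forum remarks.] The metadata harvesting idea mentioned in the first agenda item can be sketched roughly as follows: each repository exposes descriptive records in its own field names, a harvester maps them into one shared schema, and a search service queries the merged index. This is a minimal sketch under stated assumptions. The `Record` schema, the `harvest` and `search` functions, and the repository and field names are all hypothetical, and nothing here reproduces an Open Archives Initiative protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Shared (hypothetical) schema that every harvested record is mapped into."""
    repository: str
    title: str
    creator: str


def harvest(repository_name, raw_records, field_map):
    """Map one repository's records into the shared schema.

    field_map says which local field holds each shared field,
    e.g. {"title": "dc_title", "creator": "dc_creator"}.
    """
    return [
        Record(
            repository=repository_name,
            title=raw[field_map["title"]],
            creator=raw[field_map["creator"]],
        )
        for raw in raw_records
    ]


def search(index, term):
    """Case-insensitive title search across all harvested records."""
    needle = term.lower()
    return [record for record in index if needle in record.title.lower()]


if __name__ == "__main__":
    # Two made-up repositories that describe their holdings differently.
    index = harvest(
        "repo_a",
        [{"dc_title": "Polymer Films", "dc_creator": "A. Author"}],
        {"title": "dc_title", "creator": "dc_creator"},
    )
    index += harvest(
        "repo_b",
        [{"name": "Thin polymer coatings", "by": "B. Writer"}],
        {"title": "name", "creator": "by"},
    )
    for hit in search(index, "polymer"):
        print(hit.repository, "-", hit.title)
```

The design choice illustrated is that only the descriptions are centralized. The materials themselves stay with the institutions that created them.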

RESPONSE

Ann Wolpert, MIT Libraries: It is refreshing to know why those of us who work in libraries feel tired all the time. We are clearly being asked to do six impossible things before breakfast, just like Alice in Wonderland. We work in a hostile legal environment to perform impossible tasks with insufficient funds in a climate of totally unrealistic expectations. Other than that, it is a piece of cake.

There's a story about a young faculty member at Oxford University who was counseled by a senior member of his department never to schedule a meeting on a Wednesday, because it would ruin two weekends. When I think about the phrase "digital library," I think it ruins both "digital" and "library." From where we sit in real libraries, digital information does have wonderful attributes and functionality that can be complementary to the traditional strengths of libraries. It appears to us that it may layer on top of all the other forms of information that have emerged over the course of time, and end up being one more thing that is managed in libraries, rather than something that makes libraries go away.

The question of the durability of digital information is one that concerns us greatly, and it is a key aspect of the digital library environment that we hope to spend time on as part of our partnership with Hewlett-Packard. I once heard a speech in which the archivist of the United States talked about her favorite vase in the Metropolitan Museum, which has scribes around the outside, some of whom are working on clay tablets and some of whom are working on papyrus. She always figured that it was about someone saying, "You know, this papyrus isn't very permanent, so we'd better put it on a clay tablet." There is some of this in the digital environment, in that we are trying to make digitally formatted content behave the way that other kinds of content behave. It's not a good fit, because it isn't like other kinds of content. As Nicholas Negroponte points out, there is a substantial difference between bits and atoms.

Clifford defined libraries as "collections of stuff." He knew I would rise to that. I have always thought of libraries as a set of services. Those services at MIT are dedicated to facilitating and supporting education and research. How we go about that is very much a function of what resources are available and what form those resources take. We don't set out to "collect stuff." We set out to support education and research, and are uniquely qualified by training, experience and interest to do that. When someone reaches out their hand to get a piece of information that is required to support education or research, it's our responsibility to have performed the services that make it possible for the resource to be there. Here at MIT, we have been dealing in multiple formats for a long time, and are aware of the difficulties of integrating those formats to facilitate and support education and research.

There is a lot of work at MIT that involves thinking about large complex systems. What happens in conversations about the digital library has to do with the fact that people are thinking like blind men about the elephant, each holding on to the piece of the problem that they can touch. Those who produce information, such as a faculty member who writes an article, see their piece of this large complex system from the context of wanting to get the article published in a well respected peer reviewed journal. They are not conscious of the rest of the elephant. Those of us who work in preservation are conscious of how on earth you keep these ears from getting tattered and torn, and we are not so conscious of how information gets into the system. When Clifford spoke about the importance of preservation in rebuilding journals as a system, I thought of how dauntingly difficult it is to think of that in the economic, legal and sociopolitical climate in which journals are created. We need to understand this when we talk about the digital library. Often, we are only talking about the piece that we can see and understand, and we should never assume that that piece is all there is to the problem. The production, publication, continuation and preservation of information over time is indeed a large complex system, and there is a lot of work to be done on every level of that system.

DISCUSSION

Archiving New Types of Digital Information

Felice Frankel, Science Photographer, Edgerton Center, MIT: I image science for all sorts of fascinating people at MIT, and there is no question in my mind that imaging science is going towards temporal capture. We are going to see more and more animation, simulation, and 3-D modeling. Is anybody in the library world addressing, in terms of archiving, the fact that scientists are going to be describing their data this way?

Lynch: Things like video have been the poor stepchildren in libraries forever. We are at the beginning of an enormous flood of video, sensor and imaging data. The preservation of that is going to be important, but it depends on formats settling down. A massive challenge in this area is organization. We have a kind of interesting problem with video, because typically you have the description of an object, then you have navigational aids inside the object. A book can have a citation describing the book, and then it has an index at a greater level of detail inside of it. We don't understand very well how to do this with video or, more generally, continuous media. We need algorithmic ways of doing it. There's been some interesting work done in video segmentation and indexing in a number of places. A lot of it is still research. There is some other work on the fringes that is fascinating. People are essentially doing non-destructive 3-D capture of objects, including what's inside the objects. Then there are also some people playing with systems that will refabricate the objects. That's neat, but I'm not sure what we do with it. It is another new genre that we don't understand well.

There is also still a very open question about whether scientific data is ultimately going to come under the stewardship of the traditional research library, or fall under the direct stewardship of the scientific community. If you look at what's happening in areas like molecular biology, astronomy, or planetary science, most of that data is not being managed by libraries. It is being managed by scientific centers. That is not to say that information science expertise doesn't have a role. You are seeing people on these teams who are coming out of places that we used to call Library Schools and now call things like Schools of Information Studies or Information Management and Systems. You see new disciplines showing up, like biomedical informatics, which require some of the same things, but in many cases what carries over is skills rather than organizational connections.

Wolpert: In some ways, it is a real tribute to libraries that as new technologies come along, people tend to automatically assume that libraries will capture and save this content without any understanding of how complex the technology is that underlies it. Arthur C. Clarke said that any sufficiently advanced technology is indistinguishable from magic. As we look at the pace at which technological advance drives the creation and management of information, we have new formats emerging at a breathtaking pace. People typically understand their applications, but they have no idea what technology underlies them. You tend to think the application is simple, because it appears simple to you, whereas the technological underpinnings of that application may be incredibly complex, difficult, and hard to manage. So there is a real issue here around the preservation of materials, because those who are producing content in these new formats don't tend to know how complicated it is to carry these things forward. At the National Archives, they have been struggling with how to keep e-mail, which is just text-based. They've decided that you keep e-mail by printing it out on acid-free paper.

One of my concerns is that short-sighted decisions will be made that have long term implications. I recently had a conversation with the president of the American Chemical Society, and he assured me that they could be trusted to keep chemical data information forever. I pointed out to him that they price, sell and deliver that information on a membership model. However, my mission as a research librarian is to care about guaranteed access to that material by people who aren't born yet, or who are two years old and don't know they should be subscribing to the Journal of the American Chemical Society. This emerging digital environment stresses a lot of our assumptions about how information is created, managed and sold. The definition of a library typically includes that material is selected, organized and maintained for the benefit of a defined community of users, and that information is made available, not for sale, but for use. A lot of models for preservation are sale based rather than use based.

Marcum: I think we should just acknowledge that research libraries haven't by and large cared so much about other kinds of materials. When I was at the Library of Congress, I was in charge of all of the special collections. The Director of Information Technology kept saying to me, those are in the second phase. I finally realized that the second phase was never going to come. On the other hand, a lot of visual materials are of great importance to commercial firms like Disney. I think you made an excellent point, and I hope all librarians took note. I think we have to take much more responsibility for these formats we haven't paid that much attention to in the past.

Wolpert: But it's not simple, because the formats in which data are produced change. Even at a place like MIT, when we put content up on the network, we cannot guarantee that faculty or students have the equipment to read or use it, let alone find it. They may not have enough memory or a color screen. There are all kinds of assumptions about the delivery of digital information that are equipment based, whereas the beauty of the print environment was that the medium was predictable, so you didn't need to worry about whether you had the specific equipment necessary to look at a work. Now we are moving into an environment where we may be able to preserve content, but we can't do anything about the predictability of your capacity to look at and use that information. It is a complicated problem.

Defining Project Success

Bill Cattey, MIT Information Systems: I am an alumnus of the TULIP project. I was one of the principal implementers and architects. Now I am the project manager for the new D-Space Project. In a certain sense, I agree with the overtones that were presented about TULIP. It is looked upon as something that we learned a lot from, but that ultimately failed. So what are some success criteria for this new project? Recognizing that we live in the context of doing the impossible on insufficient resources in a hostile environment, what are some ways that we can come away from this new project feeling it is a success?

Lynch: I am also an alumnus of the TULIP project. I would take issue with whether it failed and argue that it underscores one of the very hard things about doing research projects. The TULIP project was a four or five year project that took place against immense changes in information technology environments. When we started, there was no web. When we finished, browsers had proliferated. Due to the development cycles and the limited number of people who could be put on development efforts, we made engineering decisions prior to the emergence of the web which were hard to change midcourse. In terms of a project to gain insight into how people used electronic journals and the costs and complexities of mounting them, TULIP was a roaring success. We learned a number of things not to do. If you look at the model of licensing and implementing journals that has evolved since, it is quite different, in part because of the economic and operational experience gained with TULIP. As a research project to generate data to guide future decision making and system design, TULIP was tremendously valuable and enormously successful. It didn't succeed as a project to build a prototype system that neatly turned into a production system. In the information services world, we tend to confuse experiments, prototype services and beta releases of production services. We assume that anything that doesn't turn into a production service is a failure. This has led us to be very conservative in our experimenting, and there's a lot we don't know because we have been very timid and only do prototypes that will scale or migrate into production systems.

It is also difficult to get people to use experimental information systems. Most people are busy and unwilling to invest time and behavior changes in using something that is going to go away in six months. A lot of projects are very perturbed in the first year or so by novelty effects. For example, at the University of California, back when databases were still fairly rare, we would release an indexing and abstracting database maybe every eight months to a year with a carefully calculated campaign of fanfare and education. It would be two or three years before usage evened out. In order to get good data, you've got to run a project three, four or five years against a backdrop of enormously volatile technological changes. It's no accident that many of the commercial products now have life spans of a year or two and collect no usage data as they go. That's not a bad strategy, but it's a poor strategy if you want to learn something.

The Promise and Perils of Digital Libraries

David Thorburn, Director, MIT Communications Forum: Many things have surprised me in today's discourse. First, I am surprised by the gloom in so many comments. We have remarkable technologies that hold immense promise for libraries, but mostly today we've heard a litany of woe. I would like to hear more about what is exciting in these technologies. I am also surprised by the relative silence concerning the traditional library. When I consider the term "digital library," I don't immediately think about all the new materials in digital format, but about the extraordinary ways in which the traditional functions of libraries may be enhanced or even transformed by digital systems. Images from the collections in art galleries located in specific places are now accessible around the world; the Oxford English Dictionary is online; computer users anywhere on the globe can gain almost instant access to the catalog of the Library of Congress and many other great research institutions. And these wonders are trivial in a sense. They're small matters compared to the astounding promise of digitally enabled methods of information storage, access, search and retrieval.

Marcum: I couldn't agree more that the great benefit of digital libraries has been in increasing access. It has made it possible for a small college library to provide access to things that students would never have had a chance to see before. It's more deep concern than gloom. Digital collections are fragile, and we worry that the people who generate them don't realize that in two years, unless someone pays attention to making them available, they won't be. Our role in society is to be stewards of the intellectual record, and we worry that we can't do that. For the first time in our professional careers, we are dependent upon other people to make these collections available, and we don't know that we have their support. We don't have the connections made yet to ensure that these things last. Somebody said "forever" earlier. I think forever means ten years now. Forever doesn't mean the same thing anymore.

Lynch: I am not a librarian, although I occasionally get cast as one. I'm primarily a computer scientist -- a systems builder. I look at these things with a great deal of optimism about how they can enhance scholarship. The possibilities for creating and disseminating knowledge are stunning, and the opportunities for democratizing access to material are fabulous, but there are caveats. I remember the excitement of when we first put the catalogs from major universities online, and I also remember the subsequent disappointment. People realized that they could sit in Australia and search the Library of Congress to get a marvelous list of materials they couldn't get. Our legal and economic system lags terribly far behind the promise of the technology. It is incredibly frustrating that the network is technically the ultimate tool of interlibrary loan, but isn't viable legally.

Wolpert: I am in the position of someone selling heroin in the schoolyard. I spend an enormous amount of MIT's money on an annual basis to license access to resources that would go away in a second if I ever stop paying. People don't have a clue how fragile that is, or how vulnerable we are collectively to the pricing strategies of those who make that information available digitally in a licensed environment that we don't own. Have you ever heard of the Digital Millennium Copyright Act? It will enable those who put this licensed information on the network to encrypt it, and it will be a felony to get around the encryption even if you have reason to use that material in a way that qualifies as fair use. As someone who is responsible for paying the dollars associated with this new, truly magical environment, I am profoundly worried. We are way out there in terms of what's possible technically. Most of us who work in libraries have been there and done that, and now we are worried about what the future holds because of our concern about commercial interests imagining extraordinary profits from a monopoly environment.

Lynch: I had the good fortune to serve on a National Research Council committee looking at these issues. The committee issued a report through the National Academy called The Digital Dilemma: Intellectual Property in the Information Infrastructure. One of the most troublesome things that's happened legally is the Sonny Bono Copyright Term Extension Act (hisses from the audience). Due to pressure from some large media companies and recognizing that certain critical things like the early Mickey Mouse films were about to pass out of copyright, Congress quietly extended the term of copyright for 25 years with very little discussion. The problem is that there is this vast fallow field of books that have been out of print for fifty or more years, which are hard to find but useful for scholarship, and we've declared a 25-year moratorium on moving them to the Internet. Now, in order to make use of one of these books, you have to go through a phenomenally costly process of tracking down the heirs of some obscure author from thirty years ago to clear rights for it. That will cost you far more than the cost of digitizing it or publishing it digitally. I see tremendous contradictions between the potential of the digital technology and the social things we're wrapping around it.

Preserving the History of Digital Media

Henry Jenkins, Director of Comparative Media Studies: What artifacts of the digital age should our libraries be collecting? As a historian of media, if I wanted to do the history of the web today, I would have more luck doing cinema at the turn of the century or even live theater in the nineteenth century. Many of the sites that were up during the first year or two of the web are now gone and will never be recoverable. The same would be true of digital cinema, which is only two or three years old. I also just read a news story about the problems of getting access to restored games in a way that they can be played and experienced in something like their original form. Libraries historically collected popular fiction and other materials that would seem to be comparable in their social impact to games, but by and large libraries have not gotten into the question of collecting them. How can we collect and preserve the forms of storytelling and entertainment that are emerging in the digital age, and keep them accessible?

Marcum: When you look at libraries, only a few of them kept popular fiction. There were some comprehensive libraries and some highly specialized libraries that did. Very few research libraries kept dime novels, and that's why we go to Minnesota to see the good collection of them. We have to work with people who are developing the archives in special areas, so that they bring their materials into an archivable repository that can be kept.

Manoff: There is a misconception that libraries have collected everything. That has never been the case. Libraries collect only a very small percentage of the total output, and even things like popular fiction or pulp novels have been collected primarily after they became of academic interest. We follow what's going on in our programs, and if faculty need these things, then we figure out how to store them. We are always a little behind.

Marcum: Unfortunately, in the digital world, we can't wait. In the paper world, we could wait. We have to be more active in identifying where work is being done and harvest it, because we won't have a chance otherwise.

Manoff: We have lost so much of what was produced in paper. It is not as if we have it all. It is much harder to store things in electronic format. Half of the problem is knowing what it is that we want to preserve.

Jenkins: Your story about the Library of Congress printing out e-mail on acid-free paper comes to mind. Most of the early films of the twentieth century survived in paper prints rather than in celluloid because of the copyright policy of that period. By the time libraries got around to collecting cinema, we had blown up most of the existing nitrate prints, and we lost the bulk of the films made in the first 20 years of the medium's history. If it hadn't been for the weirdnesses of copyright law that led to the paper print collection at the Library of Congress, we might have nothing.

Marcum: Only about half of the films that have been made in this country are still extant.

Thorburn: A much smaller number of television programs or sound recordings end up being preserved. So we are talking about pre-digital formats that need protecting. One implication of what you're saying is that, if we are sensible, we need to work quickly to identify certain groups or consortia in different specialties to be responsible for aspects of the emerging digital environment, rather than waiting for every library to take on the full job, since that probably would never happen.

Wolpert: It is impossible for all the libraries to do all the things. I hope that they will take responsibility for certain portions of the scholarly record, so then we have half a chance of having it. Tip O'Neill used to say that all politics is local. Historically, all archiving has been local. Things have been archived because there was sufficient institutional commitment to put resources behind the archiving of a discipline, a material type or a form. Research libraries today find themselves in a strange position between the proverbial rock and the hard place. For at least a decade, it has been assumed that everything would be digital and free, but nothing could be further from the truth. People are beginning to understand that the costs of working in the digital environment are substantially higher than working in the print environment. In the print world, you just turn on the lights. In the digital world, you need the appropriate platform to look at the material, and platforms don't necessarily migrate in a way that makes it possible for a successive generation of hardware and software to use what preceded it. The cost implications associated with archiving material as it comes out in new formats are typically not visible to those who want it. We want to do this, but where do we find the resources to do it? If it is a zero-sum game, what do you gain and give up in order to do it?

Manoff: Another reason that this sounds so gloomy may be this context. Every time I have sat here in the Bartos Theater at the Media Lab, I have listened to people talk about some wonderful new technology and how terrific our future would be. This is definitely an odd program out for this particular space. I can understand what David Thorburn said about the gloom.

Wolpert: This panel is what happens when the Media Lab's ideas come home to roost.

The Role of Librarians in Digital Libraries

Arne Hessenbruch, Dibner Institute: The Dibner Institute has just been given a grant by the Sloan Foundation to put the history of twentieth century science on the Internet. I work with that, so I guess I'm in a cottage industry. I was very taken by this concept. What are cottage industries, and what will be the problems of merging them in the future?

Marcum: Most of the cottage industries I am familiar with are in the sciences, although there are a few in the humanities. Many of them started with a single advocate who cared about trying to archive all of the material in that field, and it usually grew with external funding to become a formal archive. The difficulty is that the way the person did things has often been institutionalized, so it is hard to put them together. However, the advantage is that it gets done. In a digital environment, we are relying on these people who have interests and passion about doing these things to make it happen, and then we can try to figure out the technical details later. That's why we have so many digital libraries trying to deal with interoperability. I am hoping that all of the projects developed by individuals will consult with librarians who are working on digital library development to see how they can bring standards to bear early in the project so things are searchable later.

Doug Sery, MIT Press: There is this kind of general belief that we need to collect all the digital information that is being promulgated, and that's just a little bit ridiculous. We have satellites that are pumping out gigabytes of information into research repositories, and nothing gets done with it because no one knows what to do. The idea that either scientists or librarians involved in digital libraries can keep up with this is just not reasonable. We have to go back to what Ann Wolpert said about the idea of a library as a collection of services, so that librarians act as the filters for information, by classifying it and trying to preserve it for posterity.

Lynch: Let me challenge that statement. Human beings typing, speaking, and drawing are becoming increasingly low bandwidth. I would assert that when we look at words produced by human beings, we are already at the stage with the cost of storage technology that it is more expensive to decide what to keep than to keep it. You hear archivists saying "our job in the paper world is to figure out the two percent of the records to keep." However, it's getting to be a lot more expensive to pick the two percent than to just keep all the records. Now organizing them is another matter. If you want anything other than full-text search, that is a different issue. It is very important to differentiate that from high-end, sensor-based data coming out of the sciences. Basically, one of the ways that you push science forward is to push the sensors, and then push the data collection. That data always totters at the ragged edge of what you can afford to store. I think we are going to increasingly see very different issues between raw information and refined human output. Human output keeps getting more and more tractable to store, and more and more expensive to select what to use.

Marcum: I hear this argument a lot from computer scientists. We now have the technical capability to collect everything, and it's cheaper to do that. But if we think about services to our users, then the question is one of what to pull out and make available in a meaningful way.

Lynch: That's a different issue. (laughter)

Marcum: That's really a big problem, and that's what libraries have traditionally done. I think we are not ready to say that we will just have these big platters of information somewhere.

Compiled by Mary Hopper
