Thursday,
April 20, 2000
5:00 - 7:00 p.m.
Bartos Theater
MIT Media Lab
20 Ames Street
Abstract
Clifford
Lynch will address the following questions: What is a digital
library and what do such institutions have to do with "traditional"
or "brick-and-mortar" libraries? How do the changes in libraries
relate to developments in scholarly communication as this moves
into the digital environment, and how do issues involved in
archiving digital information fit in?
Deanna
Marcum will describe how three years ago, a small group of research
libraries (then 17, now numbering 24) decided to invest their
own resources in a loose federation to work on the "big problems"
of digital library development and asked the Council on Library
and Information Resources to host the activity. The Digital
Library Federation is now working in five areas: digital library
architectures, metadata, digital collections development, users
and user services, and digital preservation. She will describe
how this group of institutions came to a definition of "digital
libraries" and explain how that definition shapes the work of
the Federation. She will also speak about CLIR's role in disseminating
the results of DLF activities to the broader library and scholarly
communities.
Speakers
Clifford
A. Lynch is Executive Director of the Coalition for
Networked Information, an organization created to explore the
uses of information technology for the advancement of scholarly
communication and the enrichment of intellectual productivity.
Founded in 1990 by the Association of Research Libraries, Educom,
and CAUSE, CNI is supported by the members of an institutional
Task Force representing higher education, publishing, network
and telecommunications, information technology, and libraries
and library organizations. Before joining CNI, he was
head of libraries for the University of California system.
Marlene
Manoff is Associate Head of the MIT Humanities Library.
She has written about the politics of building library collections
and the impact of electronic technology on scholarly research.
Deanna
Marcum is president of the Council on Library and
Information Resources, a nonprofit organization formed by the
merger of the Commission on Preservation and Access and the
Council on Library Resources. She previously served as
Director of Public Service and Collection Management at the
Library of Congress from 1993 to 1995 and Dean of the School
of Library and Information Science at the Catholic University
of America from 1989 to 1992. Marcum earned a Ph.D. in American
studies from the University of Maryland in 1991. In 1979, she
completed work on her master's degree in library science from
the University of Kentucky.
Ann
Wolpert has been Director of Libraries at MIT since
1996. Before joining MIT she served as Executive Director of
Library and Information Services at the Harvard Business School
from 1992 to 1996 after spending 16 years at Arthur D. Little,
Inc. She has a BA from Boston University and an MLS from Simmons
College.
Summary
[The text below is an edited summary,
not a complete transcript.]
Marlene
Manoff, MIT Humanities Library: As we were planning this
event, I looked up the summary of the last communications forum
devoted to library issues entitled The
University, the Library, and Information Technology. It
was held precisely five years ago, but I was struck by how little
the issues had changed. I particularly noted that Charles Vest
mentioned Project TULIP,
an attempt by the publisher Elsevier to partner with a number
of universities to provide full text access to about 40 journals
in materials science. Although it turned out to have been misguided
on a number of accounts, it was also one of many steps in our
growing understanding of what users want and the kind of infrastructure
we can support. Right now, many projects later and under the stewardship
of Ann Wolpert, the MIT Libraries are embarking on a new 1.8 million
dollar project with Hewlett-Packard called D-Space.
It will provide access to the work of MIT faculty authors. Given
the complexity of that project and digital issues in general,
we are fortunate to have with us two experts on these subjects.
Both of them are leaders of organizations that are helping us
contend with the issues raised by electronic technology.
Clifford
Lynch, Coalition for Networked Information (CNI):
Let me start with just the title of the session, "Digital
Libraries." This phrase is both a blessing and a curse. Much
like books, libraries are cultural totems. They wrap together
all sorts of feelings, social contracts, and roles. Talking
about the electronic book or the digital library has an ability
to simultaneously get people upset, inspired and concerned.
Even if the phrase is profoundly oxymoronic, we are reluctant
to discard it because it is so evocative. Let's look at various
views of what the term means as a way to explore some of the
controversy that is packed in it.
What
is a digital library?
Perhaps we can know
digital libraries by example. Over the past five years, the
National Science Foundation in conjunction with ARPA, NASA and
other government agencies, spent something on the order of 30 million
dollars funding Digital
Libraries. Presumably, if we look to the products of those
grants, we will learn something about digital libraries. They
were discipline oriented information environments that had collections
of resources and tools for researchers and students in specific
fields. Very few of them involved the libraries at the institutions
to which the grants were given. Perhaps this is a functioning
definition of digital libraries. In the commercial space, there
are examples that line up well with this definition. For example,
services like Lexis/Nexis or Westlaw in the legal world are
reasonably comprehensive information spaces for attorneys and
law students. While it is hard to get as inspired about Westlaw
as we can about some of our amorphous ideas of digital libraries,
these places are where knowledge breeds, interbreeds, and grows.
These are some of the best examples we have of functioning digital
libraries.
Does this view of digital
library as information system incorporate active tools of knowledge
creation and refinement? Bill Wulf coined the term "collaboratory"
in the late 1980s.
A collaboratory
is a "...'center without walls,' in which the nation's researchers
can perform their research without regard to geographical location
interacting with colleagues, accessing instrumentation, sharing
data and computational resources, [and] accessing information
in digital libraries."
That has a lot of elements
that resonate with the definition of digital library as information
system, but at the same time connect to the active pursuit of
science rather than the passive storage and review of scholarship.
There are systems that are at various places in the spectrum
between a purely passive information repository and a fully
realized collaboratory. As we move towards the active end of
the scale, a whole new set of skills and issues begins to show
up around structuring and refining knowledge.
What
do digital libraries have to do with "traditional" or "brick
and mortar" libraries?
The term library is
traditionally used for three things:
- a "collection of
stuff"
- a building or other
facility that houses a "collection of stuff"
- an organization that
typically encompasses the functions of selecting and acquiring,
organizing, making available, and ensuring the long-term preservation
of and access to a "collection of stuff"
Another way of thinking
about digital libraries is seeing digital collections as a marker
in the evolutionary growth of existing libraries. Research libraries
are increasingly encompassing larger and larger digital collections,
which sometimes supplant and sometimes supplement parts of the
historical collections of print and other media.
There are well over
five thousand established scholarly and scientific journals
available in electronic form which research libraries are licensing.
In due course, they will let go of the print versions. Libraries
also have special collections of manuscripts, photographs, and
other one-of-a-kind materials that are fragile. Historically,
access to them has been very privileged and place-bound, but
now much of this material is being digitized and placed on the Internet.
This opens up a new complement to the traditional books and
journals for a much wider audience, and it is changing the way
that research in some areas is conducted. On the other hand,
there is a widespread misimpression that almost everything
is in electronic form. This is not true. Very few books are
available in digital form. This is not a conspiracy of librarians,
paper lovers, architects and other groups to prevent everybody
from getting access to this wealth of electronic books. There
are some real problems about how we use books in digital form.
With journals, people skim online, decide what they need to
print, and then use paper as a user interface to actual articles
that they want to study. That works a lot better with a short
journal article than it does with a book.
What
issues arise as scholarly communication moves into the digital
environment?
As journals have moved
into a digital format, the scholarly literature has suffered
a tremendous loss of cohesion. The material is scattered
across the publishers' websites, so you have to know who publishes
what. This is a huge challenge that no single library can take
on by itself. It calls for a collective approach by scholarly
publishers and research libraries to rebuild journals as a system
so that it is possible to do things like search and use active
citation across journals from different publishers.
We are in the worst
of both worlds at the moment, with journals being
published in both print and electronic form, thereby maximizing
expense for everyone involved. If you ask publishers why, the
reason they give is that there isn't a viable archiving strategy
to explain to readers, authors and the libraries that buy materials.
Since the process of preserving paper is known, they continue
to produce the paper, assuming that it will deal with the archiving
problem in the short run. This is a bad long run solution. There
are visions of scholarly communication that fully exploit the
electronic environment, but this dual publishing model presents
an enormous barrier to including interactive components such
as multimedia, databases, and simulation models. Some digital
material is growing up in new renegade genres of scholarly communication
outside the normal system of publication. We're going to have
a terrible time integrating them into the system of publication
until we resolve the issues of archiving digital information.
What
issues arise in archiving digital information?
The preservation and
archiving role that libraries have historically filled is going
to be more important in the digital world. We have learned a
fair amount about preserving digital information, but there
is still hard intellectual work left to do. We realize that
preserving media is a losing battle, so we preserve and manage
bits across ever evolving infrastructures. We know how to do
that, but it is expensive. Computer centers that manage large
amounts of data don't just give up on all the data and start
over because a new generation of storage device has come out.
While we understand archiving digital materials that share a
lot of the qualities of books or journal articles, we do not
understand what it means to archive a database or web site that
constantly changes. We don't know if we need to record the full
evolutionary development or when snapshots are adequate. This
is all part of the technical agenda of archiving.
We also face difficult
economic and legal challenges in archiving. Our system of copyright
law is inhospitable to our ability to manage our heritage in
digital form. Instead of digital material thriving under benign
neglect, it vanishes within a few years. A printed work can
sit on a shelf until it goes out of copyright as long as the
physical environment isn't too hostile. On the other hand, digital
information needs constant care during the 90 years when it
is under copyright and highly constrained legally. The legal
constraints prevent the maintenance of this material by groups
like libraries, unless they specifically negotiate for the rights
to do this.
Again, there is tension
between centralization and distribution. It is not practical
for every library to preserve digital information. There are
great economies of scale to centralization. On the other hand,
I don't have the hubris to say we can get all the engineering
right to protect ourselves against all of the stupid things
that can happen, so I am not comfortable betting our intellectual
heritage on a single central archive. There are going to be
unforeseen physical disasters, political meltdowns, funding foul-ups,
and personal lapses in judgment. We need more than one,
and less than a thousand. Those are some of the challenges that
we face in building libraries that can truly not just acquire
and offer access to, but indeed preserve, digital information
that is increasingly coming to represent our intellectual and
cultural heritage.
Deanna
Marcum, Council on Library and Information Resources (CLIR):
Three years ago, a group of research libraries (then 17, now
numbering 24) decided to invest their own resources in a loose
organization called the Digital Library Federation (DLF) to
work on the "big problems" of digital library development. They
asked the Council on Library and Information Resources (CLIR)
to host the activity.
Defining Digital
Libraries
We worked with the
institutions to develop the following broad working definition.
"Digital libraries
are organizations that provide the resources, including the
specialized staff, to select, structure, offer intellectual
access to, interpret, distribute, preserve the integrity of,
and ensure the persistence over time of collections of digital
works so that they are readily and economically available for
use by a defined community or set of communities."
It is hard to imagine
what doesn't fit in that definition. Our goal was to create
digital libraries that worked as well or better for the scholarly
community than the traditional libraries we know. Here is a
more elaborate description of what we meant by these terms.
Organizations that
Provide Resources: We recognized that digital libraries
would employ many different kinds of resources, and that these
resources need not be organized like a conventional library.
In fact, these resources need not even be in the context of
a conventional library. We recognize that there will be many
different resources. For example, there are many cottage industries
growing up that serve the needs of scholars in specific disciplines.
The Los Alamos Physics Preprint Service is a good example of
a kind of digital library effort, and there are going to be
others based on disciplines over time.
Preserve the Integrity
of Collections, and Ensure the Persistence Over Time: In
1996, CLIR and the Research Libraries Group published the work
of a task force on the whole subject of preserving digital information.
That task force was made up of librarians, scholars, archivists,
lawyers, government officials, museum curators and preservation
specialists. In that report they laid out what would be required
to ensure the integrity of digital objects and make sure that
they persisted over time. We took that report as the basis for
our work in the preservation of digital materials. We do recognize
that the integrity of digital objects and preservation are different.
They are very much connected, but they have to be treated separately.
Collections of Digital
Works: The 24 institutions involved in the DLF are large
research libraries that have taken collection development seriously
and manage huge stores of print materials. They wanted to make
sure that the same principles that guided their selection of
those print works would be equally important in the digital
environment. They also wanted to make sure that the users of
the materials were involved in making the selections for what
goes into the digital library. Over and over again, they came
back to the same question: how do we make sure that we have
integrated the digital resources into the traditional library
in a way that is most useful for the scholars, researchers and
students?
Readily and Economically
Available: Here's the part that is the hardest to talk about.
What does all of this cost? We wanted to make sure that we paid
close attention to the economics. Even though much of the digital
library work that has been done so far has been grant funded,
we know that won't last forever.
Use by a Defined
Community or Set of Communities: Finally, and we stress this in
all of our projects, digital libraries are service organizations
as we define them. They pay attention to the needs of the user
community, and we realize that as we develop larger and larger
digital library projects, it is going to be very important to
look at the use made of these materials by users and then tie
what we put into the digital libraries to that.
Agenda
of the Digital Library Federation
We are now working
in five areas: digital library architectures, metadata, digital
collections development, users and user services, and digital
preservation. The agenda includes the following five key points:
Achieving Greater
Consistency, Predictability and Interoperability within Library
Science Communities: Each of the institutions in the DLF
will develop its own digital library, but the hope is that by
working on projects collaboratively, they will learn from one
another and it will accelerate the development in each of them.
Some of the most important work being done concerns how to get
greater consistency, predictability, and interoperability in
library service environments so that each of the institutions
will be available to others. For example, the architecture group
is exploring approaches to metadata harvesting,
so that materials can be made available to search services.
They are working with the Open Archives Initiative, looking
at ways that materials created by different
groups can be searched seamlessly by researchers, scholars and
students.
Collections Development:
The institutions in the DLF need to think about business models
for digital surrogates. Many of them have fabulous special
collections in their traditional libraries that they want to
put into electronic form. These are one-of-a-kind things, and
so there won't be an economy of scale. What are the business
models for digitizing those collections and making them available?
Are there certain processes we can adopt that will make the
process cheaper or more cost effective? They will also be reviewing
the guidelines for obtaining access to commercial and other
third party data sources.
Users, Use and User
Support Services: We are starting to work with several groups
who understand more about how we quantify use of digital collections
so that we know if these collections are effective for the purposes
they were intended to serve. At the same time, we are looking
at how the availability of digital resources changes functions
within the library. What does it do to reference service? What
does it do to public services generally?
Digital Preservation:
CLIR and DLF are considering the criteria for archivability of
electronic journals as a place to start. We have already met
with a group of librarians to consider some of the criteria
for archivability. We have a meeting scheduled with a group
of publishers to look at these criteria and help the publishers
think through what their interests are in this question of archiving,
and with a third group of licensing experts to begin to work out
the language that needs to appear in licenses that will ensure
that these electronic journals are preserved and made accessible
over time.
Roles and Responsibilities
of the 21st Century Research University Library: We mostly
work with research university libraries, and a number of the
directors are beginning to look at possible organizational models
of funding, then sharing what they learn about these models
with their colleagues.
It is important to
note that the 24 institutions are investing their own money
in this project. They realize that while there is some grant
money now for some of these digital library projects, the
long-term development will depend upon the creativity and innovation
of libraries today, so they are working together to try to accelerate
the pace of learning because they think it is so necessary.
RESPONSE
Ann
Wolpert, MIT Libraries: It is refreshing to know why
those of us who work in libraries feel tired all the time. We
are clearly being asked to do six impossible things before breakfast,
just like Alice in Wonderland. We work in a hostile legal environment
to perform impossible tasks with insufficient funds in a climate
of totally unrealistic expectations. Other than that, it is
a piece of cake.
There's a story about
a young faculty member at Oxford University who was counseled
by a senior member of his department never to schedule a
meeting on a Wednesday, because it would ruin two weekends.
When I think about the phrase "digital library," I think it
ruins both "digital" and "library." From where we sit in real
libraries, digital information does have wonderful attributes
and functionality that can be complementary to the traditional
strengths of libraries. It appears to us that it may layer on
top of all the other forms of information that have emerged
over the course of time, and end up being one more thing that
is managed in libraries, rather than something that makes libraries
go away.
The question of the
durability of digital information is one that concerns us greatly,
and it is a key aspect of the digital library environment that
we hope to spend time on as part of our partnership with
Hewlett-Packard. I heard a speech one time where the archivist of the
United States was talking about her favorite vase in the Metropolitan
Museum that had scribes around the outside, some of whom are
working on clay tablets and some of whom are working on papyrus.
She always figured what that was about was someone saying, "you
know, this papyrus isn't very permanent, so we better put it
on a clay tablet." There is some of this in the digital environment,
in that we are trying to make digitally formatted content behave
the way that other kinds of content behave. It's not a good fit,
because it isn't like other kinds of content. As Nicholas Negroponte
points out, there is a substantial difference between bits and
atoms.
Clifford defined libraries
as "collections of stuff." He knew I would rise to that. I have
always thought of libraries as a set of services. Those services
at MIT are dedicated to facilitating and supporting education
and research. How we go about that is very much a function of
what resources are available and what form those resources take.
We don't set out to "collect stuff." We set out to support education
and research, and are uniquely qualified by training, experience
and interest to do that. When someone reaches out their hand
to get a piece of information that is required to support education
or research, it's our responsibility to have performed the services
that make it possible for them to have the resource be there.
Here at MIT, we have been dealing in multiple formats for a
long time, and are aware of the difficulties of integrating
those formats to facilitate and support education and research.
There is a lot of work
at MIT that involves thinking about large complex systems. What
happens in conversations about the digital library has to do
with the fact that people are thinking like blind men about
the elephant, each holding on to the piece of the problem that
they can touch. Those who produce information, such as a faculty
member who writes an article, see the piece of this large complex
system from the context of wanting to get this article published
in a well respected peer reviewed journal. They are not conscious
of the rest of the elephant. Those of us who work in preservation
are conscious of how on earth you keep these ears from getting
tattered and torn, and we are not so conscious of how information
gets into the system. When Clifford spoke about the importance
of preservation in rebuilding journals as a system, I thought
of how dauntingly difficult it is to think of that in the economic,
legal and socio political climate in which journals are created.
We need to understand this when we talk about the digital library.
Often, we are only talking about that piece that we can see
and understand, and we should never assume that that piece is
all there is to the problem. The production, publication, continuation
and preservation of information over time is indeed a large
complex system and there is a lot of work to be done on every level
of that system.
DISCUSSION
Archiving New
Types of Digital Information
Felice
Frankel, Science Photographer, Edgerton Center, MIT:
I image science for all sorts of fascinating people at MIT,
and there is no question in my mind that imaging science is
going towards temporal capture. We are going to see more and
more animation, simulation, and 3-D modeling. Is anybody in
the library world addressing, in terms of archiving, the fact that
scientists are going to be describing their data this way?
Lynch: Things
like video have been the poor stepchildren in libraries forever.
We are at the beginning of an enormous flood of video, sensor
and imaging data. The preservation of that is going to be important,
but it depends on formats settling down. A massive challenge
in this area is organization. We have a kind of interesting
problem with video, because typically you have the description
of an object, then you have navigational aids inside the object.
A book can have a citation describing the book, and then it
has an index at a greater level of detail inside of it. We don't
understand very well how to do this with video or, more generally,
continuous media. We need algorithmic ways of doing it. There's
been some interesting work done in video segmentation and indexing
in a number of places. A lot of it is still research. There
is some other work on the fringes that is fascinating. People
are essentially doing non-destructive 3-D capture of objects,
including what's inside the objects. Then there are also some
people playing with systems that will refabricate the objects.
That's neat, but I'm not sure what we do with it. It is another
new genre that we don't understand well.
There is also still
a very open question about whether scientific data is ultimately
going to come under the stewardship of the traditional research
library, or fall under the direct stewardship of the scientific
community. If you look at what's happening in areas like molecular
biology, astronomy, or planetary science, most of that data
is not being managed by libraries. It is being managed by scientific
centers. That is not to say that information science expertise
doesn't have a role. You are seeing people on these teams who
are coming out of places that we used to call Library Schools
and now we call things like Schools of Information Studies or
Information Management and Systems. You see these new disciplines
showing up like biomedical informatics, which require some of
the same things, but they are skills rather than organizational
connections in many cases.
Wolpert:
In some ways, it is a real tribute to libraries that as new
technologies come along, people tend to automatically assume
that libraries will capture and save this content without any
understanding of how complex the technology is that underlies
it. Arthur C. Clarke said that any sufficiently advanced technology
is indistinguishable from magic. As we look at the pace at which
technological advance drives the creation and management of
information, we have new formats emerging at a breathtaking
pace. People typically understand their applications, but they
have no idea what technology underlies them. You tend to think
the application is simple, because it appears simple to you,
whereas the technological underpinnings of that application
may be incredibly complex, difficult, and hard to manage. So
there is a real issue here around the preservation of materials,
because those who are producing content in these new formats
don't tend to know how complicated it is to carry these things
forward. At the National Archives, they have been struggling
with how to keep e-mail, which is just text based. They've decided
that you keep e-mail by printing it out on acid free paper.
One of my concerns
is that short sighted decisions will be made that have long
term implications. I recently had a conversation with the president
of the American Chemical Society, and he assured me that
they could be trusted to keep chemical data information forever.
I pointed out to him that they price, sell and deliver that
information on a membership model. However, my mission as a
research librarian is to care about guaranteed access to that
material by people who aren't born yet, or who are two years
old and don't know they should be subscribing to the Journal
of the American Chemical Society. This emerging digital environment
stresses a lot of our assumptions about how information is created,
managed and sold. The definition of a library typically includes
that material is selected, organized and maintained for the
benefit of a defined community of users, and that information
is made available, not for sale, but for use. A lot of models
for preservation are sale based rather than use based.
Marcum: I think
we should just acknowledge that research libraries haven't by
and large cared so much about other kinds of materials. When
I was at the Library of Congress, I was in charge of all of
the special collections. The Director of Information Technology
kept saying to me, those are in the second phase. I finally
realized that the second phase was never going to come. On the
other hand, a lot of visual materials are of great importance
to commercial firms like Disney. I think you made an excellent
point, and I hope all librarians took note. I think we have
to take much more responsibility for these formats we haven't
paid that much attention to in the past.
Wolpert: But
it's not simple, because the formats in which data are produced
change. Even at a place like MIT, when we put content up on
the network, we cannot guarantee that faculty or students have
the equipment to read or use that, let alone find it. They may
not have the capacity for memory or a color screen. There are
all kinds of assumptions about the delivery of digital information
that are equipment based, whereas the beauty of the print environment
was that you didn't need to worry about whether you had the
specific equipment that was necessary to look at a work if it
was done on a medium which was predictable. Now we are moving
into an environment where we may be able to preserve content,
but we can't do anything about the predictability of your capacity
to look at and use that information. It is a complicated problem.
Defining
Project Success
Bill
Cattey, MIT Information Systems: I am an alumnus of
the TULIP project. I was one of the principal implementers and
architects. Now I am the project manager for the new D-Space
Project. In a certain sense, I agree with the overtones that
were presented about TULIP. It is looked upon as something that
we learned a lot from, but ultimately failed. So what are some
success criteria for this new project? Recognizing that we live
in the context of doing the impossible on insufficient resources
in a hostile environment, what are some ways that we can come
away from this new project feeling it is a success?
Lynch: I am
also an alumnus of the TULIP project. I would take issue with
whether it failed and argue that it underscores one of the very
hard things about doing research projects. The TULIP project
was a four or five year project that took place against immense
changes in information technology environments. When we started,
there was no web. When we finished, browsers had proliferated.
Due to the development cycles and the limited number of people
who could be put on development efforts, we made engineering
decisions prior to the emergence of the web which were hard
to change midcourse. In terms of a project to gain insight
into how people used electronic journals and the costs and complexities
of mounting them, TULIP was a roaring success. We learned a
number of things not to do. If you look at the model of licensing
and implementing journals that evolved since, it is quite different,
in part because of the economic and operational experience gained
with TULIP. As a research project to generate data to guide
future decision making and system design, TULIP was tremendously
valuable and enormously successful. It didn't succeed as a project
to build a prototype system that neatly turned into a production
system. In the information services world, we tend to confuse
experiments, prototype services and beta releases of production
services. We assume that anything that doesn't turn into a production
service is a failure. This has led us to be very conservative
in our experimenting, and there's a lot we don't know because
we have been very timid and only do prototypes that will scale
or migrate into production systems.
It is also difficult
to get people to use experimental information systems. Most
people are busy and unwilling to invest time and behavior changes
in using something that is going to go away in six months. A
lot of projects are very perturbed in the first year or so by
novelty effects. For example, at the University of California,
back when databases were still fairly rare, we would release
an indexing and abstracting database maybe every eight months
to a year with a carefully calculated campaign of fanfare and
education. It would be two or three years before usage evened
out. In order to get good data, you've got to run a project
three, four or five years against a backdrop of enormously volatile
technological changes. It's no accident that many of the commercial
products now have life spans of a year or two and collect no
usage data as they go. That's not a bad strategy, but it's a
poor strategy if you want to learn something.
The Promise and Perils of Digital Libraries
David
Thorburn, Director, MIT Communications Forum: Many things
have surprised me in today's discourse. First, I am surprised
by the gloom in so many comments. We have remarkable technologies
that hold immense promise for libraries, but mostly today we've
heard a litany of woe. I would like to hear more about what
is exciting in these technologies. I am also surprised by the
relative silence concerning the traditional library. When I
consider the term "digital library," I don't immediately think
about all the new materials in digital format, but about the
extraordinary ways in which the traditional functions of libraries
may be enhanced or even transformed by digital systems. Images
from the collections in art galleries located in specific places
are now accessible around the world; the Oxford English Dictionary
is online; computer users anywhere on the globe can gain almost
instant access to the catalog of the Library of Congress and
many other great research institutions. And these wonders are
trivial in a sense; they're small matters compared to the astounding
promise of digitally-enabled methods of information storage,
access, search and retrieval.
Marcum: I couldn't
agree more that the great benefit of digital libraries has been
in increasing access. It has made it possible for a small college
library to provide access to things that students would never
have had a chance to see before. It's deep concern rather
than gloom. Digital collections are fragile, and we worry that
the people who generate them don't realize that in two years,
unless someone pays attention to making them available, they
won't be. Our role in society is to be stewards of the intellectual
record, and we worry that we can't do that. For the first time
in our professional careers, we are dependent upon other people
to make these collections available, and we don't know that
we have their support. We don't have the connections made yet
to ensure that these things last. Somebody said forever earlier.
I think forever means ten years now. Forever doesn't mean the
same thing anymore.
Lynch: I am
not a librarian, although I occasionally get cast as one. I'm
primarily a computer scientist -- a systems builder. I look
at these things with a great deal of optimism about how they
can enhance scholarship. The possibilities for creating and
disseminating knowledge are stunning, and the opportunities
for democratizing access to material are fabulous, but there
are caveats. I remember the excitement of when we first put
the catalogs from major universities online, and I also remember
the subsequent disappointment. People realized that they could
sit in Australia and search the Library of Congress to get a
marvelous list of materials they couldn't get. Our legal and
economic system lags terribly far behind the promise of the
technology. It is incredibly frustrating that the network is
technically the ultimate tool of interlibrary loan, but isn't
viable legally.
Wolpert: I am
in the position of someone selling heroin in the school yard.
I spend an enormous amount of MIT's money on an annual basis
to license access to resources that would go away in a second
if I ever stop paying. People don't have a clue how fragile
that is, or how vulnerable we are collectively to the pricing
strategies of those who make that information available digitally
in a licensed environment that we don't own. Have you ever heard
of the Digital Millennium Copyright Act? It will enable those
who put this licensed information on the network to encrypt
it, and it will be a felony to get around the encryption even if
you have reason to use that material that qualifies as fair
use. As someone who is responsible for paying the dollars associated
with this new truly magical environment, I am profoundly worried.
We are way out there in terms of what's possible technically.
Most of us who work in libraries have been there and done that,
and now we are worried about what the future holds because of
our concern about commercial interests imagining extraordinary
profits from a monopoly environment.
Lynch: I had
the good fortune to serve on a National Research Council committee
looking at these issues. They issued a report through the National
Academy called The
Digital Dilemma: Intellectual Property in the Information Age.
One of the most troublesome things that's happened legally is
the Sonny Bono Copyright Extension Act (hisses from the audience).
Due to pressure from some large media companies and recognizing
that certain critical things like the early Mickey Mouse films
were about to pass out of copyright, Congress quietly extended
the term of copyright by 20 years with very little discussion.
The problem is that there is this vast fallow field of books
that have been out of print for fifty or more years, which are
hard to find but useful for scholarship, and we've declared
a 20-year moratorium on moving them to the Internet. Now, in
order to make use of this, you have to go through a phenomenally
costly process of tracking down the heirs of some obscure author
from thirty years ago to clear rights for this thing. That will
cost you far more than the cost of digitizing it or publishing
it digitally. I see tremendous contradictions between the potential
of the digital technology and the social things we're wrapping
around it.
Preserving the History of Digital Media
Henry
Jenkins, Director of Comparative Media Studies: What
artifacts of the digital age should our libraries be collecting?
As a historian of media, if I wanted to do the history of the
web today, I would have more luck doing cinema at the turn of
the century or even live theater in the nineteenth century.
Many of the sites that were up during the first year or two
of the web are now gone and will never be recoverable. The same
would be true of digital cinema which is only two or three years
old. I also just read a news story about the problems of getting
access to restored games in a way that they can be played and
experienced in something like their original form. Libraries
historically collected popular fiction and other materials that
would seem to be comparable in their social impact to games,
but by and large libraries have not gotten into the question
of collecting them. How can we collect and preserve the forms
of storytelling and entertainment that are emerging in the digital
age, and keep them accessible?
Marcum: When
you look at the digital library, only a few of them kept popular
fiction. There were some comprehensive libraries and some highly
specialized libraries that did. Very few research libraries
kept dime novels, and that's why we go to Minnesota to see the
good collection of them. We have to work with people who are
developing the archives in special areas, so that they bring
their materials into an archivable repository that can be kept.
Manoff: There
is a misconception that libraries have collected everything.
That has never been the case. Libraries only collect a very
small percentage of the total output, and even something like
popular fiction or pulp novels has been collected primarily
after it became of academic interest. We follow what's going
on in our programs, and if faculty need these things, then we
figure out how to store them. We are always a little behind.
Marcum: Unfortunately,
in the digital world, we can't wait. In the paper world, we
could wait. We have to be more active in identifying where work
is being done and harvesting it, because we won't have a chance
otherwise.
Manoff: We have
lost so much of what was produced in paper. It is not as if
we have it all. It is much harder to store things in electronic
format. Half of the problem is knowing what it is that we want
to preserve.
Jenkins: Your
story about the Library of Congress printing out e-mail on acid-free
paper comes to mind. Most of the early films of the twentieth
century survived in paper prints rather than in celluloid because
of the copyright policy of that period. By the time libraries
got around to collecting cinema, we had blown up most of the
existing nitrate prints, and we lost the bulk of the films made
in the first 20 years of the medium's history. If it hadn't
been for the weirdnesses of copyright law that led to the paper
print collection at the Library of Congress, we might have nothing.
Marcum: Only
about half of the films that have been made in this country
are still extant.
Thorburn: A
much smaller number of television programs or sound recordings
end up being preserved. So we are talking about pre-digital
formats that need protecting. One implication of what you're
saying is that, if we are sensible, we need to work quickly
to identify certain groups or consortia in different specialties
to be responsible for aspects of the emerging digital environment,
rather than waiting for every library to take on the full job,
since that probably would never happen.
Wolpert: It
is impossible for all the libraries to do all the things. I
hope that they will take responsibility for certain portions
of the scholarly record, so that we have half a chance of having
it. Tip O'Neill used to say that all politics is local. Historically,
all archiving has been local. Things have been archived because
there was sufficient institutional commitment to put resources
behind the archiving of a discipline, a material type or a form.
Research libraries today find themselves in a strange position
between the proverbial rock and the hard place. For at least
a decade, it has been assumed that everything would be digital
and free, but nothing could be further from the truth. People
are beginning to understand that the costs of working in the
digital environment are substantially higher than working in
the print environment. In the print world, you just turn on
the lights. In the digital world, you need the appropriate platform
to look at the material, and platforms don't necessarily migrate
in a way that makes it possible for a successive generation
of hardware and software to use what preceded it. The cost
implications associated with archiving material as it comes
out in new formats are typically not visible to those who want
it. We want to do this, but where do we find the resources to
do it? If it is a zero sum game, what do you gain and give up
in order to do it?
Manoff: Another
reason that this sounds so gloomy may be this context. Every
time I have sat here in the Bartos Theater at the Media Lab,
I have listened to people talk about some wonderful new technology
and how terrific our future would be. This is definitely an
odd program out for this particular space. I can understand
what David Thorburn said about the gloom.
Wolpert: This
panel is what happens when the Media Lab's ideas come home to
roost.
The Role of Librarians in Digital Libraries
Arne
Hessenbruch, Dibner Institute: The Dibner Institute
has just been given a grant by the Sloan Foundation to put the
history of twentieth century science on the Internet. I work
with that, so I guess I'm in a cottage industry. I was very
taken by this concept. What are cottage industries, and what
will be the problems of merging them in the future?
Marcum: Most
of the cottage industries I am familiar with are in the sciences,
although there are a few in the humanities. Many of them started
with a single advocate who cared about trying to archive all
of the material in that field, and it usually grew with external
funding to become a formal archive. The difficulty is that the
way the person did things has often been institutionalized,
so it is hard to put them together. However, the advantage is
that it gets done. In a digital environment, we are relying
on these people who have interests and passion about doing these
things to make it happen, and then we can try to figure out
the technical details later. That's why we have so many digital
libraries trying to deal with interoperability. I am hoping
that all of the projects developed by individuals will
consult with librarians who are working on digital library development
to see how they can bring standards to bear early in the project
so things are searchable later.
Doug
Sery, MIT Press: There is this kind of general belief that
we need to collect all the digital information that is being
promulgated, and that's just a little bit ridiculous. We have
satellites that are pumping out gigabytes of information into
research repositories, and nothing gets done with it because
no one knows what to do. The idea that either scientists or
librarians involved in digital libraries can keep up with this
is just not reasonable. We have to go back to what Ann Wolpert
said about the idea of a library as a collection of services,
so that librarians act as the filters for information, by classifying
it and trying to preserve it for posterity.
Lynch: Let me
challenge that statement. Human beings typing, speaking, and
drawing are becoming increasingly low bandwidth. I would assert
that when we look at words produced by human beings, we are
already at the stage with the cost of storage technology that
it is more expensive to decide what to keep than to keep it.
You hear archivists saying "our job in the paper world is to
figure out the two percent of the records to keep." However, it's
getting to be a lot more expensive to pick the two percent than
to just keep all the records. Now organizing them is another
matter. If you want anything other than full text search, that
is a different issue. It is very important to differentiate
that from high end sensor based data coming out of the sciences.
Basically, one of the ways that you push science forward is
to push the sensors, and then push the data collection. It always
totters at the ragged edge of what you can afford to store.
I think we are going to increasingly see very different issues
between raw information and refined human output. It keeps getting
more and more tractable to store, and more and more expensive
to select what to use.
Marcum: I hear
this argument a lot from computer scientists. We now have the
technical capability to collect everything, and it's cheaper
to do that. But if we think about services to our users, then
the question is one of what to pull out and make available in
a meaningful way.
Lynch: That's
a different issue. (laughter)
Marcum: That's
really a big problem, and that's what libraries have traditionally
done. I think we are not ready to say that we will just have
these big platters of information somewhere.