Almost 15 years into the Web revolution, digitization is changing our shelves and our catalogs. WorldCat.org and the WorldCat Local pilot are moving the catalog to the network level, and of course the Amazoogles pose competitive challenges and offer collaborative opportunities that alter the fundamental value propositions of libraries.
The answer to "What is the library?" in the age of digital everything is unclear, but surely it must be rooted in collections. Yet what does it mean to build a collection in an age when everything is available everywhere?
Libraries have always built and maintained special collections to serve their clientele. What will this mean when most of what people are looking for is online? The same as it ever did: the SCOAP acronym for library functions still applies: Selection, Collection, Organization, Access, and Preservation.
But how? Few libraries have the tools and training to collect and manage web archives as collections. Indeed, the tools themselves are still rudimentary, as is our notion of what should be collected. And beyond these top-level issues, the nuances of collection and management are daunting.
The WebArchivist.org research and software development group is a collaboration between the University of Washington and the SUNY Institute of Technology, motivated by the research needs of social scientists studying how Web content changes over time. The principals are not librarians (or archivists) but sophisticated patrons (researchers) working on solutions to problems they recognize from their own experience. They see the potential value of their tools to libraries, though, and have worked with the Library of Congress on several projects, including archives of the 2000 and 2002 elections and a 9/11 archive.
The organizing notion is a thematic collection: Web content that may or may not be hyperlinked, but that is related by a theme (Web sites of senatorial candidates, for example, or the Web presence of the 2004 Indian Ocean tsunami, or of the recent tragedy at Virginia Tech). Imagine a local history collection, or a major hybrid collection of research materials at a research library. The trick, then, is to provide tools that facilitate the selection of sites to be harvested, along with processing tools that help in the organization, management, and interpretation of the data once harvested. Sounds like a library to me.
The WebArchivist folks (co-directors Kirsten Foot and Steve Schneider gave me my tour) recognize that other organizations are handling the Web-vacuuming duties just fine. Their own value-add is a bespoke approach to up-front discovery (site selection) and tools for:
- re-presentation of content ("pages" as originally constituted may not display correctly as harvested)
- attaching metadata at an appropriate level of granularity
- analyzing content changes over time
- navigation tailored to the given collection
The niche they fill, then, is facilitating the creation and use of thematic collections of Web content that serve researchers and end users.
Providing the means to create electronic thematic collections could be of significant value to libraries, especially if embedded in systems that simplify cataloging and management.
These tools are not yet ready for mainstream use in the local library, but it is easy to see that they will be, and that they will provide another span of the bridge to our digital future.
Thanks to Gail Dykstra at the University of Washington for bringing this project to my attention.