    January 09, 2007

    LOCKSS - Floating electronic librarianship to a higher level

    Victoria Reich, of Stanford University, spoke about LOCKSS and CLOCKSS at OCLC this morning, following her colleague Mike Keller’s talk yesterday.  LOCKSS stands for Lots Of Copies Keeps Stuff Safe (though it might be temporarily rewritten as Lots Of Coherent Knowledge from Stanford Speakers, given the excellent presentations we’ve enjoyed here in Dublin this week).

    Many readers will have heard of these systems, especially LOCKSS, which has nearly a decade of development effort behind it.  If I were to characterize Mike Keller’s perspective as how we make libraries more Googlish, Vicky’s perspective is solidly in the realm of how to make the Web behave more like library shelves.  This may strike some as ho-hummish, but it is arguably among the most important missing links in making digital libraries solid enough to bear up as reliable stores of cultural assets.

    Vicky builds her case on two core assumptions:

    1. Democracies and libraries are inextricably linked
    2. Maintaining collections with design and intent distinguishes libraries from book warehouses

    She amplified these assumptions by describing the attributes of print libraries that make them particularly suitable as reliable preservation systems:

    • Resistance to Attack - replication of assets means they are less susceptible to:
      • Natural disasters 
      • Ideological extremists
      • Vagaries of governmental policy
    • Self Healing - collaboration and resource sharing make access to print tractable (if perhaps a tad slow):
      • Interlibrary loan
      • Preservation copies

    Vicky argues that the Web changed the relationship of libraries and publishers, and (at least potentially) compromised all of these attributes in one way or another, thereby weakening the link between libraries and the societies they support, and compromising the coherence of managed collections. LOCKSS is intended to help redress these issues.

    She poses two key questions about electronic library data:

    1. Who has custody?
    2. Who gets access?

    If it sounds a bit like divorce court… it IS a bit like divorce court.  The revenue stream belongs to the content owners, but the custody part resides with the people with the sensible shoes -- us. In the case of LOCKSS, this means a community governance model and library custody of data... but with constraints.  One important aspect of the model is that data previously 'purchased' remains accessible to members of the community that purchased it. That is, stuff paid for in the past, but for which current subscriptions are not in force, remains accessible.

    This point is quite important, as it returns management of the collection back to the library.  Publishers may of course feel differently about this. Some endorse the model, others view their data as a time-sensitive licensed-commodity -- the electronic equivalent of taking materials off your shelf when you decide you don't want to (or can't) sustain the subscription (my metaphor, not Vicky's).

    Which brings up a key issue in the LOCKSS model: Publishers must agree.  Agreement is simple to effect: slap a LOCKSS-friendly consent statement on the Web page (data ingest is by Web-crawler, so everything in this system happens pretty much automagically).  But not all publishers see through these same rose-colored glasses.  An earlier version of this post misquoted the number of titles currently under LOCKSS management, but it is substantial and growing (thanks to Vicky for catching my error).  It will be interesting to see how it grows as a proportion of total ejournal usage.
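
    As a rough illustration of the consent step, a crawler could check each publisher page for a permission statement before ingesting anything. This is a minimal sketch and not the LOCKSS code: the function name has_permission is hypothetical, and the statement wording is my assumption of what such a notice looks like, so check the LOCKSS documentation for the exact text.

```python
import re

# Assumed wording of the consent statement; treat it as illustrative only.
PERMISSION_STATEMENT = (
    "LOCKSS system has permission to collect, preserve, "
    "and serve this Archival Unit"
)


def normalize(text):
    """Collapse runs of whitespace and lowercase, so line breaks in the page don't matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def has_permission(page_text, statement=PERMISSION_STATEMENT):
    """Return True if the page text contains the consent statement (hypothetical helper)."""
    return normalize(statement) in normalize(page_text)


if __name__ == "__main__":
    page = "<p>LOCKSS system has permission to collect,\n preserve, and serve this Archival Unit</p>"
    print(has_permission(page))
    print(has_permission("<p>All rights reserved.</p>"))
```

    A crawler built this way would simply skip any site whose page fails the check, which is why publisher agreement is the gating factor.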

    LOCKSS also addresses the problem of format migration by supporting HTTP content negotiation. This too-little-used feature of the HTTP protocol allows clients and servers to have a conversation about the formats in which data can be provided and accepted.  LOCKSS clients use this information to do a best-match, on-the-fly reformatting of content, so that an obsolete format can be rendered as close as possible to the original without intervention by the reader. While I am skeptical as to how this holds up in a generalized way, it may be the best way to address the problem, and will probably work reasonably well for the common formats that can be expected to comprise the bulk of formally-published materials.
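
    To make the content-negotiation idea concrete, here is a minimal sketch of how a server might pick a format from a client's HTTP Accept header. It is not the LOCKSS implementation. The function names and the example media types are my own assumptions, and the matching is simplified: it ranks by q-value only and ignores the rule in the HTTP specification that more specific media ranges take precedence.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs, most preferred first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1.0 when the client does not give one
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_range, q))
    # sorted() is stable, so header order breaks ties between equal q-values
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def matches(media_range, media_type):
    """True if a range such as image/* or */* covers a concrete media type."""
    if media_range == "*/*":
        return True
    r_type, _, r_sub = media_range.partition("/")
    m_type, _, m_sub = media_type.partition("/")
    return r_type == m_type and (r_sub == "*" or r_sub == m_sub)


def choose_format(accept_header, available):
    """Return the available type the client prefers most, or None if nothing fits."""
    for media_range, q in parse_accept(accept_header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        for media_type in available:
            if matches(media_range, media_type):
                return media_type
    return None


if __name__ == "__main__":
    # Hypothetical case: the archive holds an old TIFF and can also produce a PNG.
    accept = "image/tiff;q=0.2, image/png, image/*;q=0.5"
    print(choose_format(accept, ["image/tiff", "image/png"]))
```

    In the preservation scenario, the list of available types would include whatever the archive can convert the stored original into, so a reader whose browser no longer lists the obsolete format would be handed the closest modern rendering.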

    Vicky touched on a wide array of related issues that I cannot cover in one post, including CLOCKSS, a Controlled LOCKSS that differs very little in the underlying technology, but which is managed according to different policies that will be more appealing to a broad range of publishers. Another exciting prospect is the use of LOCKSS as a low-cost means of preserving semi-published, often poorly-curated content that is unencumbered by the constraints of publishing business models. And it fits particularly nicely into the role of libraries as managers of unique collections.  Blogs and institutional repositories are on their radar screen, and one can imagine other interesting possibilities.

    This is terrific work that evokes a higher professional calling than simply being handmaidens to the Amazoogles.  This is technology that is carefully thought out, low in cost, high in impact, and which can be wrapped in policies that surface the professional standards and culture of collaboration that define the library community.

    -----

    Image: Son, brother, wife & mother, coerced to pose in the Short North district in Columbus, around Thanksgiving, 2006

    January 08, 2007

    Cows and the Colossus

    Mike Keller delivered a presentation at OCLC today entitled Mass Digitization in Google Book Search: Effects on Scholarship. Mike is director of Stanford University Libraries, and wears an academic publisher’s hat as well, being responsible for High Wire Press and Stanford University Press. He commands a panoramic view of the digital scholarly landscape, and has the intellect and experience to convert view to vision. This vision is both breathtaking and, in some respects, disturbing.

    For those unsettled by the rapidity of Googlian hegemony in library spaces, Mike constructs a vivid and compelling argument for embracing the revolution, reaching back to the digitization of card catalogs. Several salient observations from his remarks:

    • Digitization of the card catalog resulted in a 50% increase in book usage
    • Google indexing is the #1 driver of article usage in High Wire – by a large margin (10 to 1 beyond the next highest, if I understood him correctly)
    • Metadata searching (what Keller describes as subtle searching), in combination with novel methods of taxonomic search and citation cross-linking, dramatically improves discovery and navigation within large result sets.

    It is difficult to resist Keller’s assertion that Google Book Search (GBS) is likely to revolutionize access to books more than any single factor in the library world – if not directly, then indirectly. It would be hard to be a librarian and not find chagrin in this realization.  Keller rightly urges us to focus on the larger picture and the many benefits.

    Stepping away from the somewhat daunting implications for libraries, Keller suggested that the most important thing about GBS is that it has occasioned a great debate about the importance of copyright in the intellectual life of the nation (and the world). The legal wrangling surrounding the digitization of books in the so-called Google Libraries may indeed midwife a re-examination of copyright issues on a grand scale. Certainly the current interpretation of copyright law in the United States is far more likely to be driven by supply-chain economics than by the intent of the framers of the Constitution. Perhaps for the first time, there are heavy hitters on both sides of the argument, which may result in a reinterpretation of fair use that makes more sense (to libraries and readers) in the digital age.  One may hope.

    Keller pointed out the importance of healthy competition among various digitization projects: the Million Book Project, GBS, the Open Content Alliance, the Microsoft/British Library and Microsoft/Cornell efforts. Could we have imagined anything like this rush to mine the library shelves of the world even a few years ago? Could we (the library community) have marshalled either the vision or the resources to accomplish the task on our own?  It is unlikely.

    On the dark side, he raised the image of libraries as herds of cows in these deals. Participants are kept in the dark, enjoined from sharing the details of their deals with other participants, let alone with their public constituencies. They (we) don’t know if they are being “milked, butchered, or destined to cover automobile seats of expensive cars.”  [I paraphrase, but only slightly]

    What is evident is that benefits for the G-Libraries are substantial. The libraries involved receive a windfall of the digitized contents of their collections (though, Keller also points out that much is likely to have to be recaptured at higher resolution in the future).  For institutions of the caliber of these early G-partners, their technical wherewithal and innovative approaches promise rich research environments as well as new services to readers.

    There are plenty of objections, of course, some with a Luddite feel, others based on money, control, cultural imperialism, disintermediation of libraries, and the hegemony of a colossus too large and too innovative to ignore. Publishers, authors, and librarians all have much to lose, but also many potential gains in how this plays out in the long run. It would seem that only for readers is the benefit clear and unambiguous. But even readers may lose in a world where any monolith controls access to the content, whether that monolith be a government or Google.

    Mike Keller is an articulate spokesperson for the dilemmas and opportunities that are upon us. I found myself unable to resist most of his arguments, even a few I wanted to reject. Our community (or at least an elite component of it) has been offered a deal impossible to resist, a side effect of the inexorable march of digitization that began in our realm with Henriette Avram’s vision of machine-readable cataloging.

    Most of our constituents will benefit greatly, while libraries will rattle from the footfalls of the Googles and their ilk. Still, I am unnerved that these developments are cloaked in non-disclosure agreements, even as I am excited about the extraordinary challenges and possibilities we face.  Will cooperation within the community survive the Faustian bargains from without?

    -----

    My thoughts here are almost certainly flawed renditions of Mike's remarks... I was jotting notes as fast as my dyslexic fingers would go, and trying to pay attention as well.  Responsibility for distortions or errors is of course my own.
    -----
    Image: OCLC at dawn

    August 22, 2006

    Push, Pull, and Self-censorship

    Librarianship has for some time been immersed in a conversation about relevance. What value do libraries embody in the age of the Amazoogles, cell phones with computational power rivaling early space vehicles, and home computers with a couple terabytes of storage?

    Libraries have traditionally been a pull technology… you have to go to a physical place, find the item you want (generally known beforehand), and pull it off the shelf.

    Increasingly we live in a push information world. TV pushes its take on the news at us, RSS feeds push the headline news and what has changed on the Internet… for places we’ve pre-selected, at least.

    The following, from the DemocracyNow.org website, is a chilling commentary on what the corporate news industry is pushing to the American public:

    TV Networks Focus on JonBenet Ramsey Case Over NSA Ruling
    The major court ruling on the National Security Agency surveillance program has received scant coverage from the nation’s three major networks. On Thursday, ABC, CBS and NBC all led their nightly broadcasts with the latest in the 1996 murder case of child beauty queen JonBenet Ramsey. ABC devoted twice as much time in its broadcast to Ramsey as it did to the NSA story. CBS offered seven times as much airtime to Ramsey as it did to the NSA story. And NBC devoted 15 times more airtime to Ramsey.

    It is hard to know whether this is self-censorship on the part of Big-News, or simply venal sensationalism (or both), but it is discouraging, and accentuates the importance of independent sources of information in the future of democracy. 

    I’m not sure how libraries can effectively compete for the attention of television viewers, but as traditional sources of news become ever more co-opted by the infotainment industry, the role of providing a neutral spectrum of information is increasingly important.  System vendors are making it easier for libraries to set up RSS feeds to alert patrons to newly acquired assets and community events, and many libraries use them.  And WorldCat.org now provides web access to records for 1.3 billion items in 10,000 libraries free of charge. It ain’t book-TV, but libraries at least have a beachhead where other sectors of the information industry have ceased to take their patrons seriously. The service model is changing (the role of disclosure becomes paramount: see Lorcan's notes on Search, Share, and Subscribe, for example), but the mission is still pretty much the same, and needed more than ever.

    -----
    Image: Lake Union houseboat, Seattle