
    January 09, 2007

    LOCKSS - Floating electronic librarianship to a higher level

    Victoria Reich, of Stanford University, spoke about LOCKSS and CLOCKSS at OCLC this morning, following her colleague Mike Keller’s talk yesterday. LOCKSS stands for Lots Of Copies Keeps Stuff Safe (though it might be temporarily rewritten as Lots Of Coherent Knowledge from Stanford Speakers, given the excellent presentations we’ve enjoyed here in Dublin this week).

    Many readers will have heard of these systems, especially LOCKSS, which has nearly a decade of development effort behind it. If Mike Keller’s perspective is about how to make libraries more Googlish, Vicky’s is solidly about how to make the Web behave more like library shelves. This may strike some as ho-hummish, but it is arguably among the most important missing links in making digital libraries solid enough to serve as reliable stores of cultural assets.

    Vicky builds her case on two core assumptions:

    1. Democracies and libraries are inextricably linked
    2. Maintaining collections with design and intent distinguishes libraries from book warehouses

    She amplified these assumptions by describing the attributes of print libraries that make them particularly suitable as reliable preservation systems:

    • Resistance to Attack - replication of assets means they are less susceptible to:
      • Natural disasters 
      • Ideological extremists
      • Vagaries of governmental policy
    • Self Healing - collaboration and resource sharing make access to print tractable (if perhaps a tad slow):
      • Interlibrary loan
      • Preservation copies

    Vicky argues that the Web changed the relationship between libraries and publishers and (at least potentially) compromised each of these attributes in one way or another, weakening the link between libraries and the societies they support and undermining the coherence of managed collections. LOCKSS is intended to help redress these issues.

    She poses two key questions about electronic library data:

    1. Who has custody?
    2. Who gets access?

    If it sounds a bit like divorce court… it IS a bit like divorce court. The revenue stream belongs to the content owners, but custody resides with the people in the sensible shoes: us. In the case of LOCKSS, this means a community governance model and library custody of data... but with constraints. One important aspect of the model is that data previously 'purchased' remains accessible to members of the community that purchased it. That is, material paid for in the past remains accessible even when the subscription is no longer in force.
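To make the access rule concrete, here is a minimal sketch of one way to model it, assuming access is keyed to an issue's publication date and the library's history of paid-up subscription periods. The names `is_accessible` and `SubscriptionPeriod` are hypothetical, not part of LOCKSS, and real license agreements may define the rule differently.

```python
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical model of a library's subscription history: each period is
# (start, end), both inclusive; end=None means the subscription is current.
SubscriptionPeriod = Tuple[date, Optional[date]]


def is_accessible(issue_date: date, periods: List[SubscriptionPeriod]) -> bool:
    """Return True if the issue was published during any paid-up period.

    The issue date is checked against the periods the library paid for,
    not against whether a subscription is active today, so back content
    stays readable after cancellation while post-lapse issues do not.
    """
    return any(
        start <= issue_date and (end is None or issue_date <= end)
        for start, end in periods
    )


# Usage: subscribed 2001 through 2005, then cancelled.
history = [(date(2001, 1, 1), date(2005, 12, 31))]
print(is_accessible(date(2003, 6, 1), history))  # True: paid for, still readable
print(is_accessible(date(2006, 3, 1), history))  # False: published after the lapse
```

The contrast with a pure licensing model is that the latter would check only whether a subscription is active *today*, which is exactly the "taking materials off your shelf" behavior.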

    This point is quite important, as it returns management of the collection to the library. Publishers may of course feel differently about this. Some endorse the model; others view their data as a time-sensitive licensed commodity, the electronic equivalent of taking materials off your shelf when you decide you don't want to (or can't) sustain the subscription (my metaphor, not Vicky's).

    Which brings up a key issue in the LOCKSS model: publishers must agree. Agreement is simple to effect: slap a LOCKSS-friendly consent statement on the Web page (data ingest is by Web crawler, so everything in this system happens pretty much automagically). But not all publishers see through the same rose-colored glasses. An earlier version of this post misquoted the number of titles currently under LOCKSS management, but it is substantial and growing (thanks to Vicky for catching my error). It will be interesting to see how it grows as a proportion of total e-journal usage.
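The consent step amounts to a crawler checking a page for an agreed statement before collecting anything. The sketch below shows that gate under stated assumptions: the wording in `PERMISSION_STATEMENT` is our assumption, based on the permission statement LOCKSS documents for participating publishers, and `has_lockss_consent` is a hypothetical helper, not the actual LOCKSS crawler code.

```python
import re

# Assumed wording of the publisher's consent statement; verify the exact
# text against current LOCKSS documentation before relying on it.
PERMISSION_STATEMENT = (
    "LOCKSS system has permission to collect, preserve, and serve this Archival Unit"
)


def has_lockss_consent(html: str) -> bool:
    """Return True if the page's text contains the consent statement."""
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping
    text = re.sub(r"\s+", " ", text)      # collapse line breaks and runs of spaces
    return PERMISSION_STATEMENT.lower() in text.lower()


page = """<p>LOCKSS system has permission to collect, preserve,
and serve this Archival Unit.</p>"""
print(has_lockss_consent(page))                           # True
print(has_lockss_consent("<p>All rights reserved.</p>"))  # False
```

Normalizing whitespace matters because publishers paste the statement into arbitrary page templates, where line breaks and inline markup can split it.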

    LOCKSS also addresses the problem of format migration by supporting HTTP content negotiation. This too-little-used feature of the HTTP protocol allows clients and servers to have a conversation about the formats in which data can be provided and accepted. LOCKSS clients use this information to do a best-match, on-the-fly reformatting of content, so that an obsolete format can be rendered as close as possible to the original without intervention by the reader. While I am skeptical about how well this holds up in general, it may be the best way to address the problem, and it will probably work reasonably well for the common formats that can be expected to make up the bulk of formally published materials.
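Content negotiation itself is standard HTTP. The client sends an `Accept` header listing media types with optional `q` weights, and the server picks the best one it can supply. The sketch below shows that selection step plus a hypothetical migration table. It is not LOCKSS code: the function names, the GIF-to-PNG converter, and the simplified matching (no precedence of specific types over wildcards) are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an HTTP Accept header into (media_range, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((media_range, q))
    # sorted() is stable, so ties keep the client's original order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def matches(media_range: str, media_type: str) -> bool:
    """True if a range such as */*, image/* or image/png covers media_type."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.split("/")[0] == media_range[:-2]
    return media_range == media_type


def negotiate(header: str, available: List[str]) -> Optional[str]:
    """Return the available format the client ranks highest, or None.

    Simplification: ranges are tried in q order only; the rules that let a
    specific type override a wildcard are not implemented.
    """
    for media_range, q in parse_accept(header):
        if q <= 0:
            continue
        for media_type in available:
            if matches(media_range, media_type):
                return media_type
    return None


# Hypothetical migration table: stored format -> formats it can be converted to.
CONVERTERS: Dict[str, List[str]] = {"image/gif": ["image/png"]}


def choose_rendition(header: str, stored: str) -> Optional[str]:
    """Pick a rendition; the stored original wins ties because it is listed first."""
    return negotiate(header, [stored] + CONVERTERS.get(stored, []))


# A reader whose software no longer accepts GIF gets a PNG rendition.
print(choose_rendition("image/png, image/*;q=0.1", "image/gif"))  # image/png
print(choose_rendition("image/gif", "image/gif"))                  # image/gif
print(choose_rendition("application/pdf", "image/gif"))            # None
```

The last case illustrates the limit noted above: when no converter produces anything the client accepts, negotiation alone cannot rescue the content.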

    Vicky touched on a wide array of related issues that I cannot cover in one post, including CLOCKSS, a Controlled LOCKSS that differs very little in the underlying technology but is managed according to different policies that will be more appealing to a broad range of publishers. Another exciting prospect is the use of LOCKSS as a low-cost means of preserving semi-published, often poorly curated content that is unencumbered by the constraints of publishing business models. This fits particularly nicely into the role of libraries as managers of unique collections. Blogs and institutional repositories are on their radar screen, and one can imagine other interesting possibilities.

    This is terrific work that evokes a higher professional calling than simply being handmaidens to the Amazoogles.  This is technology that is carefully thought out, low in cost, high in impact, and which can be wrapped in policies that surface the professional standards and culture of collaboration that define the library community.

    -----

    Image: Son, brother, wife & mother, coerced to pose in the Short North district in Columbus, around Thanksgiving, 2006