    February 18, 2008

    RESTful Repositories?

    Andy Powell has made notable contributions to best practices concerning persistent identifiers for quite a long time.  I have always found his recommendations practical and free of ego.  I almost said free of ideology, but of course we all suffuse our musings with the aggregate of experience and beliefs which round to ideology.  The ideology of others tends to disappear to the extent that it matches our own.  Protective coloring rears its barely-discernible head.

    Andy's opening keynote at the VALA2008 conference in Melbourne a fortnight ago exposed a theme of his ideology which serves the community well in the domain of persistent identifiers, and which he brings to bear on the evolution of repositories.  I paraphrase:

    Deviation from mainstream Web idioms reduces uptake and quenches the natural interconnectivity which underlies the richness of the Web.

    Andy's own summary of the issues includes the following:

    our current preoccupation with the building and filling of 'repositories' (particularly 'institutional repositories') rather than the act of surfacing scholarly material on the Web means that we are focusing on the means rather than the end (open access). Worse, we are doing so using language that is not intuitive to the very scholars whose practice we want to influence.

    One way to think about repositories is as the bookshelves of the digital library.  They are designed to impose order and facilitate management of content.  We don't ask scholars, having just published an article or book, to 'go to the library to find the most appropriate place for it... and don't come back until you do!'  Not a perfect analogy, but it speaks to the issue of mandating overhead to authors in order that their work is fixed in the scaffold of their discipline's knowledge stores.  Still, we have bookshelves for a reason, and something like them is necessary to support the management of digital assets as well.  It is hard enough to look after digital resources in a persistent way.  Current repository technology is not yet mature, for sure, but it isn't the case that we don't need what it is trying to deliver. (I don't think Andy would disagree with this).

    Andy goes on to say:

    our focus on the 'institution' as the home of repository services is not aligned with the social networks used by scholars...  As a result, we resort to mandates and other forms of coercion in recognition that we have not, so far, built services that people actually want to use. We have promoted the needs of institutions over the needs of individuals.

    Well, yes, but it isn't as though we don't see this all the time.  It is a rare case when the institutions I have administrative dealings with tailor their procedures and requirements to my needs.  Instead, procedures are designed to increase management efficiency, often at the time-expense of individuals.  I myself have been known to whine about just such impositions (duh), but presumably, the gains in efficiencies of such requirements redound to the general benefit of all.  That's the theory, anyway.  Sometimes it's even true.

    The question of where the natural home for repository functionality might be is tricky.  Lorcan Dempsey refers to institutional reputation management -- a natural and important piece of the puzzle.  Publishers are loath to lose control of the content, but their time is passing... open access is simply too compelling a juggernaut to be resisted.  OA is a when question, not an if question. Professional societies want to play, and some of them sit, somewhat uncomfortably, astride the roles of domain advocacy and commercial publishing. Witness the American Chemical Society doing the splits as the open data boat slowly slides away from the commercial asset management pier.

    It is still possible that another entirely different model will emerge... more in-the-cloud.  A distributed model does seem to complicate curation (and that institutional reputation thing), but I wouldn't count it out just yet.  Still, some institution has to take care of this stuff... responsibility involves the attachment to artifacts, even if they are bitstreams.

    Andy goes on to appeal for more RESTful architectural design, and in this I think he is dead on the mark:

    the 'service oriented' approaches that we have tended to adopt in standards like the OAI-PMH, SRW/SRU and OpenURL sit uncomfortably with the 'resource oriented' approach of the Web architecture and the Semantic Web.  We need to recognise the importance of REST as an architectural style and adopt a 'resource oriented' approach at the technical level when building services.
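
    The contrast Andy draws can be made concrete with a small sketch.  The first function builds a service-style OAI-PMH request, where the item is a parameter passed to a single endpoint; the second gives the item a URI of its own.  The host name and the /items/ path are assumptions invented for illustration, not the layout of any particular repository.

```python
from urllib.parse import quote, urlencode


def oai_pmh_get_record_url(base_url, identifier, metadata_prefix="oai_dc"):
    """Service-oriented style: one endpoint, the operation named by a verb."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


def resource_url(base_url, item_id):
    """Resource-oriented style: the item itself has a URI.

    The /items/ path segment is a hypothetical layout chosen for this sketch.
    """
    return f"{base_url.rstrip('/')}/items/{quote(item_id, safe='')}"


if __name__ == "__main__":
    print(oai_pmh_get_record_url("https://repo.example.org/oai",
                                 "oai:repo.example.org:1234"))
    print(resource_url("https://repo.example.org/", "1234"))
```

    In the second style an ordinary HTTP GET on the item's URI is the whole interface, so links, bookmarks, and crawlers work without knowing anything about the repository's protocol.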

    There are some details of Andy's perspective that I'm happy to contend, but as usual he forces our attention back to design for the Web, of the Web, by the Web.  Sounds pretty much right to me. An excellent keynote to start off VALA 2008.

    Post Script: Speaking of architectures, I see that Roy Fielding, the prime progenitor of all things RESTful, has tired of endlessly explaining the same things on list-after-list, and has started a blog.  This is welcome news indeed.  And you gotta love his colorful domain name.
    -----

    Public sculpture along the Southbank Promenade of the Yarra River in Melbourne

     

    June 19, 2007

    Facebook, Academia, and Intellectual Property

    When you buy a new car, you suddenly notice how many of them there are on the road, and Facebooking feels a lot like that... seems like everyone is there, and the people who aren't will probably show up tomorrow.  Andy Powell points to a recent article detailing FB's unprecedented growth, and certainly this growth is manifest among my colleagues.

    Andy also has been writing about FB as a tool for scholars, and drawing comparisons with the faceless stagnation that characterizes institutional repositories at this stage of our journey into the new tools of scholarship. I think he is very close to the mark.

    One of my Facebook contacts, Jennifer Lang, posted an interesting link from Blogscholar.com about the darker side of the FB business model:

    Academia and the dangers of Facebook reads in part:

    Facebook is a very ingenious business model with the capture of a global network of IP at its heart.  For reference to this see the almost unfathomably bold terms on the site regarding posted user content ... "By posting User Content to any part of the Site, you automatically grant, and you represent and warrant that you have the right to grant, to the Company an irrevocable, perpetual, non-exclusive, transferable, fully paid, worldwide license (with the right to sublicense) to use, copy, publicly perform, publicly display, reformat, translate, excerpt (in whole or in part) and distribute such User Content for any purpose on or in connection with the Site or the promotion thereof, to prepare derivative works of, or incorporate into other works, such User Content, and to grant and authorize sublicenses of the foregoing."

    I think what this really means is you can't tell them to take something off because you changed your mind.  It also means you might want to think twice before publishing chapters of your book there.  I'm not sure that I have a problem with this sort of license, given that I mostly WANT others to see and reuse what I talk about publicly, but on the other hand, I do want attribution for my thoughts and ideas, and there seems to be no assurance of any such thing in this boilerplate... quite the opposite.

    Still, I don't expect to lose much sleep over this... nor do I expect a scholarly publishing environment to blossom from within a technology born of college hookup aspirations.  The primary value we hope to reap from social networking services isn't really the content at all, but rather the emergent relationships among content objects (and the entertainment value of the asides of our extended cohort).  The content can live happily (and safely) elsewhere, including in institutional repositories (and blogs!).

    When I tab to my Facebook page, what draws me is the feed... the changes in the status of my colleagues (twitteresque status notes), but more importantly, links to important ideas or discussions...and why not repository objects?  Guided serendipity that helps me understand the ever-changing state of mind of my community. 

    Ideas of importance have a fixity that is itself important, and to which our community pays great tribute. Social networks are the antithesis of fixity: fluid, capricious, whimsical, spontaneous, emergently creative (ok, and tedious, self-absorbed, and noisy).   Still, I want to live in the amalgamation of this sort of Yin and Yang.  Facebook is unlikely to realize this fully, for lots of reasons, but it feels to me that it is the right direction.

    -----

    The Gazebo at the Whetstone Park of Roses, Columbus, Ohio

    January 09, 2007

    LOCKSS - Floating electronic librarianship to a higher level

    Victoria Reich, of Stanford University, spoke about LOCKSS and CLOCKSS at OCLC this morning, following her colleague Mike Keller’s talk yesterday.  LOCKSS stands for Lots Of Copies Keeps Stuff Safe (though it might be temporarily rewritten as Lots Of Coherent Knowledge from Stanford Speakers, given the excellent presentations we’ve enjoyed here in Dublin this week).

    Many readers will have heard of these systems, especially LOCKSS, which has nearly a decade of development effort behind it.  If I were to characterize Mike Keller’s perspective as how we make libraries more Googlish, Vicky’s perspective is solidly in the realm of how to make the Web behave more like library shelves.  This may strike some as ho-hummish, but it is arguably among the most important missing links in making digital libraries solid enough to bear up as reliable stores of cultural assets.

    Vicky builds her case on two core assumptions:

    1. Democracies and libraries are inextricably linked
    2. Maintaining collections with design and intent distinguishes libraries from book warehouses

    She amplified these assumptions by describing the attributes of print libraries that make them particularly suitable as reliable preservation systems:

    • Resistance to Attack - replication of assets means they are less susceptible to:
      • Natural disasters 
      • Ideological extremists
      • Vagaries of governmental policy
    • Self Healing - collaboration and resource sharing make access to print tractable (if perhaps a tad slow):
      • Interlibrary loan
      • Preservation copies

    Vicky argues that the Web changed the relationship of libraries and publishers, and (at least potentially) compromised all of these attributes in one way or another, thereby weakening the link between libraries and the societies they support, and compromising the coherence of managed collections. LOCKSS is intended to help redress these issues.

    She poses two key questions about electronic library data:

    1. Who has custody?
    2. Who gets access?

    If it sounds a bit like divorce court… it IS a bit like divorce court.  The revenue stream belongs to the content owners, but the custody part resides with the people with the sensible shoes -- us. In the case of LOCKSS, this means a community governance model and library custody of data... but with constraints.  One important aspect of the model is that data previously 'purchased' remains accessible to members of the community that purchased it. That is, stuff paid for in the past, but for which current subscriptions are not in force, remains accessible.

    This point is quite important, as it returns management of the collection to the library.  Publishers may of course feel differently about this. Some endorse the model; others view their data as a time-sensitive licensed commodity -- the electronic equivalent of taking materials off your shelf when you decide you don't want to (or can't) sustain the subscription (my metaphor, not Vicky's).

    Which brings up a key issue in the LOCKSS model: Publishers must agree.  Agreement is simple to effect: slap a LOCKSS-friendly consent statement on the Web page (data ingest is by Web-crawler, so everything in this system happens pretty much automagically).  But not all publishers see through these same rose-colored glasses.  An earlier version of this post misquoted the number of titles currently under LOCKSS management, but it is substantial and growing (thanks to Vicky for catching my error).  It will be interesting to see how it grows as a proportion of total ejournal usage.

    LOCKSS also addresses the problem of format migration by supporting HTTP content negotiation. This too-little-used feature of HTTP allows clients and servers to have a conversation about the formats in which data can be provided and accepted.  LOCKSS clients use this information to do a best-match on-the-fly re-formatting of content so an obsolete format can be rendered as close as possible to the original, and without intervention by the reader. While I am skeptical as to how this holds up in a generalized way, it may be the best way to address the problem, and will probably work reasonably well for common formats that can be expected to comprise the bulk of formally-published materials.
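
    For readers who have not met content negotiation, here is a minimal sketch of the matching step: given a client's Accept header and the formats a server is able to produce, pick the one the client prefers.  This is not LOCKSS code; the function names and the simplified matching rules (it orders by q-value only and ignores the specificity rules of the full HTTP specification) are my assumptions.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # HTTP default when no q parameter is given
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((media_range, q))
    # sorted() is stable, so ties keep the order the client listed them in.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def matches(media_range, media_type):
    """True if a media range such as image/* covers a concrete media type."""
    if media_range == "*/*":
        return True
    range_type, _, range_sub = media_range.partition("/")
    main_type, _, sub_type = media_type.partition("/")
    return range_type == main_type and range_sub in ("*", sub_type)


def choose_format(accept_header, available):
    """Return the first available format the client accepts, or None."""
    for media_range, q in parse_accept(accept_header):
        if q <= 0:
            continue
        for media_type in available:
            if matches(media_range, media_type):
                return media_type
    return None


if __name__ == "__main__":
    # The client would like TIFF, but the server can only produce PNG or GIF.
    print(choose_format("image/tiff, image/*;q=0.5", ["image/png", "image/gif"]))
```

    In a migration setting, the list of available formats would be whatever the preserving system can convert the stored original into; the obsolete format simply never appears in that list.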

    Vicky touched on a wide array of related issues that I cannot cover in one post, including CLOCKSS, a Controlled LOCKSS that differs very little in the underlying technology, but which is managed according to different policies that will be more appealing to a broad range of publishers. Another exciting prospect is the use of LOCKSS as a low-cost means of preserving semi-published, often poorly-curated content that is unencumbered by the constraints of publishing business models. And it fits particularly nicely into the role of libraries as managers of unique collections.  Blogs and institutional repositories are on their radar screen, and one can imagine other interesting possibilities.

    This is terrific work that evokes a higher professional calling than simply being handmaidens to the Amazoogles.  This is technology that is carefully thought out, low in cost, high in impact, and which can be wrapped in policies that surface the professional standards and culture of collaboration that define the library community.

    -----

    Image: Son, brother, wife & mother, coerced to pose in the Short North district in Columbus, around Thanksgiving, 2006

    August 04, 2006

    Digital Repository Interoperability

    Microsoft, the Mellon Foundation, the Coalition for Networked Information, the Digital Library Federation, and the Joint Information Systems Committee in the UK jointly sponsored a meeting in April of 2006 to promote discussion and consensus on the characteristics of digital repositories that need to be standardized in order to promote desirable levels of interoperability. The list of sponsors is a strong clue to the importance of this somewhat esoteric topic. What is at stake is a common set of functionality that will support automated interchange across a wide spectrum of repositories and assure auditable provenance for managed materials. Not a modest objective.

    The final report for this meeting is available at the Mellon Foundation. It is not casual reading: it is intended to capture a discussion rather than characterize the state of the art, though readers may find the background materials and recommendations generally useful.

    Those conversant with digital library research may be familiar with the early work of Robert Kahn and Robert Wilensky on repository architectures. It is interesting to note that, a dozen years later, there is still no commonly agreed terminology, let alone a universally accepted model for this critical piece of digital library infrastructure. It is a reminder of how new our digital workspace is, and how much effort remains to achieve even a rudimentary infrastructure for supporting reliable, persistent access to electronic assets.

    There is inherent tension between standardization and localized design, especially in an unstable technological environment. Builders want to build, not reconcile. The conferees at this meeting (largely implementors of early repository systems) did not even entirely agree on which aspects of functionality should be considered core for interoperability purposes (though progress was made towards this goal).

    It is perhaps not astounding that understanding of repository core functionality is not so different now from what it was in 1995. The Kahn-Wilensky list included (I paraphrase) access, deposit, and tell-me-more. The 2006 meeting agreed on obtain and harvest, and talked at length about whether put should be there. The notions map roughly across the intervening years, though current understanding of the underlying details far exceeds the 1995 model.
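
    To give a feel for how small that core is, here is a toy in-memory repository exposing the three verbs the 2006 meeting discussed.  The verb names come from the meeting; the class, its signatures, and the exact semantics (for instance, harvest filtering on a modification timestamp) are my assumptions, not anything the report specifies.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Item:
    identifier: str
    content: bytes
    modified: datetime


class InMemoryRepository:
    """Hypothetical repository used only to illustrate the three verbs."""

    def __init__(self) -> None:
        self._items: Dict[str, Item] = {}

    def put(self, identifier: str, content: bytes,
            modified: Optional[datetime] = None) -> Item:
        """Deposit (or replace) an item; the verb the meeting debated."""
        stamp = modified or datetime.now(timezone.utc)
        item = Item(identifier, content, stamp)
        self._items[identifier] = item
        return item

    def obtain(self, identifier: str) -> Item:
        """Fetch a single item by identifier; raises KeyError if unknown."""
        return self._items[identifier]

    def harvest(self, since: Optional[datetime] = None) -> List[Item]:
        """List items, oldest first, optionally only those modified since a time."""
        items = sorted(self._items.values(), key=lambda item: item.modified)
        return [item for item in items if since is None or item.modified >= since]


if __name__ == "__main__":
    repo = InMemoryRepository()
    repo.put("a", b"one", datetime(2006, 4, 1, tzinfo=timezone.utc))
    repo.put("b", b"two", datetime(2006, 8, 4, tzinfo=timezone.utc))
    print([item.identifier for item in repo.harvest()])
```

    Everything hard about interoperability (identifiers, provenance, policy, format) lives outside these three methods, which is perhaps why agreeing on the verbs is the easy part.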

    If this sounds like scant progress for a decade, keep in mind that a great deal of experience has been garnered through the deployment of DSpace, arXiv, Fedora, ePrints, aDORe, and the like – serious repository applications that afford the practical experience necessary to bring together common expectations on how these technologies will work together.

    The problem is broad in scope… a data model and architecture to support documents, data archives, and formats, policies, and recombinant practices that are to a significant degree yet undeveloped.  The architecture must accommodate the functional requirements of disparate domains with entirely different business models, legal requirements, and data demands.  Now that the first crop of serious repository applications has matured, the field is ready for the harder task of bringing these efforts into an interoperable framework. This meeting will have helped to focus attention and effort on this important goal and set the stage for additional progress.

    -----

    Image: Downtown Seattle from Elliott Bay, July, 2006