
    January 08, 2007

    Cows and the Colossus

    Mike Keller delivered a presentation at OCLC today entitled Mass Digitization in Google Book Search: Effects on Scholarship. Mike is director of Stanford University Libraries, and wears an academic publisher’s hat as well, being responsible for HighWire Press and Stanford University Press. He commands a panoramic view of the digital scholarly landscape, and has the intellect and experience to convert view to vision. This vision is both breathtaking and, in some respects, disturbing.

    For those unsettled by the rapidity of Googlian hegemony in library spaces, Mike constructs a vivid and compelling argument for embracing the revolution, reaching back to the digitization of card catalogs. Several salient observations from his remarks:

    • Digitization of the card catalog resulted in a 50% increase in book usage
    • Google indexing is the #1 driver of article usage in HighWire, by a large margin (10 to 1 over the next highest, if I understood him correctly)
    • Metadata searching (what Keller describes as subtle searching), in combination with novel methods of taxonomic search and citation cross-linking, dramatically improves discovery and navigation within large result sets.

    It is difficult to resist Keller’s assertion that Google Book Search (GBS) is likely to revolutionize access to books more than any single factor in the library world – if not directly, then indirectly. It would be hard to be a librarian and not find chagrin in this realization.  Keller rightly urges us to focus on the larger picture and the many benefits.

    Stepping away from the somewhat daunting implications for libraries, Keller suggested that the most important thing about GBS is that it has occasioned a great debate about the importance of copyright in the intellectual life of the nation (and the world). The legal wrangling surrounding the digitization of books in the so-called Google Libraries may indeed midwife a re-examination of copyright issues on a grand scale. Certainly the current interpretation of copyright law in the United States is far more likely to be driven by supply-chain economics than by the intent of the framers of the Constitution. Perhaps for the first time, there are heavy hitters on both sides of the argument, which may result in a reinterpretation of fair use that makes more sense (to libraries and readers) in the digital age. One may hope.

    Keller pointed out the importance of healthy competition among various digitization projects: the Million Book Project, GBS, the Open Content Alliance, the Microsoft/British Library and Microsoft/Cornell efforts. Could we have imagined anything like this rush to mine the library shelves of the world even a few years ago? Could we (the library community) have marshalled either the vision or the resources to accomplish the task on our own?  It is unlikely.

    On the dark side, he raised the image of libraries as herds of cows in these deals. Participants are kept in the dark, enjoined from sharing the details of their deals with other participants, let alone with their public constituencies. They (we) don’t know if they are being “milked, butchered, or destined to cover automobile seats of expensive cars.”  [I paraphrase, but only slightly]

    What is evident is that benefits for the G-Libraries are substantial. The libraries involved receive a windfall of the digitized contents of their collections (though Keller also points out that much is likely to have to be recaptured at higher resolution in the future). For institutions of the caliber of these early G-partners, their technical wherewithal and innovative approaches promise rich research environments as well as new services to readers.

    There are plenty of objections, of course, some with a Luddite feel, others based on money, control, cultural imperialism, disintermediation of libraries, and the hegemony of a colossus too large and too innovative to ignore. Publishers, authors, and librarians all have much to lose, but also many potential gains in how this plays out in the long run. It would seem that only for readers is the benefit clear and unambiguous. But even readers may lose in a world where any monolith controls access to the content, whether that monolith be a government or Google.

    Mike Keller is an articulate spokesperson for the dilemmas and opportunities that are upon us. I found myself unable to resist most of his arguments, even a few I wanted to reject. Our community (or at least an elite component of it) has been offered a deal impossible to resist, a side effect of the inexorable march of digitalization that began in our realm with Henriette Avram’s vision of machine-readable cataloging.

    Most of our constituents will benefit greatly, while libraries will rattle from the footfalls of the Googles and their ilk. Still, I am unnerved that these developments are cloaked in non-disclosure agreements, even as I am excited about the extraordinary challenges and possibilities we face.  Will cooperation within the community survive the Faustian bargains from without?

    -----

    My thoughts here are almost certainly flawed renditions of Mike's remarks... I was jotting notes as fast as my dyslexic fingers would go, and trying to pay attention as well. Responsibility for distortions or errors is of course my own.
    -----
    Image: OCLC at dawn

    August 25, 2006

    Dangerous Waters

    Comparisons are odious, and there's nothing scientific about this. And being an OCLC employee makes manifest my conflict of interest, but the release of WorldCat.org and new enhancements to Google Book Search moved me to do some quick and dirty comparisons. Disclaimer: I have not discussed this analysis with anyone at OCLC prior to blogging it, nor do I have any special knowledge about the development of WorldCat.org. I'm writing this as a not-disinterested user.

    I tried a few searches on both, and a summary of results follows. Again, nothing systematic about this... I'm sure we'll see more extensive analysis very quickly. These services do, however, illustrate how organizations can sometimes be complementary, other times competitive, and almost certainly co-evolutionary.

    Search 1: Freakonomics

    I chose this because it is the latest book I reviewed, and perhaps typifies a search-for-new-stuff-to-buy-or-borrow. 

    Google results

    • 13 results:
      •    12 English
      •     1 French

    The target is the top result. There were no duplicates and no non-English language versions (though there was a result in French). The non-target results appear to be resources that mention Freakonomics, and which thus might be very useful in a browsing sense: good fan-out to related resources. Cover art enhances the result set, a definite plus. Links to other materials related to a given resource were very good: preview capabilities, table of contents, index, and About this book (which provides links to reviews, for example). These are great features that are currently not available via WorldCat, and which need to be offered in some manner if libraries are to have a competitive Web presence in the long haul.

    WorldCat results

    • 11 results
      • English (4)
      • Chinese (2)
      • Danish (1)
      • French (1)
      • Korean (1)
      • Portuguese (1)

    Ten of the 11 are non-English language versions, or duplicates (three English language records that appear to be the same item). The 11th is a complete mystery to me.  It seems to be in the field of economics, but has no obvious key in common with the search target.

    No cover art, or rather, cover art becomes evident only at the item level, not in the search result set. A very useful sidebar next to the result set lets you refine (or redefine) your search by author, content, format, language, and year of publication. This ability to involute the database is very helpful and easy, and offsets to some degree not having the ancillary material that Google has (but not enough, in my opinion).
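For the curious, here is a minimal Python sketch of what a facet sidebar of this kind does in principle: count the values of one field across the current result set, then narrow the set when a value is clicked. It is not a description of how WorldCat.org actually works; the `Record` type, its fields, and the sample records are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical, pared-down bibliographic record.
    title: str
    author: str
    language: str
    year: int


def facet_counts(records, field):
    # One sidebar facet: how many records in the current set share each value of `field`.
    return Counter(getattr(record, field) for record in records)


def refine(records, field, value):
    # Clicking a facet value narrows the current set instead of starting a new search.
    return [record for record in records if getattr(record, field) == value]


# Hypothetical result set, not actual WorldCat data.
results = [
    Record("Title A", "Author X", "English", 2005),
    Record("Title A", "Author X", "French", 2006),
    Record("Title B", "Author Y", "English", 2006),
]

print(facet_counts(results, "language"))  # Counter({'English': 2, 'French': 1})
print([r.title for r in refine(results, "language", "English")])  # ['Title A', 'Title B']
```

Because refinement operates on the set already in hand, the reader never has to fill in a second form, which is the property praised above.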

    Search 2: Plato's Republic

    Google results: 109,000 pages

    Once again, cover art and other materials (much less advantageous in the case of a standard classic, I should think). The large catch reflects the search approach, I assume. The first page of results includes several versions of the target, and other branch-points that might be useful to a reader. Do they mean 109,000 pages, or items?

    WorldCat results: 590

    The first page of the WorldCat search does not appear to have the search target, but rather critical reviews, essays, and such. The sidebar becomes immediately useful... click on PLATO under authors, and you get various versions of the real deal, all authored by Plato. In this case, this is particularly important, because translations are very much unique works that will earn their own reputations and followings. The sidebar says there are 22. Click on it, and you get 31. Still, a pretty concise list of Platonics.

    You can do an advanced search in Google Books, as well, filling in PLATO in the author field, which narrows the field to 22,700 pages.  Really?  Not very helpful.

    OK, but the one I wanted is not in either set (well, perhaps buried in the 22,700, but that's not helpful).  I'm looking for Allan David Bloom's translation, as a friend told me this is a really good one, and I know of Bloom's other work as well.  Turns out that it's not on my list of 31 in WorldCat.  I've been given the ISBN though, [0-465-06934-7], so I try that in the search box, and voilà.  I have the single record I want.  Why didn't I get it the first time?  Because the title is The Republic of Plato, and my first search was on Plato's Republic.   Where is FRBR when we need it?
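A toy illustration of why the two titles miss each other, and of the crudest conceivable fix: reduce each title to an unordered set of significant words, dropping articles, "of", and the possessive "'s". The `work_key` function and its stop-word list are hypothetical; real FRBR-style work clustering also leans on author names and authority control, and a heuristic this blunt would happily merge unrelated titles that share a few words.

```python
import re

# Hypothetical stop words for this toy key; real systems use far richer rules.
STOP_WORDS = {"the", "a", "an", "of"}


def work_key(title: str) -> frozenset:
    """Toy heuristic: reduce a title to an unordered set of significant words."""
    text = title.lower()
    text = re.sub(r"['’]s\b", "", text)  # "plato's" -> "plato" (straight or curly apostrophe)
    words = re.findall(r"[a-z]+", text)
    return frozenset(w for w in words if w not in STOP_WORDS)


print(work_key("Plato's Republic") == work_key("The Republic of Plato"))  # True
```

Both titles reduce to the same key, {"plato", "republic"}, so a search built on keys like this would have surfaced the Bloom translation alongside the others.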

    How about in Google Books? The same ISBN 0-465-06934-7 gives me:

    A Willful Volunteer: Examining Conscience in an Unconscious World. 

    Huh? The quirkiness of ISBNs is well-known to librarians. I've encountered the problem a number of times... an alarming number of times, though not in a systematic way. I'm surprised that the problem surfaces in one service but not the other, though. Do the same ISBN search in plain Google, and you get the desired book as the first of 17 results (the others are things such as course syllabi that cite the ISBN). Nowhere in this set is the above tome, willful, conscientious, or not. Sounds like a data integrity problem to me.

    WorldCat seems to outperform Google Books here, and this is what one might expect, given that the search is much more in the conventional catalog-search idiom.

    Search 3: The Jefferson Bible (by ISBN)

    A final example that I've written about before and so wanted to try here. As I've related before, what is important about this little book to me is the introduction and afterword in the particular version I have in hand. So, the instance counts (for me) more than the work.

    The verso of this volume lists two ISBNs:
    0-8070-7702-X (cloth)
    0-8070-7702-1 (paper)

    The first ISBN takes me to a pair of duplicate records in WorldCat.  There is no result in Google Books.

    The second has no records in either service... not terribly surprising, given that it's a paperback version of a book that is rather obscure to start with, and which is thus less likely to be cataloged by a library.
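One quick sanity check worth running on any ISBN before blaming the catalog: an ISBN-10 ends in a check digit chosen so that the digits, weighted 10 down to 1 (with X standing for 10), sum to a multiple of 11. Since the two ISBNs on the verso share their first nine digits, at most one of them can satisfy that rule. The minimal sketch below (the function name is my own, hypothetical) finds that the cloth ISBN passes and the paper ISBN, as transcribed, does not. That may point to a slip in transcription or printing rather than a gap in either catalog, though the check alone cannot say which.

```python
DIGITS = "0123456789"


def isbn10_is_valid(isbn: str) -> bool:
    """Check an ISBN-10 against its mod-11 check digit."""
    chars = isbn.replace("-", "").replace(" ", "").upper()
    # Nine digits, then a check character that is a digit or X (which stands for 10).
    if len(chars) != 10 or any(c not in DIGITS for c in chars[:9]):
        return False
    if chars[9] not in DIGITS + "X":
        return False
    values = [int(c) for c in chars[:9]] + [10 if chars[9] == "X" else int(chars[9])]
    # Weights run 10, 9, ..., 1; a valid ISBN-10 has a weighted sum divisible by 11.
    total = sum(weight * value for weight, value in zip(range(10, 0, -1), values))
    return total % 11 == 0


print(isbn10_is_valid("0-8070-7702-X"))  # True  (cloth)
print(isbn10_is_valid("0-8070-7702-1"))  # False (paper, as transcribed)
print(isbn10_is_valid("0-465-06934-7"))  # True  (The Republic of Plato)
```

The Bloom ISBN passes too, which suggests the odd Google Books result above comes from how the number was attached to a record, not from the number itself.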

    Some observations based on these brief explorations:

    • Google Books is a strong offering in the library catalog space, offering great features that WorldCat does not now offer. We need the Open Content Alliance, or something like it, in order to be a stronger player in this space, and we need to take full advantage of it. Google's digitization efforts are an important contribution to the information world, but the common good is better served by an open architecture that allows others to both create and capitalize on this value.
    • Google is well positioned to create additional value through linkages to related information (again, OCA is the obvious pathway by which such value could be open to others).
    • The WorldCat approach to refining a search seems superior to me... I don't have to fill in another set of fields (let alone click to a new page), and there are obvious selections made for me that allow me to initiate a new search in the sidebar. Involuting the database in this way has lots of great potential. Will people use it? Hard to tell. I'm guessing that it will be used modestly, but will be a strong asset for those who do. More of this, please.
    • As libraries start to work out the last mile logistics, getting a book from the library via WorldCat.org will be as convenient as buying it online... and cheaper, of course (at least for the patron).
    • Users will benefit from FRBRization of such services.   Should The Republic of Plato have appeared within a search for Plato's Republic?  Yeah, for sure.  How realistic is it to expect that it can be achieved using economical approaches to FRBRization?  Good question.   
    • Neither of these services embodies much of the so-called social software paradigm, though Google Books certainly has some of the elements in place (searching for reviews, for example). Social bibliography (reviews, tagging, recommender systems, and the like) will be a critical element of public bibliography, and the library community has some catching up to do here. I couldn't help but notice that the Google Books About this Book link seemed to have lots of LibraryThing review links.
    • And identifiers. OCLC's permalinks (OCLC numbers embedded in a WorldCat link) go a long way towards the goal of a standardized, persistent, and succinct identifier for library content. As a blogger, I hope that the permalink might be displayed prominently at the top of each record (a click away is further than it needs to be). It is far superior to a transactional link, and in fact, the lack of a permalink in Google Books seems an oversight (understandable, if your world-view is dominated by search hegemony).
    • The ability to construct a useful identifier with an ISBN (or any other standard identifier, for that matter) is a Good Thing(tm). My brief examples here show that it is not without complications, though. I don't know the magnitude of the problem. The WorldCat search box doesn't advertise that it accepts ISBNs, though it clearly does. Not sure if it's important to make this explicit.
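Part of what makes a permalink attractive to a blogger is that it can be built mechanically from an identifier already in hand. The sketch below assumes WorldCat link patterns of the form `/oclc/<number>` and `/isbn/<isbn>` under `https://www.worldcat.org`; treat those patterns as assumptions to check against OCLC's current linking documentation. The helper names are hypothetical, and the ISBN check here is only a format check (10 or 13 characters), not a check-digit validation.

```python
import re

# Assumed URL patterns; verify against WorldCat's current linking documentation.
WORLDCAT_BASE = "https://www.worldcat.org"


def normalize_isbn(raw: str):
    """Strip hyphens and spaces; return a bare 10- or 13-character ISBN, or None."""
    s = re.sub(r"[\s-]", "", raw).upper()
    if re.fullmatch(r"[0-9]{9}[0-9X]|[0-9]{13}", s):
        return s
    return None


def isbn_link(raw: str) -> str:
    """Build an (assumed-format) WorldCat link from an ISBN as printed on a verso."""
    isbn = normalize_isbn(raw)
    if isbn is None:
        raise ValueError(f"not an ISBN: {raw!r}")
    return f"{WORLDCAT_BASE}/isbn/{isbn}"


def oclc_permalink(oclc_number) -> str:
    """Build an (assumed-format) WorldCat permalink from an OCLC number."""
    s = str(oclc_number).strip()
    if not re.fullmatch(r"[0-9]+", s):
        raise ValueError(f"not an OCLC number: {oclc_number!r}")
    return f"{WORLDCAT_BASE}/oclc/{int(s)}"  # int() drops any leading zeros


print(isbn_link("0-465-06934-7"))  # https://www.worldcat.org/isbn/0465069347
```

The OCLC-number form is the more stable of the two: as the examples above show, an ISBN can be shared, mistyped, or attached to the wrong record, while an OCLC number names a single record.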

    Who would have imagined a decade ago that bibliography would be newsworthy, and of broad public interest? It is, and it's really great to have WorldCat at one's fingertips. The other guys are doing great stuff too. Bravo!
    -----
    Image: A Marmot (I think) sunning himself or herself on Mt. Rainier, July 2006