
    November 01, 2007

    Timely Alerts

    When I was a graduate student back in the 1970s, the Health Science Library at the Ohio State University offered a service they called SDI, or Selective Dissemination of Information.  Registering the words or phrases you were looking for earned you a stack of cards on a regular basis, each identifying a journal article corresponding to one of your search terms. Nice for scanning, nice for filing.  It was pretty hot stuff at the time, a huge time saver compared with laboriously thumbing through the latest Current Contents, which was likely to have already been through a lot of hands before it got to you. The subscription (I almost wrote prescription!) was pretty pricey and it was the rare graduate student so profligate as to have his or her own.
    One perused these cards with an admixture of gratitude, fear and guilt... So much easier, but is everything there? Shouldn't I spend Friday nights chained to Index Medicus to be sure?  Well, all that is in the past now... or is it? Of course not.  I signed up for Google Alerts some months ago to try to catch stuff I miss on various topics, and to further assure that no part of my life is left untouched by Google.  Seems to work reasonably well for things like news, but once again, the opacity of the operational model leaves one feeling uneasy.  This unease is amplified when one gets, as I did today, retrievals for one's vanity search (yes yes, I confess) of email posted to a listserv from two years ago.
    It's not that I've been waiting for this one to show up, but why now, months after initiating the search?  The Web is perhaps bigger than we fully appreciate.
    -----
    Yangon Taxi Stand, September 2007

    August 25, 2006

    Dangerous Waters

    Comparisons are odious, and there's nothing scientific about this.   And being an OCLC employee makes manifest my conflict of interest, but the release of WorldCat.org and new enhancements to Google Book Search moved me to do some quick and dirty comparisons.  Disclaimer: I have not discussed this analysis with anyone at OCLC prior to blogging it, nor do I have any special knowledge about the development of WorldCat.org.  I'm writing this as a not-disinterested user.

    I tried a few searches on both, and a summary of results follows.  Again, nothing systematic about this... I'm sure we'll see more extensive analysis very quickly.  These services do, however, illustrate how organizations can sometimes be complementary, at other times competitive, and almost certainly co-evolutionary.

    Search 1: Freakonomics

    I chose this because it is the latest book I reviewed, and perhaps typifies a search-for-new-stuff-to-buy-or-borrow. 

    Google results

    • 13 results:
      •    12 English
      •     1 French

    The target is the top result. There were no duplicates, no non-English language versions (though there was a result in French). The non-target results appear to be resources that mention Freakonomics, and which thus might be very useful in a browsing sense: good fan-out to related resources. Cover art enhances the result set -- a definite plus. Links to other materials related to a given resource were very good: preview capabilities, table of contents, index, and about this book (which provides links to reviews, for example).  These are great features that are currently not available via WorldCat, and which need to be in some manner if libraries are to offer a competitive Web presence in the long haul. 

    WorldCat results

    • 11 results
      • English (4)
      • Chinese (2)
      • Danish (1)
      • French (1)
      • Korean (1)
      • Portuguese (1)

    Ten of the 11 are non-English language versions, or duplicates (three English language records that appear to be the same item). The 11th is a complete mystery to me.  It seems to be in the field of economics, but has no obvious key in common with the search target.

    No cover art, or rather, cover art becomes evident only at the item level, not in the search result set. Very useful sidebar next to the result-set allowing you to refine (or redefine) your search by author, content, format, language, year of publication.  This ability to involute the database is very helpful and easy, and offsets to some degree not having the ancillary material that Google has (but not enough, in my opinion).

    Search 2: Plato's Republic

    Google 109,000 pages

    Once again, cover art and other materials (much less advantageous in the case of a standard classic, I should think).  The large catch reflects the search approach, I assume.  The first page of results includes several versions of the target, and other branch-points that might be useful to a reader.  Do they mean 109,000 pages, or items?

    WorldCat  590 results

    The first page of the WorldCat search does not appear to have the search target, but rather, critical reviews, essays, and such.   The sidebar becomes immediately useful... click on PLATO under authors, and you get various versions of the real deal, all authored by Plato.  In this case, this is particularly important, because translations are very much unique works that will earn their own reputations and followings.   The sidebar says there are 22.  Click on it, and you get 31.  Still, a pretty concise list of Platonics.

    You can do an advanced search in Google Books, as well, filling in PLATO in the author field, which narrows the field to 22,700 pages.  Really?  Not very helpful.

    OK, but the one I wanted is not in either set (well, perhaps buried in the 22,700, but that's not helpful).  I'm looking for Allan David Bloom's translation, as a friend told me this is a really good one, and I know of Bloom's other work as well.  Turns out that it's not on my list of 31 in WorldCat.  I've been given the ISBN though, [0-465-06934-7], so I try that in the search box, and voilà.  I have the single record I want.  Why didn't I get it the first time?  Because the title is The Republic of Plato, and my first search was on Plato's Republic.   Where is FRBR when we need it?
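    To make the title mismatch concrete, here is a minimal Python sketch of a crude, order-independent title key.  It is emphatically not FRBR, which groups records by work and takes far more than string matching, and it says nothing about how WorldCat or Google Books actually match titles.  The function name title_key and the tiny stopword list are assumptions made up for this illustration.

```python
import re

# Small, hand-picked stopword list; an assumption for illustration only.
STOPWORDS = {"a", "an", "the", "of"}


def title_key(title: str) -> frozenset:
    """Reduce a title to an order-independent set of significant words."""
    text = title.lower()
    # Drop a possessive 's (straight or curly apostrophe) so "plato's" becomes "plato".
    text = re.sub(r"['’]s\b", "", text)
    # Split into runs of letters and digits, discarding punctuation.
    words = re.findall(r"[a-z0-9]+", text)
    # Ignore the stopwords and the original word order.
    return frozenset(w for w in words if w not in STOPWORDS)


# Both titles reduce to the same key, the set containing "plato" and "republic".
print(title_key("Plato's Republic") == title_key("The Republic of Plato"))
```

    A key like this would let the two forms of the title find each other, at the price of also lumping together titles that merely share the same words.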

    How about in Google Books? The same ISBN 0-465-06934-7 gives me:

    A Willful Volunteer: Examining Conscience in an Unconscious World. 

    Huh?  The quirkiness of ISBNs is well-known to librarians.  I've encountered the problem a number of times... an alarming number of times, though not in a systematic way.   I'm surprised that the problem surfaces in one service but not the other, though.  Do the same ISBN search in plain Google, and you get the desired book as the first of 17 results (the others are things such as course syllabi that cite the ISBN).   Nowhere in this set is the above tome, willful, conscientious, or not.  Sounds like a data integrity problem to me.

    WorldCat seems to outperform Google Books here, and this is what one might expect, given that the search is much more in the conventional catalog-search idiom.

    Search 3: The Jefferson Bible (by ISBN)

    A final example that I've written about before and so wanted to try here.   As I've related before, what is important about this little book to me is the introduction and afterword in the particular version I have in hand.  So, the instance counts (for me) more than the work.

    The verso of this volume lists two ISBNs:
    0-8070-7702-x (cloth)
    0-8070-7702-1 (paper)

    The first ISBN takes me to a pair of duplicate records in WorldCat.  There is no result in Google Books.

    The second has no records in either service... not terribly surprising, given that it's a paperback version of a book that is rather obscure to start with, and which is thus less likely to be cataloged by a library.
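    One quick sanity check on ISBNs like these is the ISBN-10 check digit.  The sketch below is a minimal Python illustration of the standard mod-11 rule; the function name isbn10_is_valid is made up for this post and is not part of any catalog's API, and nothing here reflects how WorldCat or Google Books actually process ISBNs.

```python
def isbn10_is_valid(isbn: str) -> bool:
    """Return True if the string passes the standard ISBN-10 mod-11 check."""
    # Keep ASCII digits and a possible X check character; drop hyphens and spaces.
    chars = [c for c in isbn.upper() if c in "0123456789X"]
    # Exactly ten characters, and X may only appear in the last position.
    if len(chars) != 10 or "X" in chars[:9]:
        return False
    # Weights run 10, 9, ..., 1; an X in the last position counts as 10.
    total = 0
    for weight, c in zip(range(10, 0, -1), chars):
        total += weight * (10 if c == "X" else int(c))
    # A valid ISBN-10 has a weighted sum divisible by 11.
    return total % 11 == 0


for isbn in ("0-465-06934-7", "0-8070-7702-x", "0-8070-7702-1"):
    print(isbn, isbn10_is_valid(isbn))
```

    Under this rule the Bloom ISBN and the cloth ISBN pass, while 0-8070-7702-1 as written above does not.  The check cannot say whether that is a misprint on the verso, a transcription slip, or something else, but it is one possible reason a search on that number comes up empty.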

    Some observations based on these brief explorations:

    • Google Books is a strong offering in the library catalog space, offering great features that WorldCat does not now offer.  We need the Open Content Alliance, or something like it, in order to be a stronger player in this space, and we need to take full advantage of it.  Google's digitization efforts are an important contribution to the information world, but the common good is better served by an open architecture that allows others to both create and capitalize on this value.
    • Google is well positioned to create additional value through linkages to related information (again, OCA is the obvious pathway by which such value could be open to others).
    • The WorldCat approach to refining a search seems superior to me... I don't have to fill in another set of fields (let alone click to a new page), and there are obvious selections made for me that allow me to initiate a new search in the sidebar.  Involuting the database in this way has lots of great potential.  Will people use it?  Hard to tell.  I'm guessing that it will be used modestly, but will be a strong asset for those who do.  More of this, please.
    • As libraries start to work out the last mile logistics, getting a book from the library via WorldCat.org will be as convenient as buying it online... and cheaper, of course (at least for the patron).
    • Users will benefit from FRBRization of such services.   Should The Republic of Plato have appeared within a search for Plato's Republic?  Yeah, for sure.  How realistic is it to expect that it can be achieved using economical approaches to FRBRization?  Good question.   
    • Neither of these services embodies much of the so-called social software paradigm, though Google Books certainly has some of the elements in place (searching for reviews, for example).   Social bibliography (reviews, tagging, recommender systems, and the like) will be a critical element of public bibliography, and the library community has some catching up to do here.  I couldn't help but notice that the Google Books About this Book link seemed to have lots of LibraryThing review links.
    • And Identifiers.  OCLC's permalinks (OCLC numbers embedded in a WorldCat link) go a long way towards the goal of a standardized, persistent, and succinct identifier for library content.  As a blogger, I hope that the permalink might be displayed prominently at the top of each record (a click away is further than it needs to be).  It is far superior to a transactional link, and in fact, the lack of a permalink in Google Books seems an oversight (understandable, if your world-view is dominated by search hegemony).
    • The ability to construct a useful identifier with an ISBN (or any other standard identifier, for that matter) is a Good Thing(tm).  My brief examples here show that it is not without complications, though.  I don't know the magnitude of the problem.  It isn't obvious that the WorldCat search box is ISBN friendly, though it clearly is.  Not sure if it's important to make this explicit.

    Who would have imagined a decade ago that bibliography would be newsworthy, and of broad public interest?  It is, and it's really great to have WorldCat at one's fingertips.  The other guys are doing great stuff too.   Bravo!
    -----
    Image: A Marmot (I think) sunning himself or herself on Mt. Rainier, July 2006

    August 08, 2006

    Social Networks and Information Retrieval

    Jon Kleinberg of Cornell University delivered today's Keynote at SIG IR 2006 here in Seattle.  Jon is a leading investigator in the area of complex networks, and in particular, in exposing mathematical relationships between large networks and small ones.  His talk was dense and provocative, ranging across well-known history and recent experimentation in the domain of search theory in social networks.

    The notion of "six degrees of separation" is well known these days, an illustration of how even large social networks (the world) can be traversed in a surprisingly low number of links.  Jon described the clever, pre-computing demonstration of this phenomenon by Stanley Milgram in the 1960s.  Theories and experiments concerning network models are not so new, then, as most of us might have guessed.  What is new, however, is the proliferation of large-model systems that leave records of social activity that are unfathomably rich in data for analysis.  As the Web has proliferated, the availability of full-text data and rich linking among resources has superheated the domain of link analysis, leading to extraordinary entrepreneurial opportunity now familiar to all of us, but also to a deeper understanding of linking theory and self-organizing systems (see, for example, Albert-László Barabási's Linked: How everything is connected to everything else, and what it means for business, science, and everyday life).

    Kleinberg's interest, however, is less in the linkage of information assets than in the linking within social networks that comprise the heart of what we have come to refer to as Web 2.0.  Social networks on the Internet are not all that new, either... indeed, Usenet newsgroups, bulletin boards, chatrooms and email networks predate the Web, and are arguably the most important side effect of DARPA's experiments in robust networking now known as the Internet.

    One difference, though, in the Friendster/Myspace/Facebook world that mystifies oldsters (especially legislators) and captivates youngsters is that for the first time, one's place in a social network is visible to all, and subject to self-aware 'gaming' by the participants, as well as exploitation by information retrieval scientists, marketeers, predators, spooks, and even parents with the courage to delve.

    -----

    Image: Opening Plenary of SIG-IR 2006 in Seattle, August 07, 2006

    August 07, 2006

    Theory Matters

    SIG IR 2006 is going on in Seattle at this moment.  CJ van Rijsbergen just received the Salton Award, given every three years to the brightest stars in the Information Retrieval firmament.  In case any of us needed to be reminded that that firmament has rather broader horizons than it once did, this year's SIG IR meeting is the largest ever, and t-shirts and logos of the Amazoogles are everywhere in evidence.   A conference of 700+ hardly even qualifies as a special interest group any longer.
    Rijsbergen spoke in his acceptance speech of the phases and character of his career, and people and activities that have shaped it.  More than once he mentioned how much he had been inspired by his students, and in particular, his early graduate students who helped him shape his mentoring skills at a time when they were still rudimentary.
    He also raised the question of whether theory is necessary, and indeed, whether there can be a theory in a "science of the artificial" such as Information Retrieval.  His answer will be of no surprise to anyone familiar with his work.  Rijsbergen is a top-down theoretician, one who is responsible for a major part of the theoretical underpinnings of IR.  The continued relevance of his classic text on the subject, decades after its first publication, is certainly evidence of his impact in this respect.
    Rijsbergen's talk evinced a belief that there is much left to elucidate in the theory of clustering, and, if I understand him correctly, much left to be learned by practitioners from the theory that already exists.   I certainly had a strong sense of the greybeard speaking to... perhaps gently admonishing... the 20-somethings in logoed t-shirts:  "Hey... we've done some of this before... might be useful for you to understand it better."  All in all, an inspiring talk about the importance of understanding beyond simply what works.
    -----
    Image: The opening dinner for the conference was hosted at the Boeing Future of Flight showcase facility, near the Everett manufacturing plant at Paine Field, whence this picture was taken from the highway.