
    December 11, 2007

    Roll over, George

    Jonathan Rochkind made some thoughtfully peevish comments on my previous post on RDA and the Futures report, which drove me (perish the thought) back to the document itself.

    ...I'm confused by your apparent sympathy with the Working Group
    recommendations to suspend RDA work...

    ...That recommendation seems to instead be based on the fantasy that we need to spend lots of time 'testing' FRBR, at the end maybe deciding that FRBR is no good at all

    I don't see this in the recommendations at all.  What I read (in recommendation 4.2.1, p 29) is a clear mandate to resolve the existing ambiguities in the FRBR model in order to:

    provide a more robust framework for the creation of the resource description and access rules that will be used in the future to support a broad range of searching options (also on page 29). 

    This is essential, and should be undertaken in the light of functional pragmatism, not ideology.  And certainly I agree with Jonathan that there is little time to waste.  The Futures report does not impugn the value of FRBR, but simply recognizes that we as a community do not agree about the importance of Expressions.  If it is critical in other ways, I missed it.

    There is much stronger concern expressed in the report about the uncertainties of RDA, having to do with unsubstantiated benefits, alignment with existing standards, and the business case for it (see the bottom of page 24).

    The subsequent recommendation (on the next page: 3.2.1) is stated more strongly than I might have chosen.  But the heading (Suspend Work on RDA) is qualified by a series of "until" conditions, and the text makes clear that useful work has been initiated with JSC and DCMI, and should continue.

    But any assertion that the debates now under way on the RDA list represent progress towards these goals is, in my view, whistling past the graveyard.

    And as for Jonathan's generous remark:

    In fact, I feel like you've expressed well the argument that I'd want
    to submit as comments to the Working Group

    I know that at least one of them reads my blog ;-)

    -----
    yes... THAT George Boole... taken in Cork, at the end of the DCC meeting on persistent identifiers in 2004

    In fact, specifically, THIS George Boole: http://worldcat.org/identities/lccn-n83-144364 (thanks, Thom)

    October 05, 2006

    It's the Model, Stupid

    A short history of data modeling in the Dublin Core Metadata Initiative, and what it means for the future of cataloging

    The cardinal rule at the first Dublin Core meeting way-back-when was Thou Shalt Not Conflate Syntax and Semantics. A rule honored more in the breach than in the observance, perhaps, but it emphasizes the importance of separating these two fundamental facets of communicating structured information. We were right about this… almost. The missing part of the picture was a sound underlying data model. 

    We knew what we were trying to say, we thought we knew how to say it. How hard is it to describe the basic characteristics of a resource, after all? Title of resource is…. Creator of resource is…. We even spoke about the initiative in terms of grammars and evolving pidgin languages (simple, emergent grammars). We hobbled along with only a vague common understanding – a model implicit in the aggregate projects using Dublin Core, propagated through imitation, nowhere formally specified.

    It isn't as though we didn't try.  Early attempts to arrive at a formal data model were fraught with contention and even acrimony. Difficult meetings (Washington… Dublin… Crete), leading to small beachheads that, lacking broad consensus, soon washed away. Maybe it wasn’t so important? After all, people still used DC, we continued to attract adherents. DC was spreading… 25 languages, 50 countries. And AACR2-MARC, arguably the world’s most successful resource description standard, didn’t have a data model either… how bad a problem can it be?

    But the chickens come home to roost. The lack of a formal model led to a plethora of non-interoperable systems, crippling one of the foundation principles of the Initiative. It took 10 years for DCMI to finally evolve and adopt a formal model, and one might wonder whether simple exhaustion was a factor in its ultimate acceptance. It will be a long time before this model channels practice sufficiently to bring the many flavors of DC closer to the goal of sharing metadata across systems, let alone across various other metadata frameworks.

    The Dublin Core Abstract Model, led by Andy Powell and Pete Johnston, and later attracting the efforts of Mikael Nilsson in the cause of bridging the DCMI framework with that of IEEE LOM, is a hybrid distillation of ideas gleaned from library practice and the Semantic Web’s cornerstone technology, RDF. It reflects insights of emerging Web practice (the use of URIs as persistent identifiers, for example), and embraces lessons learned from a decade of early metadata adopters.

    Yesterday afternoon at DC-2006, Diane Hillmann presented a summary of progress (and… surprise… contention) associated with the RDA effort, what most people understand as the international revision of AACR2. From my own uninformed perspective, it appears that this effort suffers from much the same problem that we have had in the Dublin Core – a data model implicit in years of practice and rule-revision on top of rule-revision, resulting in a focus on the minutiae of rules rather than being guided by formal principles of description.

    The Web has forced us all out of isolated communities of practice and into the Internet Commons. Certainly the practice and topography of librarianship are changing out from under us. As we struggle under the stress of these changes, it is perhaps predictable that legacy systems such as cataloging practice will change even more slowly. The RDA effort recognizes the importance of updating our profession to fit more comfortably into the Internet Commons. If we are to achieve anything like the interoperability we hope for, we will need common structural models. If the effort devolves to simply unraveling existing rules and rewinding the yarns, we will fall short of the integration we need to support our future. The successes and failures of the DC community in its own modeling struggles can be useful… and reusable.  I gather that the Joint Steering Committee has sought consultation with representatives of the IEEE LOM metadata community as well as with DCMI. It would be fitting if the DCMI could return some value to the community that has provided so much of the insight that has motivated its own progress.

    Mikael Nilsson’s exhortation is on the mark. Less talk about metadata sets, and more talk about models. It is both difficult and important to get this piece right.

    ------

    Image: DC-2006 welcome reception

    September 07, 2006

    Neutrality is over-rated

    Traditional notions of surrogacy in the library world revolve around catalog records – a neutral distillation of attributes intended to support discovery and management. In the age of the Amazoogles, richness of linking and community-generated surrogates play a welcome role in discovery and evaluation. Several interesting issues emerge from this shift.

    Surrogates as first class objects

    I’ve alluded to the importance of this in a previous post.  It is part of the perspective shift that is, I believe, fundamental to the transition from Library 1.0 to Library 2.0 thinking.

    Neutral surrogates as opposed to evaluative surrogates

    Librarians have traditionally positioned themselves in a neutral role… above the ideological fray of content. I think this is as much artifact as intent. A bib record should be a neutral inventory of attributes, and to the extent that the catalog was central to our service, that neutrality served well. Of course, libraries have long offered readers' advisory services. Nancy Pearl, Seattle's (the country’s?) best-known librarian (anyone seen a James Billington shushing Action Figure lately?) has acquired a national reputation as a voice of reading.

    We have entered the era of recommender services to assist in our every consumer selection. The best example of this in the library space is Amazon – the reviews are widely read and eagerly written, and the marketing data (people who bought this, also bought that…) is valuable indeed.  LibraryThing has, through the application of now well-understood social collaboration techniques, introduced a similar functionality that is independent of book purchase.

    Which is more valuable as a finding aid? A catalog record or a review? One focuses on discovery, and the other on suitability. But within the everything-indexed context of the Web, both are important, and the distinction blurs.  There is a case to be made for library-mediated evaluative surrogates coexisting cheek-by-jowl with traditional cataloging records.

    What about an Amazon-library system mashup?  Just what Amit Gupta has offered (brought to my attention in Lorcan Dempsey's Blog).  All very interesting co-evolution, but what I really wanted to do was show off the Nancy Pearl Action Figure (Deluxe Version).  Small Parts. Not suitable for children under 3 or those without a library card.  Who needs Jane Austen, anyway?

    July 14, 2006

    Book Review: The DAM Book: Digital Asset Management for Photographers

    I've been an avid digital photographer for just a few years now, and as my collection of images has grown, I've outstripped, in number and quality, my film 'production'.  The marginal cost of taking another image is zero... it's only bits, right?  Well, that's nonsense, of course.  It is true that you can capture lots more images without incurring additional costs immediately, but managing them is far more difficult than the shoebox methodology that serves pretty well with prints.  Time is the real cost (well, that, and all the machinery you need to buy and rebuy over the life of the images).

    It is easy enough to put off the hard part... organizing and cataloging.  In fact, most people won't ever do it, and the likely result is predictable.  Bit rot writ large.  It can be argued that the world is no worse off for this, and perhaps the opposite, but if you're worried about the persistence of the digital images of your life and family... well, paranoia is just a heightened sense of reality.  Disks are flaky and WILL fail.  Home-burned CDs are unreliable, and DVDs are probably worse.  New media formats are always on the horizon.  Shoeboxing prints is still probably the best way to get images into your children's hands.  But print technology is in flux as well, and you need to understand the technology underlying the prints. Certainly your home photo printer is a dicey bet, unless it happens to be a high-end Epson or equivalent, with archival pigment inks.  It's enough to drive you back to slide film.  Well, not really.

    Instead, buy The DAM Book: Digital Asset Management for Photographers.  Peter Krogh's addition to the O'Reilly library is a must-read if you're serious about keeping your digital images.   A professional photographer who claims to have captured 135,000 images in a three-year period (about one every 12 minutes, around the clock... hmmm), Krogh has laid out a readable, convincing text on strategies and choices for managing images.  The book talks about file management, naming strategies, software environments, and hardware platforms suitable for assuring the longevity of your images.   If configuring Photoshop's Bridge application seems daunting, Krogh walks you through each step and explains why. If you knew you should be creating metadata, but were intimidated by the task, he will give you methods and confidence.  And perhaps scare you into it.
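The "hmmm" about Krogh's shooting rate is easy to check. A quick back-of-the-envelope sketch, assuming 365-day years and shooting around the clock (both assumptions are mine, not Krogh's):

```python
# Check of the shooting rate: 135,000 images over three years.
# Assumes 365-day years (no leap days) and round-the-clock shooting.
MINUTES_PER_YEAR = 365 * 24 * 60

images = 135_000
years = 3

minutes_per_image = years * MINUTES_PER_YEAR / images
images_per_day = images / (years * 365)

print(f"one image every {minutes_per_image:.1f} minutes")  # one image every 11.7 minutes
print(f"about {images_per_day:.0f} images per day")        # about 123 images per day
```

Allowing for sleep, the daytime rate is of course considerably higher.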
    The book is up-to-date on the latest metadata standards (IPTC, which was only recently approved, is the heart of image description in the world of journalism and commercial imaging).  It is, I am pleased to say, a partial derivative of the Dublin Core.  Need less?  Probably, though following his guidelines and recommendations will impart the confidence you need to make good decisions.
    I've been a metadata maven for more than a decade.  Peter Krogh is about to make me (finally) a cataloger as well.
    -----
    Image: Ducklings, photographed in Kubota Gardens, one of two wonderful public Japanese gardens in Seattle.

    May 15, 2006

    Tag Team Wrestling

    Tim "LibraryThing" Spalding has just announced a new feature on LibraryThing, currently the most impressive instance of so-called Library 2.0 that I know of.  The new feature addresses a topic that many (including myself) have been wrestling with for a while now: the relationship of formal knowledge organization systems to folksonomies.  Tim's Mother's Day blog post speaks to the raging controversy:

    Are tags better than subjects? Are subjects better than tags? Are tags just a fad? Will tags replace subjects? Are tags evil? Are subjects evil? (Believe me, the idea is out there.) Librarians have become deeply enmeshed in the debate, with partisans on both sides. Until now, there hasn't been much in the way of hard data, at least for books. LibraryThing provides that.

    This last is the exciting part.  LT has provided a platform to explore the behaviors of both in a coherent, data-rich system.  Researchers, start your systems!

    -----
    image: Carolyn Dunford, of the UW MLIS program, alerted me to a lovely wildlife preserve nearly within slingshot distance of the UW (known as the Montlake Fill).  This is a closeup of one or another species of Horsetail (thanks to Bob O'Hara for the ID), taken at the preserve. 

    April 13, 2006

    Cheep Links

    Geoff Froh, one of the MSIM graduate students who has befriended me during my stay here at the UW iSchool, sent me the following link from last September that has a lot of great ideas about creating a richer web of information -- cheaply:

    Using Wikipedia and the Yahoo API to give structure to flat lists

    Hackdiary is Matt Biddulph’s idea-rich blog of his travels through Web 2.0.

     

    The post that motivated this entry is a succinct description of some of Matt’s work at the BBC, but it transfers pretty much wholesale to what we should be doing more of with library data:

    "adding value to your own data by using external information"

    Biddulph’s post emphasizes some of the benefits of the open Web that are available to anyone with a creative vision of how to capitalize on them to create more value at low cost.

    One of my early blog entries was about the addition of name authority data into the German Wikipedia, and I seem to be bouncing back again and again to the concept of public bibliography – creating rich semantic linkages around traditionally-described formal resources that give them context and heighten their visibility in Web space.   Using the open Web to enrich our assets while in turn contributing to the information assets of the Commons embodies the reciprocal creation of value that fits neatly within our tradition and is, I believe, critical to our future.

    -----

    image: Gulls ride for free on the Washington State Ferries.   And they aren't bashful.  March, 2006, near Orcas Island in the San Juans

    February 08, 2006

    The New Cooperative Cataloging

    Readers of an earlier post on my blog will have seen that I dived into the Open WorldCat Not-Quite-A-Wiki Book Review Cache for the first time as a reviewer this week. I also posted a note on why it is critical for libraries to participate more fully in the Linking Economy [First Class Objects and the Currency of Linking]. Not just important, but critical.

    So, I did one myself… gotta eat your own dog food, eh? OK, well, you have to establish an account (user name, password, that stuff).  Another password to manage… argh. But still… it’s important, so I did it.

    Then cometh the dreaded TERMS & CONDITIONS, in which I learn, among other things, that by contributing content, I agree to defend OCLC in any legal disputes arising from my content.  My immediate vision is of Lil’ Ole Me… armed with naught but fountain pen against the onslaught of Smaug and his law clerks, defending the Mothership from harm (sorry about the mixed allusions... too many allegories lately).

    Well, that made me swallow hard. Visions of legal fees withheld from my salary for the rest of my unnatural life. How in heaven’s name are we going to get people to contribute time and effort in the service of the Bibliographic Cooperative if we indenture them legally in the process? Well, my immediate response was to whine about it (those who know me will not find this startling).  One reply to my whine (by a colleague with legal training who does not work for OCLC) went something like: ‘well, of course OCLC has to do this. Look at the international cartoon crisis!’

    Hmmmm... that brought me up short.

    OK… the risks are not zero. But risk aversion threatens to obscure the values we stand for. Some of those risks, of course, are real; others have the character of being probabilistically small but of horrendous consequences. Mostly it’s impossible to tell ahead of time.  Therein lies the rub.

    The biggest risk of all, for our profession, is not to be at the table, not to stand up for what we believe, not to bring our professional values to bear in the most effective manner possible. In the present case, that means enriching the public link structure of bibliography.

    My exhortations:
    To OCLC:

    • Make WorldCat Wiki reviews link-addressable and harvestable
    • Perhaps use Creative Commons licenses to establish the content as solidly part of the intellectual commons that is our foundation (we’ve done this in the past with WebJunction, as I learned yesterday in my visit to the WebJunction offices in Seattle).

    To Librarians:

    • Let nothing you read go un-reviewed.  Don’t feel you are competent to review your physics library’s latest treatise on string theory? Take responsibility for recruiting your patrons to do it.
    • We have the scarcest asset on the Web – Public Trust. Advise your users. Make the creation of linkable advisory content a personal priority. 100,000 librarians x 1 review per week x 50 weeks = a pretty good start.
    • Take the risk.

    To Libraries:

    • Recognize and promote this effort. Provide incentives to staff. This is the New Cooperative Cataloging for the Web.
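The "pretty good start" in the exhortation to librarians multiplies out as follows, assuming (optimistically) that every one of the 100,000 librarians takes part every week:

```python
# Annual yield of the proposed review drive, assuming full participation.
librarians = 100_000
reviews_per_week = 1
weeks_per_year = 50

total_reviews = librarians * reviews_per_week * weeks_per_year
print(f"{total_reviews:,} reviews per year")  # 5,000,000 reviews per year
```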

    A note about the image in this post.  It is from a larger work by Falah Shwan, an Iraqi refugee living in Columbus.  I'm not really sure about my reproduction rights -- Marguerite and I purchased the work from which it is excerpted, and it has occurred to me that it is a perfect logo for the Radical Militant Librarian T-shirts that we all should have ;-).  If there is interest, we will try to track down the artist and see about licensing it.

    January 20, 2006

    Plastic Cockroaches... Who ya gonna call?

    There is very little in the library world that has not been touched by the Web revolution, and sometimes it is hard to tell what is good for us and what simply threatens to make us quaint. It is clear, however, that we must examine what we are, what we do, and the value we add to the Information Commons. We have a business model, however unnatural it is for us as a profession to acknowledge that. For the bulk of us, that business model is to make information look free. We know it isn’t, but we want our users to feel otherwise.

    On the back side of that business model, we need to understand the cost structures of providing the services that we provide.

    I had occasion to meet Peter McCracken of SerialsSolutions yesterday, at the Digital Futures Alliance meeting about which I posted earlier today. Later Peter sent me a note about a tool that they have just launched that helps explicate some of the cost structures associated with cataloging electronic resources. Now, I’m the last person you want doing your budget projections… OCLC Research is a cost center, not a contribution-to-equity center (and I do my part to make it so!). I make no endorsement of this tool, its assumptions, or how it might fit into your library budget planning, but it strikes me as an example of the way we should be thinking about the cost structures that support our services.

     From Peter’s note:

    The "MARC Cost Calculator" assumes that an individual is interested in cataloging all of their electronic resources in the OPAC. It allows a person to input data on how many electronic holdings they have, what percentage will be managed by students, by paraprofessionals, and by catalogers, and what the salary or wage is for each type of worker. The user also estimates the time it takes to input each type of record. The calculator then reports how much time and how many dollars it will take to get all that work done. I think some folks might be surprised to see how much time and money they'd invest if they tried to do it themselves. It's at
    http://www.serialssolutions.com/marccalculator.asp.
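Peter's description suggests a fairly simple calculation. What follows is only my guess at that arithmetic, not SerialsSolutions' actual formula: the `WorkerType` class, the `estimate_cataloging_cost` function, the assumption that wages are hourly, and all of the sample figures are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class WorkerType:
    """One class of staff doing the cataloging (hypothetical model)."""
    name: str
    share: float               # fraction of records this group handles (0 to 1)
    hourly_wage: float         # dollars per hour (assumed hourly, not annual salary)
    minutes_per_record: float  # estimated time to input one record


def estimate_cataloging_cost(holdings: int, workers: list[WorkerType]) -> tuple[float, float]:
    """Return (total_hours, total_dollars) to catalog every electronic holding."""
    if abs(sum(w.share for w in workers) - 1.0) > 1e-9:
        raise ValueError("worker shares must add up to 100%")
    total_hours = 0.0
    total_dollars = 0.0
    for w in workers:
        records = holdings * w.share                # records assigned to this group
        hours = records * w.minutes_per_record / 60 # staff time for those records
        total_hours += hours
        total_dollars += hours * w.hourly_wage      # time converted to wages
    return total_hours, total_dollars


# Illustrative inputs only: 20,000 e-resources split across three staff types.
staff = [
    WorkerType("students", 0.5, 10.0, 10),
    WorkerType("paraprofessionals", 0.3, 18.0, 15),
    WorkerType("catalogers", 0.2, 30.0, 20),
]
hours, dollars = estimate_cataloging_cost(20_000, staff)
print(f"{hours:,.0f} hours, ${dollars:,.2f}")  # 4,500 hours, $83,666.67
```

Even with these modest made-up numbers, the total comes to more than two years of one full-time person's effort, which is the kind of surprise Peter has in mind.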

    I had to chortle a bit writing this post, given what I said a couple of days ago about not keeping my home page up to date.  If you google Peter, the first entry is a home page he made and, it would appear, long since abandoned.  At the bottom:

    This document was last modified 1 May 1997 - the first time in over two years, so don't expect all of the links to work.

    Amazingly, some of the links DO still work, though, sadly, not the one to

    Archie McPhee, Seattle's "Outfitters of Popular Culture," and THE place to find plastic cockroaches, rubber chickens, glow-in-the-dark Madonnas, and everything else the modern family needs.

    Alone in Seattle, and no plastic cockroaches.

     

    January 12, 2006

    Diamonds in a dunghill

    Jefferson’s Library Catalog and The Jefferson Bible

    Exploring the house I share for the duration of my sabbatical, I came upon a thin tome in the living room: The Jefferson Bible [ISBN 0-8070-7702-X]. It caught my eye because Seth Becker, a friend of ours who is a book collector, had recently shown us a facsimile edition of this book and explained its origins. I started to read the preface, and therein found interesting fodder for our continuing attempt to bring order to information.

    The origin of the Library of Congress collection is Thomas Jefferson’s library, more than 6,000 books which he offered for sale to the United States in the aftermath of the burning of its predecessor in the War of 1812, and which Congress bought for $23,950. According to F. Forrester Church, in his foreword to The Jefferson Bible, Jefferson’s scheme of classification for his books was derived from Bacon’s 1605 essay now known as The Advancement of Learning.

    In light of the schismatic nature of current American politics, it is somewhat reassuring to note that these same schisms were prominent in Jefferson’s day as well. One objection to Jefferson’s offer was made by the illiberal Massachusetts congressman Cyrus King:

    It might be inferred, from the character of the man who collected it, and France, where the collection was made, that the library contained irreligious and immoral books… in languages that many cannot read, and most ought not.

    Those pesky Frenchmen were apparently as disregardful of American sensibilities then as now.  Nice to see that the verities of history endure. But I digress.

    Jefferson’s scheme of classification was built upon the processes of mind employed upon each subject:

    (1) Memory, which is applied to factual data, such as “History”;

    (2) Reason… which is applied to theoretical investigations, such as “Philosophy”; and

    (3) Imagination, which is applied to innocent pleasures, such as the “Fine Arts.”

    His departures from Bacon, if I read Church’s analysis correctly, had to do with Jefferson’s desire to secularize the catalog… to understand religious endeavor as subordinate to Reason, and indeed, The Jefferson Bible was his ambitious exercise in understanding the gospels themselves as a philosophical and moral system, rather than as either the word of God or a narrative of superstition.

    He did so by the method of cut and paste – literally razoring passages from four different renditions of the gospels, in Greek, Latin, French, and English, and pasting them in a blank book. Jefferson’s attempt was simply to identify the words of Jesus himself, which he judged “as distinguishable as diamonds in a dunghill”, undistorted by the misinterpretations of others (which is what Jefferson held a good portion of the gospels to be). No wonder they called it the Age of Reason.

    postscript:  There are many editions of The Jefferson Bible extant.  I had some difficulty finding a link that exactly matched the ISBN I had in hand.  The link in this post is as close as I could come (Beacon Press, 1989).  In this case, the content that I found particularly interesting was the introduction by F. Forrester Church, son of the late Frank Church, Senator from Idaho, and the afterword by Jaroslav Pelikan.  The historical context and analysis in these bookends to the actual text of Jefferson are quite interesting, and worth finding.  In the preface, Church indicates that from 1904 it was the custom to give a copy of The Jefferson Bible to each new Senator.