    December 11, 2007

    Roll over, George

    Jonathan Rochkind made some thoughtfully peevish comments on my previous post on RDA and the Futures report which drove me (perish the thought) back to the document itself.

    ...I'm confused by your apparent sympathy with the Working Group
    recommendations to suspend RDA work...

    ...That recommendation seems to instead be based on the fantasy that we need to spend lots of time 'testing' FRBR, at the end maybe deciding that FRBR is no good at all

    I don't see this in the recommendations at all.  What I read (in recommendation 4.2.1, p 29) is a clear mandate to resolve the existing ambiguities in the FRBR model in order to:

    provide a more robust framework for the creation of  the resource description and access rules that will be used in the future to support a broad range of searching options (also on page 29). 

    This is essential, and should be undertaken in the light of functional pragmatism, not ideology.  And certainly I agree with Jonathan that there is little time to waste.  The Futures report does not impugn the value of FRBR, but simply recognizes that we as a community do not agree about the importance of Expressions.  If it is critical in other ways, I missed it.

    There is much stronger concern expressed in the report about the uncertainties of RDA, having to do with unsubstantiated benefits, alignment with existing standards, and the business case for it (see the bottom of page 24).

    The subsequent recommendation (on the next page: 3.2.1) is stated more strongly than I might have chosen.  But the heading (Suspend Work on RDA) is elaborated with untils, and makes clear that useful work has been initiated with JSC and DCMI, and should continue.

    But any assertion that debates going on on the RDA list represent progress towards these goals is, in my view, whistling past the graveyard.

    And as for Jonathan's generous remark:

    In fact, I feel like you've expressed well the argument that I'd want
    to submit as comments to the Working Group

    I know that at least one of them reads my blog ;-)

    -----
    yes... THAT George Boole... taken in Cork, at the end of the DCC meeting on persistent identifiers in 2004

    In fact, specifically, THIS George Boole: http://worldcat.org/identities/lccn-n83-144364 (thanks, Thom)

    Thrashing in the Fields

    Morningview8514

    There are clues that tell us that a 'dialog' is out of control on a listserv.  Mine are (1) nested inclusion brackets and (2) "X wrote...y wrote" on successive lines. Recent discussions on the RDA listserv have tumbled deeply into that territory.

    My contribution to the confusion includes the following assertions:

    There is exactly one candidate for a content model that captures the relations among salient bibliographic entities that are needed to anchor library assets in the larger information sphere: FRBR.  It feels roughly right to most, though it would be unwise to underestimate the time we can ill afford to spend on thrashing around in the details.

    There are, unhappily, several candidates for syntactical models (variously called schemas, data models, and abstract models). These models are indifferent to what is encoded; rather, they define the permissible structures that can be encoded (think of sentence diagramming).

    To choose an idiom foreign to the Web for such encoding will assure the irrelevance of library data on the open Web. Recasting MARC in XML is, in my estimation, exactly such a choice.  It masquerades as Web-friendly, but the result is simply more-parseable confusion for any but cataloging geeks.

    The strongest alternative candidate is the Dublin Core Abstract Model, born of a decade of wrangling about data models in the web-metadata context.  Please do not confuse the data model with the element set.  I am not suggesting supplanting MARC cataloging with DC.

    I am asserting that embedding the library in the open Web demands:

    1. A coherent model of what we are describing and the relationships among those entities, and in which each entity is identified with a URI (FRBR, or something very like it).
    2. A carrier syntax that lives comfortably on the Web (the DC Abstract Model is my candidate)
    3. Rules for populating agreed structures (that at which RDA seems to be failing so earnestly).

    There is some urgency at agreeing on (1) and (2) before (3) can be achieved.  The recent Library of Congress Report on the Future of Bibliographic Control has committed the heresy (for some) of suggesting that RDA work be suspended and FRBR be subjected to more rigorous testing in order to increase the prospects of achieving our Web-destiny. I'm not sure I'd go that far, but I am convinced that our objectives will not be met through wrangling on mailing lists.  A coherent, well-funded community-grounded research and development program is in order.  All the innovative OPACs, Web-services, and Web-2.0 social networks will avail us not if we fail to achieve this coherence.
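
    The first two requirements listed above (a model whose entities each carry a URI, and a carrier syntax built from simple statements) can be sketched roughly as follows. This is only an illustration under stated assumptions: the example.org URIs, the Entity class, and the predicate names "realizes" and "embodies" are hypothetical stand-ins, not the actual FRBR or DC Abstract Model vocabularies.

```python
from dataclasses import dataclass

BASE = "http://example.org/"  # hypothetical namespace, not a real service


@dataclass(frozen=True)
class Entity:
    """One bibliographic entity, identified by a URI rather than a record number."""
    uri: str
    kind: str   # FRBR level: "Work", "Expression" or "Manifestation"
    label: str


def describe(entities, relations):
    """Return (subject, predicate, object) statements for entities and their links."""
    statements = []
    for e in entities:
        statements.append((e.uri, "type", e.kind))
        statements.append((e.uri, "label", e.label))
    for subject, predicate, obj in relations:
        statements.append((subject.uri, predicate, obj.uri))
    return statements


work = Entity(BASE + "work/1", "Work", "Moby-Dick")
expression = Entity(BASE + "expression/1", "Expression", "English text")
manifestation = Entity(BASE + "manifestation/1", "Manifestation", "A printed edition")

statements = describe(
    [work, expression, manifestation],
    [
        (expression, "realizes", work),           # predicate names are illustrative only
        (manifestation, "embodies", expression),
    ],
)

for statement in statements:
    print(statement)
```

    The point of the sketch is the separation: the entities and their relationships are the content model, the flat list of statements is the carrier, and the rules for what goes into each label are a third, separate matter.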
    -----
    DC mavens will recognize the 'sentence diagramming' metaphor as originating with Tom Baker
    -----
    An early morning view from my rooms with a view in Seattle

    May 25, 2007

    David Weinberger and Paris Hilton

    Library Link of the Day pointed to David Weinberger's recent Tech Talk at Google, in which he flogs his latest book, Everything is Miscellaneous.  The talk is an entertaining rendition of why digitization averts the problem of trying to have a single-best-organization for the world.  The book, which I've just started, reaches deeply and none-too-gently into our notions of how the information domain is or should be structured. David dedicated the book 'to the librarians', though librarians may be forgiven for squirming a bit, especially during his viciously funny send-up of Melvil Dui...er... Dewey.

    The essence of his argument is that the transformational technology of Web 2.0 is creating an infrastructure of meaning that, in aggregate, is more about us and what we are interested in, than any authority-mediated source like a traditional newspaper or encyclopedia can possibly be.

    This isn't to say that David rejects all authority, but rather that hammering its artifacts into pre-categorized structures is inimical to its efficient utilization, and preemptive about what constitutes authority to begin with.  The distinctions between metadata and data disappear in this view -- any fragment in or about a work can be useful in retrieving that work, and the fact that they are all digital allows us to start anywhere and end anywhere.  Herman Melville can lead us to Call me Ishmael, and Call me Ishmael  can lead us to Herman Melville. The other Melvil need play no role.

    The buzz du jour around the offices here today was Facebook's claim as a social utility -- a major primary platform for social networking (thanks to Lorcan for bringing this to my attention).  Part of the announcement included a piece about book reviews:

    For example, Facebook and Amazon.com developed a “Book Reviews” applications that lets Facebookers write and show book reviews on their profile pages, and add Amazon ‘Buy’ buttons.

    David and Facebook and Amazon all agree that book reviews are pretty important metadata.  Conversations about the uses and users of bibliographic information in our own community  evidence a rising awareness that our own metadata, designed for management and repurposed for discovery, is sorely in need of re-engineering.  We could do a lot worse than follow Weinberger's (and Facebook's and Amazon's) lead... read a book, write about it, and link to the library supply chain.  Perhaps his book dedication is more in the nature of an invocation.

    -----

    Oh... yeah... Paris Hilton.... Stars and planets, and self-rounding... watch the video.

    -----

    Image: The Southern Theater Proscenium in Columbus, Ohio, before the Eileen Ivers concert with the Promusica Chamber Orchestra (May, 2007)

    October 05, 2006

    It's the Model, Stupid

    A short history of data modeling in the Dublin Core Metadata Initiative, and what it means for the future of cataloging

    The cardinal rule at the first Dublin Core meeting way-back-when was Thou Shalt Not Conflate Syntax and Semantics. A rule honored more in the breach than in the observance perhaps, but it emphasizes the importance of separating these two fundamental facets of communicating structured information. We were right about this… almost. The missing part of the picture was a sound underlying data model.

    We knew what we were trying to say, we thought we knew how to say it. How hard is it to describe the basic characteristics of a resource, after all? Title of resource is…. Creator of resource is…. We even spoke about the initiative in terms of grammars and evolving pidgin languages (simple, emergent grammars). We hobbled along with only a vague common understanding – a model implicit in the aggregate projects using Dublin Core, propagated through imitation, nowhere formally specified.

    It isn't as though we didn't try.  Early attempts to arrive at a formal data model were fraught with contention and even acrimony. Difficult meetings (Washington… Dublin… Crete), leading to small beachheads that, lacking broad consensus, soon washed away. Maybe it wasn’t so important? After all, people still used DC, we continued to attract adherents. DC was spreading… 25 languages, 50 countries. And AACR2-MARC, arguably the world’s most successful resource description standard, didn’t have a data model either… how bad a problem can it be?

    But the chickens come home to roost. The lack of a formal model led to a plethora of non-interoperable systems, crippling one of the foundation principles of the Initiative. It took 10 years for DCMI to finally evolve and adopt a formal model, and one might wonder whether simple exhaustion was a factor in its ultimate acceptance. It will be a long time before this model channels practice sufficiently to bring the many flavors of DC closer to the goal of sharing metadata across systems, let alone across various other metadata frameworks.

    The Dublin Core Abstract Model, led by Andy Powell and Pete Johnston, and later attracting the efforts of Mikael Nilsson in the cause of bridging the DCMI framework with that of IEEE LOM, is a hybrid distillation of ideas gleaned from library practice and the Semantic Web’s cornerstone technology, RDF. It reflects insights of emerging Web practice (the use of URIs as persistent identifiers, for example), and embraces lessons learned from a decade of early metadata adopters.

    Yesterday afternoon at DC-2006, Diane Hillmann presented a summary of progress (and… surprise… contention) associated with the RDA effort, what most people understand as the international revision of AACR2. From my own uninformed perspective, it appears that this effort suffers from much the same problem that we have had in the Dublin Core – a data model implicit in years of practice and rule-revision on top of rule-revision, resulting in a focus on the minutiae of rules rather than being guided by formal principles of description.

    The Web has forced us all out of isolated communities of practice and into the Internet Commons. Certainly the practice and topography of librarianship is changing out from under us. As we struggle under the stress of these changes, it is perhaps predictable that legacy systems such as cataloging practice will change even more slowly. The RDA effort recognizes the importance of updating our profession to fit more comfortably into the Internet Commons. If we are to achieve anything like the interoperability we hope for, we will need common structural models. If the effort devolves to simply unraveling existing rules and rewinding the yarns, we will fall short of the integration we need to support our future. The successes and failures of the DC community in its own modeling struggles can be useful… and reusable.  I gather that the Joint Steering Committee has sought consultation with representatives of the IEEE LOM metadata community as well as with DCMI. It would be fitting if the DCMI could return some value to the community that has provided so much of the insight that has motivated its own progress.

    Mikael Nilsson’s exhortation is on the mark. Less talk about metadata sets, and more talk about models. It is both difficult and important to get this piece right.

    ------

    Image: DC-2006 welcome reception

    September 07, 2006

    Neutrality is over-rated

    Traditional notions of surrogacy in the library world revolve around catalog records – a neutral distillation of attributes intended to support discovery and management. In the age of the Amazoogles, richness of linking and community-generated surrogates play a welcome role in discovery and evaluation. Several interesting issues emerge from this shift.

    Surrogates as first class objects

    I’ve alluded to the importance of this in a previous post.  It is part of the perspective shift that is, I believe, fundamental to the transition from Library 1.0 to Library 2.0 thinking.

    Neutral surrogates as opposed to evaluative surrogates

    Librarians have traditionally positioned themselves in a neutral role… above the ideological fray of content. I think this is as much artifact as intent. A bib record should be a neutral inventory of attributes, and to the extent that the catalog was central to our service, that neutrality served well. Of course, Libraries have long offered reader advisories. Nancy Pearl, Seattle's (the country’s?) best-known librarian (anyone seen a James Billington shushing Action Figure lately?) has acquired a national reputation as a voice of reading.

    We have entered the era of recommender services to assist in our every consumer selection. The best example of this in the Library space is Amazon – the reviews are widely read and eagerly written, and the marketing data (people who bought this, also bought that…) is valuable indeed.  LibraryThing, through the application of now well-understood social collaboration techniques, has introduced similar functionality that is independent of book purchase.

    Which is more valuable as a finding aid? A catalog record or a review? One focuses on discovery, and the other on suitability. But within the everything-indexed context of the Web, both are important, and the distinction blurs.  There is a case to be made for library-mediated evaluative surrogates coexisting cheek-by-jowl with traditional cataloging records.

    What about an Amazon-library system mashup?  Just what Amit Gupta has offered (brought to my attention in Lorcan Dempsey's Blog).  All very interesting co-evolution, but what I really wanted to do was show off the Nancy Pearl Action Figure (deluxe Version).  Small Parts. Not suitable for children under 3 or those without a library card.  Who needs Jane Austen, anyway?

    June 11, 2006

    They read the books???

    Morning Edition Sunday on NPR reviewed Keith Donohue's debut novel, The Stolen Child, this morning.  The book sounds engaging, a modern update of the enduring changeling theme.  What caught my interest, though, was the followup story, describing how the book reached the top 5 of the Amazon sales list, and is 26th on the NYT's extended best-seller list, without a single major published review.

    Amazon staff recognized the book as a strong contender (they read the books?), and decided to try a new approach to promoting it, leveraging their well-established customer-participation infrastructure.  Their new approach is described at the top of the reviews:

    We queried our top 100 reviewers as of April 6, and asked them to read The Stolen Child and share their thoughts.

    According to the NPR report, book reviews are getting less space in newspapers at the same time that the influence of amateur book reviewers is rising.  It is not lost on publishers, who apparently see the importance of high-impact reviews and promote their books' review-potential to sellers.

    Is it at all unsettling that Amazon took the initiative and recruited the attention of their top reviewers?  This isn't exactly neutral marketing.  But Amazon is upfront about what they are doing, the author benefits, Amazon benefits, the reviewers are no doubt flattered at the least.  And readers are connected with more of what they want.  Sounds good to me.

    Public bibliography is co-evolving with marketing, and while there are clearly opportunities for gaming the system and distorting the marketplace, it is just as clear that to ignore this important component of resource description is to lose traction where it counts: with readers.

    Related posts:

    First Class Objects and the Currency of Linking
    The New Cooperative Bibliography

    ----
    Image: Midday wildlife sighting in the UW University district, May 2006


    May 15, 2006

    Tag Team Wrestling

    Tim "LibraryThing" Spalding has just announced a new feature on LibraryThing, currently the most impressive instance of so-called Library 2.0 that I know of.  The new feature addresses a topic that many (including myself) have been wrestling with for a while now: the relationship of formal knowledge organization systems to folksonomies.  Tim's Mother's Day blog post speaks to the raging controversy:

    Are tags better than subjects? Are subjects better than tags? Are tags just a fad? Will tags replace subjects? Are tags evil? Are subjects evil? (Believe me, the idea is out there.) Librarians have become deeply emeshed in the debate, with partisans on both sides. Until now, there hasn't been much in the way of hard data, at least for books. LibraryThing provides that.

    This last is the exciting part.  LT has provided a platform to explore the behaviors of both in a coherent, data-rich system.  Researchers, start your systems!

    -----
    image: Carolyn Dunford, of the UW MLIS program, alerted me to a lovely wildlife preserve nearly within slingshot distance of the UW (known as the Montlake Fill).  This is a closeup of one or another species of Horsetail (Thanks to Bob O'Hara for the ID), taken at the preserve.

    May 04, 2006

    Not the end of an era

    Those of us with gray hair are fond of reminiscing about the cost of our first computers or how much memory we thought was impossibly more than we could ever use. I recall during my first months at OCLC that the Office of Research acquired its first 1 gigabyte disk pack… an expensive device about the size of a small refrigerator. Lots of cameras have more now, and it would be a rare automobile that does not eclipse the computing resources of the space shuttle. Now-quaint marvels such as these afford benchmarks that measure our progress along the digital byways.

    It is harder to identify with data standards such as the MARC record in the same way, especially in an age of global indexing and microformats (there may be a few of us who can remember their first 245 field, but these don’t have the same oomph in the retelling as, say, a 5 megabyte hard drive or IBM software on a cassette tape).

    The recent passing of Henriette Avram is an occasion for reflection on the importance of structured data to our community. Henriette, as architect of one of the world’s most important data standards, led a transformation of the profession of librarianship that will outlast most of us.  A large part of  every dollar I've earned in two decades comes from the industry she helped to spawn.

    Jim Gray, a Turing award winner and noted researcher for Microsoft, recently told me (on the day before Henriette’s death, as it turns out) that his slides on the history of libraries in the digital age number four: they start with Alexander and the Alexandria library, and the third is of Henriette and the MARC record.

    Thank you, Henriette.
    -----
    Image: Park Avenue in New York City on a beautiful day in April, 2006

    Post Script:  Walt Crawford, my-soon-to-be-more-closely-related colleague, caught me out in a goof, which I fixed, and if you didn't find it, tough.  Check the Internet Archive.  I'm not a real librarian... I admit it. But I'm married to one!  She's not a cataloger either. (thanks, Walt!)

    April 13, 2006

    Cheep Links

    Geoff Froh, one of the MSIM graduate students who has befriended me during my stay here at the UW iSchool, sent me the following link from last September that has a lot of great ideas about creating a richer web of information -- cheaply:

    Using Wikipedia and the Yahoo API to give structure to flat lists

    Hackdiary is Matt Biddulph’s idea-rich blog of his travels through Web 2.0.

     

    The post that motivated this entry is a succinct description of some of Matt’s work at the BBC, but it transfers pretty much wholesale to what we should be doing more of with library data:

    "adding value to your own data by using external information"

    Biddulph’s post emphasizes some of the benefits of the open Web that are available to anyone with a creative vision of how to capitalize on them to create more value at low cost.
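
    The general idea of giving structure to a flat list by consulting an outside source can be sketched roughly as below. This is not Biddulph's code and makes no network calls: the EXTERNAL_SOURCE dictionary is a hypothetical stand-in for a service such as an encyclopedia or search API, and its entries and example.org links are invented for illustration.

```python
# Stand-in for an external source such as an encyclopedia or search API.
# The entries and example.org links are invented for illustration.
EXTERNAL_SOURCE = {
    "herman melville": {"kind": "person", "link": "http://example.org/wiki/Herman_Melville"},
    "moby-dick": {"kind": "book", "link": "http://example.org/wiki/Moby-Dick"},
}


def enrich(flat_list, source):
    """Attach a type and a link to each name the external source recognizes."""
    enriched = []
    for name in flat_list:
        match = source.get(name.strip().casefold())
        enriched.append({
            "name": name,
            "kind": match["kind"] if match else None,
            "link": match["link"] if match else None,
        })
    return enriched


flat_list = ["Herman Melville", "Moby-Dick", "An unrecognized title"]
enriched = enrich(flat_list, EXTERNAL_SOURCE)
for item in enriched:
    print(item)
```

    Names the outside source does not recognize are kept with empty fields rather than dropped, so the local list stays the authority and the external data is strictly additive.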

    One of my early blog entries was about the addition of name authority data into the German Wikipedia, and I seem to be bouncing back again and again to the concept of public bibliography – creating rich semantic linkages around traditionally-described formal resources that gives them context and heightens their visibility in Web space.   Using the open Web to enrich our assets while in turn contributing to the information assets of the Commons embodies the reciprocal creation of value that fits neatly within our tradition and is, I believe, critical to our future.

    -----

    image: Gulls ride for free on the Washington State Ferries.   And they aren't bashful.  March, 2006, near Orcas Island in the San Juans

    February 08, 2006

    The New Cooperative Cataloging

    Readers of an earlier post on my blog will have seen that I dived into the Open WorldCat Not-Quite-A-Wiki Book Review Cache for the first time as a reviewer this week. I also posted a note on why it is critical for libraries to participate more fully in the Linking Economy [First Class Objects and the Currency of Linking]. Not just important, but critical.

    So, I did one myself… gotta eat your own dog food, eh? OK, well, you have to establish an account (user name, password, that stuff).  Another password to manage… argh. But still… it’s important, so I did it.

    Then cometh the dreaded TERMS & CONDITIONS, in which I learn, among other things, that by contributing content, I agree to defend OCLC in any legal disputes arising from my content.  My immediate  vision is of Lil’ Ole Me… armed with naught but fountain pen against the onslaught of Smaug and his law clerks, defending the Mothership from harm (sorry about the mixed allusions... too many allegories lately).

    Well, that made me swallow hard. Visions of legal fees withheld from my salary for the rest of my unnatural life. How in heaven’s name are we going to get people to contribute time and effort in the service of the Bibliographic Cooperative if we indenture them legally in the process? Well, my immediate response was to whine about it (those who know me will not find this startling).  One reply to my whine (by a colleague with legal training who does not work for OCLC) went something like: ‘well, of course OCLC has to do this. Look at the international cartoon crisis!’

    Hmmmm... that brought me up short.

    OK… the risks are not zero. But risk aversion threatens to obscure the values we stand for. Some of those risks, of course, are real; others have the character of being probabilistically small but of horrendous consequences. Mostly it’s impossible to tell ahead of time.  Therein lies the rub.

    The biggest risk of all, for our profession, is not to be at the table, not to stand up for what we believe, not to bring our professional values to bear in the most effective manner possible. In the present case, that means enriching the public link structure of bibliography.

    My exhortations:
    To OCLC:

    • Make WorldCat Wiki reviews link-addressable and harvestable
    • Perhaps use Creative Commons licenses to establish the content as solidly part of the intellectual commons that is our foundation (we’ve done this in the past with WebJunction, as I learned yesterday in my visit to the WebJunction offices in Seattle).

    To Librarians:

    • Let nothing you read go un-reviewed.  Don’t feel you are competent to review your physics library’s latest treatise on string theory? Take responsibility for recruiting your patrons to do it.
    • We have the scarcest asset on the Web – Public Trust. Advise your users. Make the creation of linkable advisory content a personal priority. 100,000 librarians x 1 review per week x 50 weeks = a pretty good start.
    • Take the risk.

    To Libraries:

    • Recognize and promote this effort. Provide incentives to staff. This is the New Cooperative Cataloging for the Web.

    A note about the image in this post.  It is from a larger work by Falah Shwan, an Iraqi refugee living in Columbus.  I'm not really sure about my reproduction rights -- Marguerite and I purchased the work from which it is excerpted, and it has occurred to me that it is a perfect logo for the Radical Militant Librarian T-shirts that we all should have ;-).  If there is interest, we will try to track down the artist and see about licensing it.