    February 19, 2008

    Uncoupling identification and resolution

    Conflating the identification and resolution of Web resources is often useful; indeed, it is usually the right thing to do. But I've written in the past about the need, on occasion, to uncouple these two fundamental functions.

    There is a long-standing and often vitriolic debate among Web technologists about whether Web architecture should include identifiers that simply identify and carry no implication of resolution. The Just-Use-HTTP camp insists that there is no place in the Web's naming architecture for identifiers not grounded in the HTTP protocol, even when resolution is not intended.

    Others of us have argued that persistent identifiers without a direct resolution mechanism are useful and desirable. DOIs are a purpose-built example, designed to support the management of commercial publishing assets. The info URI scheme is intended to meet the need in other niches. URNs were the earliest effort in this direction, though they have not been wildly successful.

    So it was interesting, and ironic, to see a post on a W3C Team blog about excessive traffic (100 million hits a day!) generated by applications dereferencing the static HTTP identifiers of DTDs (document type definitions) hosted by the W3C. It is important to declare, maintain, and serve such resources, but they are not intended for routine retrieval by applications or users. Instead, a DTD is meant to be parsed by applications that process data according to its declarations; more often, its identifier is simply used to confirm: 'yep... this is a document of a type known to me.'

    The use of HTTP identifiers (URLs) is an implied promise... a label that says 'there's something here for you to retrieve'. Yes, I know about HTTP headers. I understand that an application written to the latest protocols will understand the status codes and should take intelligent action based on them. But it is laughable to expect all Web applications to be so well behaved. The blog post speaks of applications "creating a Distributed Denial of Service (DDoS) attack against W3C" and of "abusive request patterns". In fact, the root cause of the very real dilemma faced by the W3C, and by others with this problem, is ideological opposition to an alternative solution: identifiers uncoupled from resolution.
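    To illustrate what "simply confirming" a document type can look like without any retrieval, here is a minimal Python sketch. It treats the system identifier in a DOCTYPE declaration purely as a name and compares it against identifiers the application already knows; no HTTP request is ever made. The lookup table, the regular expression, and the function name are hypothetical; production XML toolchains usually achieve the same effect with local catalogs (such as OASIS XML Catalogs) that map identifiers to local copies.

```python
import re
from typing import Optional

# Hypothetical table of document-type identifiers this application understands.
# The keys are used only as names: nothing in this sketch dereferences them.
KNOWN_DOCTYPES = {
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd": "XHTML 1.0 Strict",
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd": "XHTML 1.0 Transitional",
}

# Captures the system identifier from a declaration of the form
# <!DOCTYPE root PUBLIC "public-id" "system-id">
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+\S+\s+PUBLIC\s+"[^"]*"\s+"([^"]+)"', re.IGNORECASE
)


def identify_doctype(document: str) -> Optional[str]:
    """Return a label for a recognized document type, or None.

    Recognition is a plain string comparison against KNOWN_DOCTYPES;
    unknown identifiers are reported as unrecognized, not fetched.
    """
    match = DOCTYPE_RE.search(document)
    if match is None:
        return None
    return KNOWN_DOCTYPES.get(match.group(1))
```

    An application built this way never adds to the load on the server named in the identifier, because recognizing an identifier and resolving it are separate steps.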
    -----
    The Yarra riverfront in downtown Melbourne

    February 12, 2008

    A Glimir of the Future

    I spoke at VALA 2008, Australia’s biennial library conference, this past week. My participation was spurred initially by a request to deliver a paper on behalf of OCLC's Robin Murray, who could not attend. Soon after I agreed, the organizers invited me to stand in for a keynote speaker who had to cancel due to a family emergency. My first-ever double-substitution conference.

    The topic of the conference (Libraries: Changing Spaces, Virtual Places) gave me a perfect opportunity to combine two areas of interest – social networking and canonical identifiers – to present a case for how library systems might bring their assets into sharper Web focus.

    OCLC has been exploring an important facet of this problem (canonical manifestation identifiers), and the VALA conference afforded a timely opportunity to announce this exploration.  The tentative name for these identifiers: Global Library Manifestation Identifiers, or Glimirs for short.

    The community at large is increasingly aware of the importance of canonical identifiers for FRBR entities, especially Group 1 entities (Works, Expressions, Manifestations, and Items). Existing OCLC numbers approximate manifestation identifiers, but, ironically, this rough correspondence weakens as the database grows in scope and records in various languages are loaded. These records are not duplicates, but rather alternative institutional, regional, or language representations that point to a given resource.
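    To make the distinction between records and manifestations concrete, here is a minimal Python sketch of the one-to-many relationship described above: several records, each a different language or institutional representation, share a single manifestation identifier. It assumes each record already carries that identifier; the field names, the sample values, and the grouping function are hypothetical and say nothing about how Glimirs will actually be designed or assigned. The hard part, deciding which records describe the same manifestation, is deliberately left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Record:
    record_number: str        # hypothetical record number (stand-in for an OCLC number)
    cataloging_language: str  # language of the descriptive record
    manifestation_id: str     # hypothetical manifestation identifier


def group_by_manifestation(records: List[Record]) -> Dict[str, List[str]]:
    """Collect the record numbers that describe each manifestation."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        clusters[record.manifestation_id].append(record.record_number)
    return dict(clusters)


records = [
    Record("rec-001", "eng", "manif-A"),
    Record("rec-002", "ger", "manif-A"),
    Record("rec-003", "fre", "manif-A"),
    Record("rec-004", "eng", "manif-B"),
]

# Four records, but only two manifestations:
# {'manif-A': ['rec-001', 'rec-002', 'rec-003'], 'manif-B': ['rec-004']}
print(group_by_manifestation(records))
```

    Counting record numbers here would suggest four resources; counting manifestation identifiers gives the two that actually exist, which is the gap an explicit manifestation identifier is meant to close.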

    The need for explicit manifestation identifiers thus becomes more evident. We need identifiers that are globally scoped, business neutral, usable by all, and managed in either a centralized or federated manner. To the extent that such identifiers are canonical (that is, become the dominant identifier for a given asset), they increase the “URI equity” for library assets and will strengthen the library presence on the Web.

    Interesting and challenging issues arise in the design of such identifiers and their supporting infrastructure. Broad adoption will require a careful balance of use cases, business issues, and community participation in meeting the need. All of this in an environment already crowded with myriad special-purpose identifiers. OCLC is launching a pilot to explore these issues. An early proposal has been shared with a number of technical and policy leaders, and their valuable feedback will be used to strengthen the effort and move it to the next stage.

    Stay tuned.

    -----
    A surfer at Torquay beach on the coast south of Melbourne.

    July 03, 2007

    Characteristics of Names and Identifiers

    My colleague, Diane Vizine-Goetz, and I were ruminating about names and identifiers today, reflecting on the daunting challenge of deconstructing descriptive practice with the idea of reconstructing it with greater effectiveness. The library community is deeply immersed in such efforts, the hallmark of which is the Functional Requirements for Bibliographic Records, or FRBR, which has since spawned related efforts (the Functional Requirements for Authority Data and, more recently, the Functional Requirements for Subject Authority Records).

    Today's discussion concerned modeling names and identifiers, raising the question of whether they are different sorts of entities or variations on the same theme. Diane referred me to the FRAD report, which provides definitions and attributes for both names and identifiers.

    One useful way to determine the similarity of two abstractions is to examine the attributes defined for each, and doing so in this case is revealing:

    Name Attributes:
        type or category (personal, corporate, family, trade...)
        scope of usage
        dates of usage
        language
        character set
        transliteration scheme (for conversion across character sets)

    Identifier Attributes:
        type (domain of authority: isbn, issn, urn...)
        string (unique identifier within a domain)
        suffix (checksum)

    So, while it is easy to conceive of names and identifiers as closely related means of reference, their attributes are distinct, and so are their other functional characteristics.
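    The comparison can be sketched in Python by modeling the two attribute lists as record types. The class names are hypothetical, and the ISSN check-digit routine is only an illustrative example of an identifier "suffix (checksum)"; FRAD does not prescribe any particular algorithm. The sketch shows that the only attribute label the two share is "type", and even that means something different in each case. It also shows that an identifier's suffix can be verified mechanically from its string, which has no counterpart among the name attributes.

```python
from dataclasses import dataclass, fields


@dataclass
class Name:
    type: str                    # category: personal, corporate, family, trade...
    scope_of_usage: str
    dates_of_usage: str
    language: str
    character_set: str
    transliteration_scheme: str  # for conversion across character sets


@dataclass
class Identifier:
    type: str    # domain of authority: isbn, issn, urn...
    string: str  # unique within its domain
    suffix: str  # checksum


def issn_check_digit(first_seven: str) -> str:
    """ISSN check character: weights 8..2 on the first seven digits, modulo 11."""
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    remainder = (11 - total % 11) % 11
    return "X" if remainder == 10 else str(remainder)


def is_valid_issn(issn: str) -> bool:
    """Check that the final character of an ISSN matches its computed checksum."""
    chars = issn.replace("-", "").upper()
    if len(chars) != 8 or not chars[:7].isdigit():
        return False
    return issn_check_digit(chars[:7]) == chars[7]


name_attributes = {f.name for f in fields(Name)}
identifier_attributes = {f.name for f in fields(Identifier)}
shared = name_attributes & identifier_attributes  # only the label "type" overlaps
```

    For example, `is_valid_issn("0317-8471")` returns `True`, while changing the last digit makes it `False`; no comparable self-check exists for a name.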

    -----
    Image: children playing on the Mongolian steppe (2004)