Mortimer Adler argued that civil discourse requires, first and foremost, a clear notion of what one's fellow dialecticians are actually saying. That is, one should begin every discussion with the question "Do I understand you to say...?" In this spirit, I trust that others, or perhaps even Norman himself, might point out misinterpretations I have made in my reading of his post Names and Addresses. My paraphrases correspond to the major headings in his post.
"They [identifiers] are just strings"
I agree with this point. If you ignore the semantics (or implied semantics) of a name, it is just a collection of characters with parsing rules that ensure a globally unique string. The http:// is, for ID purposes, largely irrelevant, and the Domain Name System (DNS) provides a wonderfully effective means of providing for...
Distributed Naming
The DNS has proven to be robust and stable, and an effective means of distributing local naming authority across a global system. That is, every domain owner has the authority and means to assign names within a particular domain (namespace), or even to subdivide that authority into smaller namespaces.
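The delegation idea can be sketched in a few lines of Python. This is a toy model only, not how DNS is implemented: the class NamingAuthority and its methods are hypothetical names I have made up for illustration, and example.org is a placeholder domain. It shows how each authority guarantees uniqueness only within its own namespace, and how global uniqueness follows from the chain of delegation.

```python
class NamingAuthority:
    """Toy model of one namespace owner (hypothetical, not a DNS implementation)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.assigned = set()  # labels already handed out in this namespace

    def qualified(self, label):
        # Build the full name from the most specific label outward,
        # in the same order DNS names are written.
        parts = [label]
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return ".".join(parts)

    def _claim(self, label):
        # Local uniqueness is the only check any single authority performs.
        if label in self.assigned:
            raise ValueError(f"{label!r} is already assigned under {self.name!r}")
        self.assigned.add(label)

    def assign(self, label):
        """Assign a name inside this namespace and return the full name."""
        self._claim(label)
        return self.qualified(label)

    def delegate(self, label):
        """Hand a sub-namespace to a new authority."""
        self._claim(label)
        return NamingAuthority(label, parent=self)


org = NamingAuthority("org")
example = org.delegate("example")      # placeholder domain
library = example.delegate("library")  # authority subdivided further
full_name = library.assign("www")      # "www.library.example.org"
```

No central party ever checks the whole name. Each owner polices one level, which is the property that lets the system scale.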
Norman is confident that DNS name management is likely to outlast any new organization created to manage a newly created URI namespace (referred to in his post as newscheme:). This is a reasonable bet if indeed a new organization were created solely for such a purpose. In fact, it is more likely that such functions will be managed within existing stable organizations (my own thoughts naturally run towards libraries and their sound reputation for curating information for the long haul). I suspect Norman was thinking about the DOI Foundation, host of the DOI namespace, which emerged in response to the interests of commercial publishers in an autonomous identifier assignment entity.
Globally unique (unambiguous) names are important
The DNS, again, assures this global uniqueness. But it is within the purview of the local name authority (domain owner) to reassign names and their corresponding referents as it sees fit. That is entirely appropriate in some cases, and not so in others. Consider, for example, the following two URIs:
http://www.w3.org/2001/tag/doc/URNsAndRegistries-50-2006-08-17
http://www.w3.org/2001/tag/doc/URNsAndRegistries-50
At the moment that I am writing this sentence, these two identifiers map to the same resource. At some point in the future, the second one (the persistent identifier associated with the latest version) will map to another, more polished version, while the former identifier will remain associated with the current version as mandated by the policy of the resource curator (the W3C in this case).
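A minimal sketch of such a policy, assuming nothing about how the W3C actually manages its documents: the class VersionedRegistry, the placeholder content strings, and the later date are all hypothetical. The dated identifier is bound once and never rebound, while the undated identifier is an alias that always follows the most recent version.

```python
class VersionedRegistry:
    """Toy curator policy: dated identifiers are immutable, the base is an alias."""

    def __init__(self):
        self._dated = {}   # dated identifier -> content, written once
        self._latest = {}  # base identifier -> current dated identifier

    def publish(self, base, date, content):
        dated_id = f"{base}-{date}"
        if dated_id in self._dated:
            # Refusing to rebind is the policy that keeps the evidence chain intact.
            raise ValueError(f"{dated_id} is already bound and may not change")
        self._dated[dated_id] = content
        self._latest[base] = dated_id
        return dated_id

    def resolve(self, identifier):
        # A base identifier is followed to whatever version is current.
        identifier = self._latest.get(identifier, identifier)
        return self._dated[identifier]


base = "http://www.w3.org/2001/tag/doc/URNsAndRegistries-50"
registry = VersionedRegistry()
dated = registry.publish(base, "2006-08-17", "current draft (placeholder text)")

# Today both identifiers map to the same resource.
same_today = registry.resolve(base) == registry.resolve(dated)

# A hypothetical later version moves the alias but not the dated identifier.
registry.publish(base, "2099-01-01", "polished version (placeholder text)")
same_later = registry.resolve(base) == registry.resolve(dated)
```

Nothing in the identifier strings themselves enforces this behavior. It holds only as long as the curator keeps applying the rule.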
For most scholarly assets, it is important that the relation of an identifier to its referent be invariant. Norman correctly points out that this is a "matter of diligence and trust," a social issue. The example of the W3C URIs points out that the trust is invested in a particular policy for maintaining a chain of evidence (in this case, a series of versions). It is not always simple to achieve this, and often will require close curatorial attention (the diligence part).
Persistence is important
The essence of this issue is also, as Norman states, social. There is no technical assurance of persistence for either the names or the resources they identify. The only guarantee of persistence is the commitment of the organizations with curatorial responsibility for them.
Many of the early URLs identifying the beginnings of the Web have long since broken. CERN, the birthplace of the Web, decided it was about doing and curating physics, not Web technology. In those breathless early years, as custody was transferred, the identifiers did not always survive. It is perhaps the case that some of the documents did not as well... I don't know.
It has been argued that the http component of URIs is a weak link, as protocols do not last forever. This argument is spurious, both because of the "It's just a string" argument, and because the world is so deeply dependent on this infrastructure at this point that anything that succeeds http will necessarily be backward-compatible. Once again, we agree.
Resolvable Identifiers -- HTTP is a winner
I prefer to use the terminology of resolvable rather than retrievable as Norman does, as I think it better captures the notion of mapping of identifiers and resources. Norman's point that http is a clear winner is true in one sense, but may mislead in another. Our first point of disagreement, I think.
I would not argue that there is a better alternative than the http protocol for retrieval purposes on Planet Web. I do argue, however, that there are circumstances in which resolution should be explicitly uncoupled from identity. I will explicate some of these circumstances in a subsequent post.
I believe that Norman and others will counter with the "It's only a string" argument, and from a technical perspective, this is exactly correct. There is no technical requirement that an http:URI must be resolvable. It can act as a globally-unique string that maps conceptually to a real or abstract asset, whether or not an http server ever acts on it.
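The technical point can be illustrated with a short Python sketch that never touches the network. Identity is decided by comparing strings, and resolution is a separate lookup that may or may not have an answer. The Resolver class, the newscheme:example-id identifier, and the mirror.example.org location are hypothetical; only urlsplit comes from the Python standard library.

```python
from urllib.parse import urlsplit


def same_identity(a, b):
    # Identity is plain string comparison; nothing is dereferenced.
    return a == b


def describe(uri):
    # The parsing rules split the string into parts but say nothing
    # about whether anything answers at that address.
    parts = urlsplit(uri)
    return {"scheme": parts.scheme, "authority": parts.netloc, "path": parts.path}


class Resolver:
    """Hypothetical lookup table kept separate from the identifiers themselves."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, location):
        self._locations[identifier] = location

    def resolve(self, identifier):
        # None means "a valid identifier, but nothing to retrieve".
        return self._locations.get(identifier)


http_id = "http://www.w3.org/2001/tag/doc/URNsAndRegistries-50"
pure_id = "newscheme:example-id"  # hypothetical non-resolvable identifier

resolver = Resolver()
resolver.register(http_id, "https://mirror.example.org/copy")  # hypothetical location

http_parts = describe(http_id)
unresolved = resolver.resolve(pure_id)  # no location registered for this one
```

Both strings serve equally well as keys. The difference is in what a reader or a machine expects to happen when it sees the http prefix.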
My objection to this approach goes back to the outlandish success of the Web, and to the implicit social contract of resolvability. You can correctly assert that http is just a substring and carries no promise, but not if you live on Planet Semantics as well as Planet Web. Such URIs will be recognized by machines and resolved (or at least resolution will be attempted) and they will be recognized by people, who will expect something to be at the end of the link. To the extent that that something is other than what they expect, unwelcome surprise results. The overloading of resolution and identity is not only benign, but advantageous in most circumstances. But not all.
The question as I see it is, is there sufficient value in the use of some newscheme: pure identifier to justify the effort in establishing a separate identifier protocol, maintaining an appropriate registry, and supporting its use? Which brings us to Norman's last objection:
Paying for names
Norman asks "Why pay for something new when I've already got what I want?" A good question, and for the great majority of assigners of identifiers, there is no need. Names, even DNS-based names, are not free. We pay for domains, annually renewable. We pay (largely hidden) costs of assigning and maintaining the integrity of the identifier mappings under our authority. As far as I am aware, no naming system requires payment for the end-use of its names (resolving them), though at least one (DOIs) requires a fee for issuance or assignment to resources.
Why would you pay for such identifiers if their equivalent is available in the technology (already paid for) at hand? In the case of DOIs, it is presumably because their constituency (largely commercial publishers) finds their use productive in the context of their business model.
In summary:
- I agree almost entirely with the substance of Norman's arguments about the suitability of HTTP:URIs as a substrate for persistent, globally-unique identifiers to support resolution.
- We disagree, I think, on the promise of resolution implied by the http protocol token, and whether or not this has practical importance.
- Many adherents to the "just use HTTP" argument reject the argument that it can be useful to uncouple identity and resolution, and with it, the assertion that http identifiers may sometimes be less desirable than 'pure' identifier alternatives.
- Departing from the world's most widely deployed identifier system (http:URIs) involves both costs and vulnerabilities. Any effective alternative must offset these disadvantages through significant added value.
I will elaborate on some of these points of difference in a subsequent post.
POSTSCRIPT:
As I was proofing this post after I published it, I found that one of the two URIs supposedly identifying the W3C TAG finding does not actually work. The "latest version" link works, and that is the most important one, but the time-stamped version does not. This is an excellent example of the difficulty that every organization has in actually meeting its responsibilities of "diligence and trust". In my experience, the W3C takes these responsibilities seriously.
Is such an issue a problem? After all, the latest version link is the important one, no? It is, but if you're interested in evidence chains in scholarship, then all the links are important, and as this example illustrates, they can be fragile. In a subsequent post, I'll return to this issue and illustrate why both of these links are important to me personally.
-----
Image: Ship's prow, Tacoma Harbor, taken from Highlander, August 27