I don't recall the date, or even the year, but the moment I saw a URL emblazoned on the side of an eighteen-wheel truck was an epiphany. The Web was no longer a geeky idea belonging to a now-beknighted mathematician at CERN, but rather a tool of commerce and a vehicle of innovation. Nowadays, it is hard to find a truck without a URL on it, and indeed, the trucks themselves often have their own web identities (embedded in RFID chips that allow automated tracking of the trucks and their contents so as to streamline supply chains).
Linked data is the Next Big Thing among webulationists, but it's really the last big thing recast. TimBL's principles of linked data (shamelessly cut and pasted from Wikipedia) look an awful lot like... well... the Web as we have known it from the start:
- Use URIs to identify things.
- Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and user agents.
- Provide useful information about the thing when its URI is dereferenced, using standard formats such as RDF/XML.
- Include links to other, related URIs in the exposed data to improve discovery of other related information on the Web.
Ok, first off, 1 & 2 are redundant. So, there are really only three principles.
- Identify stuff (with globally unique identifiers in the Web namespace).
- Associate information with the identifiers in standardized ways.
- Link to other useful stuff.
Yikes... there are really only two principles. Number 4 just means that the 'associated information' also has web identifiers. As was true when the web started, various entities in a given [assertion|page|dataset] are expressed with web identifiers that allow a user or application to navigate a web of links:
- Identify stuff with HTTP URIs
- Aggregate identified-stuff in standard ways
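For the code-minded, here is a minimal sketch of those two principles in Python. Everything in it is made up for illustration: the `example.org` URIs, the vocabulary terms, and the `describe` and `linked_uris` helpers are all hypothetical, and nothing is fetched over the network. It only shows the shape of the idea: things named by HTTP URIs, and statements about them gathered in one standard form (subject, property, value).

```python
from collections import defaultdict

# Hypothetical identifiers; example.org URIs are placeholders, not real records.
BOOK = "http://example.org/book/1"
AUTHOR = "http://example.org/person/7"
LIBRARY = "http://example.org/library/42"

# Principle 2: aggregate identified stuff in one standard shape (here, triples).
triples = [
    (BOOK, "http://example.org/vocab/title", "An Example Title"),
    (BOOK, "http://example.org/vocab/creator", AUTHOR),
    (BOOK, "http://example.org/vocab/heldBy", LIBRARY),
    (AUTHOR, "http://example.org/vocab/name", "A. N. Author"),
]


def describe(uri, data):
    """Return everything asserted about one identifier, grouped by property."""
    description = defaultdict(list)
    for subject, predicate, obj in data:
        if subject == uri:
            description[predicate].append(obj)
    return dict(description)


def linked_uris(uri, data):
    """Return the values in a description that are themselves HTTP URIs."""
    return [
        obj
        for subject, _, obj in data
        if subject == uri and obj.startswith(("http://", "https://"))
    ]


if __name__ == "__main__":
    print(describe(BOOK, triples))
    print(linked_uris(BOOK, triples))
```

The values that come back from `linked_uris` are the "links to other useful stuff": each one is an identifier that can be described in turn, which is all that navigating a web of links amounts to.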
As Maimonides said... All else is commentary. At least, I think it was Maimonides. Anyway... when I started this post, it was not my intention to simplify Sir TimBL's elegant representation of global knowledge. Rather, I was pondering the ubiquity of our shared namespace, and its implications for libraries. And, parenthetically, why libraries, august guardians of the world's knowledge for centuries, are so poorly represented on the Web. As a community, libraries practically invented principle number 2. Henriette Avram (and her colleagues) wrote the book... literally... on aggregating bibliographic descriptions in standard ways.
That was then, and this is now, and libraries still, largely, use that same book (MARC). It's a bit dog-eared, and not well-suited to a world where trucks are known by their URLs. But that isn't why libraries have such a dismal presence on the web. It's the identifiers. Too many of them.
Search Engine Optimization (Google-juice, for the irreverent among us) is largely about concentrating the web identity of ideas, descriptions, or things under a single identifier. Two identifiers rather than one? Google-juice is diluted by half (assuming equal distribution of identifiers). Three identifiers? Four? Pretty soon, you have that warm bucket of spit that John Nance Garner popularized.
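The arithmetic is simple enough to sketch. This is a toy model under the same assumption as above (attention split equally among identifiers), not a description of how any search engine actually ranks anything; the function name `share_per_identifier` is just an illustrative label.

```python
from fractions import Fraction


def share_per_identifier(identifier_count):
    """Fraction of the total 'juice' each identifier gets under an equal split."""
    if identifier_count < 1:
        raise ValueError("need at least one identifier")
    return Fraction(1, identifier_count)


if __name__ == "__main__":
    for count in (1, 2, 3, 4):
        print(count, "identifier(s):", share_per_identifier(count), "each")
```

Each added identifier for the same thing shrinks every identifier's share, including the one you would have liked people to find.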
Can this be fixed? Stay tuned for the next episode (but you know the answer already).
-----
The National Diet Library, Japan's national library, has three facilities. This striking entry signage is for the International Library of Children's Literature, one of the three. The Children's Literature library is located adjacent to the Tokyo National Museum and Ueno Park. The main NDL facility is also in Tokyo, and the third, the Kansai-kan facility, is south of Kyoto, near Nara (more on this as I catch up a bit).
-----
Thank you to Matthew Beacom for catching an error in the first formulation of this post. I'm too embarrassed to admit what it was, having written this be-nighted (be-tween 2 and 4 in the be-ginning of the day), and I may have been be-dazzled by the tiniest touch of the Jack Daniels that be by my bedstand (for the medicinal [and apparently unsuccessful] treatment of insomnia).