In my previous post on linked data, I made the argument that TimBL's four principles for linked data really boil down to two:
- Identify stuff with HTTP URIs
- Aggregate identified-stuff in standard [web-based] ways
There is an additional implicit requirement if linked data is to move the Web to the next level of usefulness: openness. The tantalizing payoff of linked data is its use in open systems. Indeed, the term linked open data is often used to describe the idiom. So, I revise my version of the linked data principles yet again:
- Identify stuff with HTTP URIs
- Aggregate identified-stuff in standard [web-based] ways
- Make the data open and available without impediments
Web 2.0 gave us a taste for the possibilities with mashups and the proliferation of social applications. The openness that allowed mashups to capture the imagination of developers and users was largely accidental... a side effect of sites leaving the doors ajar, rather than planning for open doors. Ironically, the wild success of many social applications has substantially reduced the opportunities for mashups. The sluice gates are closing to prevent all that money from flowing downhill.
And the moniker mashup... well, it even sounds painful. Like slamming the car door on your fingers. Someone has to know the structures of the mashed-up-data, and... well... mash them together into a new structure. Bespoke interoperability. One-offs. Short-lived. Expensive.
Linked data involves designing the data beforehand for hands-free interoperability. Organic mash-up-able-ness comes from scrupulously observing principle number 2. This all sounds better and better. Do your interoperability design when you start, adopt web standards, and throw that data out there for the world to use, and the Web becomes one big mashup?
Not so fast. I'm afraid we need another principle. Data curation. Data worth linking to is cared for, managed, corrected, updated. In the words of the old song (and yes, I expose my age) "Does your chewing gum lose its flavor on the bedpost overnight?" Bit-rot prevails, as sure as death and taxes.
To recapitulate, my own list of linked data principles went from four, to three, to two, and back up to four, though slightly different than TimBL's:
- Identify stuff with HTTP URIs
- Aggregate identified-stuff in standard [web-based] ways
- Make the data explicitly open
- Eschew uncurated data
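
To make the first two principles concrete, here is a minimal sketch in Python, assuming data is modeled as (subject, predicate, object) triples whose names are HTTP URIs. The example.org identifiers, the two sample sources, and the helper functions are all hypothetical; the Dublin Core and FOAF predicate URIs are real vocabulary terms. Because both sources use the same URI for the same thing, combining them is a plain set union, with no bespoke mapping step.

```python
# Hypothetical identifiers under example.org; for illustration only.
BOOK = "http://example.org/id/book/42"
AUTHOR = "http://example.org/id/person/7"

# Predicates from two widely used vocabularies.
DCT_TITLE = "http://purl.org/dc/terms/title"
DCT_CREATOR = "http://purl.org/dc/terms/creator"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Source A describes the book; source B describes the person.
source_a = {
    (BOOK, DCT_TITLE, "An Example Title"),
    (BOOK, DCT_CREATOR, AUTHOR),
}
source_b = {
    (AUTHOR, FOAF_NAME, "A. N. Author"),
}


def merge(*graphs):
    """Combine triple sets; shared URIs line up without any mapping code."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def objects(graph, subject, predicate):
    """Return all objects recorded for a given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def creator_names(graph, book):
    """Follow creator links from a book, then look up each creator's name."""
    names = set()
    for creator in objects(graph, book, DCT_CREATOR):
        names |= objects(graph, creator, FOAF_NAME)
    return names


merged = merge(source_a, source_b)
print(creator_names(merged, BOOK))
```

Neither source alone can answer "who wrote this book, by name?"; the merged set can, and the only design work was agreeing on identifiers up front.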
If you've rummaged around in the linked data world you've come across the Linked Open Data Cloud. A summary of currently available linked data sources has just been updated this week: State of the LOD Cloud.
There is interesting information on the state of linked data in this document, including indications that a growing proportion of data sets are curated. Perhaps the best indicator of this is whether the data are published by the data producers themselves, or by third parties who found a juicy data set that they thought could be of research interest, and massaged it into usefulness. Research is good. Proof of concept is good. Neither is likely to be persistent or reliable. According to this report, only about a third of the LOD sets are currently producer-published.
There is a good deal of other data in this report worth unpacking, including metrics of conformance to a set of best practices that are emerging in the community. Perhaps the most important of these is the availability of explicit licensing information. Without this, systems builders are writing code on shifting sands. Open today, closed tomorrow. Why would anyone take such a risk? The report indicates that fewer than 10% of the data sets in the LOD cloud contain such licensing information. A bad sign. Let's hope it improves. It will be useful to see how these metrics change over time, but they are only indirect measures of the changing health of the LOD cloud. The proof, of course, is in the applications.
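
The licensing metric is easy to picture in code. The following sketch assumes each data set is described by a small metadata record and that a license, when present, is given as a URI under a hypothetical `license` key. The records are invented for illustration and are not taken from the report.

```python
def has_explicit_license(record):
    """True if the record carries a license given as an HTTP(S) URI."""
    value = record.get("license")
    return isinstance(value, str) and value.strip().startswith(
        ("http://", "https://")
    )


def license_coverage(records):
    """Fraction of records with explicit license information (0.0 if empty)."""
    if not records:
        return 0.0
    return sum(has_explicit_license(r) for r in records) / len(records)


# Invented sample records; only the first states a license.
sample = [
    {"name": "set-a", "license": "https://creativecommons.org/publicdomain/zero/1.0/"},
    {"name": "set-b"},
    {"name": "set-c", "license": ""},
    {"name": "set-d", "license": None},
]

print(license_coverage(sample))
```

A consumer could run a check like this before building on a data set, and treat a missing license as a reason to hold off rather than as permission.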
The library community is actively exploring the benefits of linked data applications as the means of providing more and better services on the Web. There is a W3C incubator group for Library Linked Data that is preparing a report on use-cases and potentially useful directions. This report will provide a milestone for the community and will, one hopes, lead to concrete steps toward a stronger presence for libraries in the linked data realm, and hence, on the Web in general. The challenges are not limited to the principles, best practices, and technical issues of linked data, however. If libraries are to succeed in this space, there are systemic community issues that must be reconciled as well. I will share my perspectives on these issues in a succeeding post.
Finally, my apologies if I have monkeyed around too much with established principles that people agree on. I do so in the interest of creating a teaching moment, and perhaps to challenge current thinking about the topic in a small way. I don't think I have a prayer of knighthood.
-----
It is my great good fortune to be in Japan on the cusp of cherry blossom season. This picture was taken in the Gion district of Kyoto -- a week or two early, but tantalizing with promise.