I gave a keynote at the North Atlantic Health Sciences Library meeting (NAHSL 2009) this week, entitled Semantic Web Technologies: Changing Bibliographic Futures? It was an introductory tour of semantic web technology and its application to libraries. It is no small challenge to summarize an acronym-rich and rapidly changing field such as the semantic web, fit it into the context of a particular domain such as librarianship, and end up making sense. Others may better judge my success.
My single slide on OWL, the Web Ontology Language that has been approved as a W3C standard, was, I'm afraid, somewhat dismissive of the real-world benefits of ontologies. I made what I believe to be an entirely defensible assertion: ontologies decline in usefulness rapidly as the scope of their application increases. To put it differently, ontologies may be useful in a constrained domain where a community has a well-structured, precisely specified vocabulary. Their usefulness as a means to model (and interrogate) large, cross-domain information models is less obvious.
The keynote immediately following my own was given by Judith Blake, of the Jackson Laboratory in Bar Harbor, ME. She is a leader in the Gene Ontology project, which is
a major bioinformatics initiative with the aim of standardizing the representation of gene and gene product attributes across species and databases. The project provides a controlled vocabulary of terms for describing gene product characteristics and gene product annotation data from GO Consortium members, as well as tools to access and process this data.
Judith's talk was an impressive explication of a community-based effort to bring coherence to a large body of data and publications in the genomic information domain, and to make the results open and web-accessible. It is worth noting that this is accomplished only through ongoing maintenance of the ontologies and extensive human-mediated tagging of information assets 'out there' in the wild. It is a costly, ongoing endeavor, and it reinforces the notion that there is often no simple, automated path to embedding semantics on the Web.
The mantra of many in the linked data community is simply "free the data" -- get it out there, and people will figure out what to do with it. There is a great deal of truth in this, but also a long-term trap. Data is not always static and immutable. It often requires ongoing maintenance -- correction of errors, updating, enriching. The long-term utility of linked data will depend in part on the quality of the data. Even with the latest and greatest tools of the coolest technologies available, the data is only as good as the effort invested in maintaining it.
-----
The Maine coast at the Samoset Resort, site of NAHSL 2009