Erik Duval, of Katholieke Universiteit Leuven in Belgium, is a longstanding metadata colleague I met in the early days of Web metadata. We've worked in the service of related activities for some years, and our paths have intersected in productive ways on a variety of occasions over a dozen years (notably, here). So when he asked me to participate by teleconference in a Metadata 2.0 workshop in Leuven, I was pleased to accept, even though it fell on the day of my two presentations at the VALA2008 conference in Melbourne.
So, at 21:00 Melbourne time I called into the workshop (11:00 AM Leuven time) and visited my modeling ideology upon the hapless participants. Telepresence is hard to do effectively, especially for those who have to listen. I was afforded the dispensation of talking and going to bed soon after, so I was the lucky one! This post represents a reduction of the stock of slides I shared with the group -- hope it's beef and not turkey.
The dominant issue in promoting metadata interoperability, in my estimation, lies in harmonizing data models, not element names. We in the library community have been slow to understand this, primarily because we've gotten along without a formal data model for so long.
The Dublin Core group began as a heterogeneous amalgamation of information mavens -- a good thing if you believe in hybrid vigor (I do). A bad thing from the point of view of finding common vocabularies and modeling idioms. There were (are) lots of ways to do/say/express metadata assertions, and a large proportion of them were represented among us. Attempts at abstracting a data model foundered in contentious seas of misunderstandings and egos, and any urgency about arriving at a common model gave way to simply staying afloat in troubled waters. After all, lots of people were using DC, right? The library community gets along without a rigorous data model, and MARC remains one of the most successful resource description idioms.
But MARC had only a few distinct generators (software systems), and cross-cultural MARC dialects could be made to interoperate only with difficulty. We should have known better. Wishful thinking (and conflict avoidance) triumphed over clear reasoning, and the data modeling effort in DC came to fruition slowly, fitfully, painfully. It took a decade. That hard-won lesson, embodied in the Dublin Core Abstract Model (DCAM), remains, in my estimation, the golden nugget at the center of the Dublin Core ore.
I asserted to the Leuven group that metadata standards that don't share a common data model are doomed to perpetual lossy interoperability at best: costly bespoke mappings that never really satisfy. I've written in the past about the analogy of incompatible train gauges such as are still encountered, for instance, on the China-Mongolia border. An entire train is 'unloaded' from its Chinese bogies (wheel trucks) by being jacked up on hydraulic lifts, and Mongolian bogies are then rolled under the carriages amidst great clanking and hissing. The train is lowered and continues into the dark Gobi night. Is this the metadata model we want to perpetuate? Unpacking assertions in one model and repacking them into another? It is folly.
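To make the unpack-and-repack problem concrete, here is a minimal sketch in Python. Both "models" are invented for illustration and do not represent DCAM, MARC, or any real standard: a hypothetical rich model in which a creator is a described entity with an identifier, and a hypothetical flat model in which every value is a plain string. The names `Agent`, `RichRecord`, `to_flat`, and `from_flat` are all assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Agent:
    """Creator in a hypothetical 'rich' model: a described entity."""
    name: str
    identifier: Optional[str] = None  # e.g. an authority-file URI


@dataclass(frozen=True)
class RichRecord:
    """A record in the hypothetical rich model."""
    title: str
    creator: Agent


def to_flat(record: RichRecord) -> dict:
    """Map into a hypothetical 'flat' model where every value is a plain string."""
    return {"title": record.title, "creator": record.creator.name}


def from_flat(flat: dict) -> RichRecord:
    """Map back. The flat model has no slot for the identifier, so it cannot be restored."""
    return RichRecord(title=flat["title"], creator=Agent(name=flat["creator"]))


original = RichRecord(
    title="An Example Title",
    creator=Agent(name="Doe, Jane", identifier="http://example.org/agents/42"),
)
round_tripped = from_flat(to_flat(original))

print(round_tripped == original)         # False: the round trip is lossy
print(round_tripped.creator.identifier)  # None: the identifier did not survive
```

The element names line up perfectly in both directions ("title", "creator"), yet the round trip still drops information, because the two models disagree about what kind of thing a creator value is.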
But it is still hard to find agreement in these spaces. Lessons learned unravel. There's always a higher abstraction level that will save us, no? Well, no, actually. Machine parsing requires precision. You agree about structure or you don't. I suspect that semantic interoperability decays across mappings much as sound and light fade with distance, only faster: as the cube of the 'distance' between the models. Multiplied, of course, by the sum of the metadata instances represented in each model. (How's that for an unsupportable assertion of cost?)
OK, but whose model? Did I mention there are egos involved? And money? And pride? And organizational investments? And NIH syndrome? Any one of these alone is a serious impediment to adoption.
In a further conversation with Erik, we discussed the general suitability of the DCAM. Erik observed that the number of people, even in technical groups, who have a strong grasp of its intricacies is small. Unhappily, he is right. Is the DCAM needlessly complex, or is the complexity matched to a proportionately difficult problem? And,
"...should there be one model that we all build on or should we build something that overarches all existing models..."
Isn't that then a common data model? If the DCAM is considered too complex, how will this help?
Answering these questions is the crux move for progress in Metadata 2.0. If the complexity is appropriate, then spare us yet another data model. If it is needlessly complex, then it behooves all parties to simplify and abstract until we have distilled the essence. Metadata 2.0 isn't social, isn't the next level, isn't the latest and greatest... it's a do-over, a mulligan, an after-school detention. We just don't have it right yet. My assertion is that the DCAM is roughly right. If there be flaws, expose them with evidence. If there are better ways, demonstrate their value. Otherwise, adopt and deploy with vigor and rigor.
Get the trains rolling on the same tracks.
-----
Inside the train longhouse on the China-Mongolia border
(October 2004). Hydraulic jacks line the longhouse, and raise the
entire train, allowing one gauge of bogie to be rolled out and another
set to be rolled in. The process, which includes a cabin-by-cabin
visitation by customs officials, took about two hours in the middle of
a Gobi-desert night.