A short history of data modeling in the Dublin Core Metadata Initiative, and what it means for the future of cataloging
The cardinal rule at the first Dublin Core meeting, way back when, was Thou Shalt Not Conflate Syntax and Semantics. It was a rule more honored in the breach than in the observance, perhaps, but it emphasizes the importance of separating these two fundamental facets of communicating structured information. We were right about this… almost. The missing part of the picture was a sound underlying data model.
We knew what we were trying to say, and we thought we knew how to say it. How hard is it to describe the basic characteristics of a resource, after all? Title of resource is… Creator of resource is… We even spoke about the initiative in terms of grammars and evolving pidgin languages (simple, emergent grammars). We hobbled along with only a vague common understanding: a model implicit in the aggregate of projects using Dublin Core, propagated through imitation and nowhere formally specified.
It isn’t as though we didn’t try. Early attempts to arrive at a formal data model were fraught with contention and even acrimony. Difficult meetings (Washington… Dublin… Crete) led to small beachheads that, lacking broad consensus, soon washed away. Maybe it wasn’t so important? After all, people still used DC, and we continued to attract adherents. DC was spreading… 25 languages, 50 countries. And AACR2-MARC, arguably the world’s most successful resource description standard, didn’t have a data model either… how bad a problem could it be?
But the chickens came home to roost. The lack of a formal model led to a plethora of non-interoperable systems, crippling one of the founding principles of the Initiative. It took ten years for DCMI to finally develop and adopt a formal model, and one might wonder whether simple exhaustion was a factor in its ultimate acceptance. It will be a long time before this model shapes practice enough to bring the many flavors of DC closer to the goal of sharing metadata across systems, let alone across other metadata frameworks.
The Dublin Core Abstract Model, developed under the leadership of Andy Powell and Pete Johnston, with Mikael Nilsson later joining the effort to bridge the DCMI framework with that of IEEE LOM, is a hybrid distillation of ideas gleaned from library practice and from RDF, the cornerstone technology of the Semantic Web. It reflects insights from emerging Web practice (the use of URIs as persistent identifiers, for example) and embraces lessons learned from a decade of early metadata adopters.
Yesterday afternoon at DC-2006, Diane Hillmann presented a summary of progress (and… surprise… contention) associated with the RDA (Resource Description and Access) effort, which most people understand as the international revision of AACR2. From my own uninformed perspective, this effort appears to suffer from much the same problem we have had in the Dublin Core: a data model implicit in years of practice and rule revision layered on rule revision, resulting in a focus on the minutiae of rules rather than on formal principles of description.
The Web has forced us all out of isolated communities of practice and into the Internet Commons. Certainly the practice and topography of librarianship are changing out from under us. As we struggle under the stress of these changes, it is perhaps predictable that legacy systems such as cataloging practice will change more slowly still. The RDA effort recognizes the importance of updating our profession to fit more comfortably into the Internet Commons. If we are to achieve anything like the interoperability we hope for, we will need common structural models. If the effort devolves into simply unraveling existing rules and rewinding the yarn, we will fall short of the integration we need to support our future. The successes and failures of the DC community in its own modeling struggles can be useful… and reusable. I gather that the Joint Steering Committee has consulted representatives of the IEEE LOM metadata community as well as of DCMI. It would be fitting if DCMI could return some value to the library community, which has provided so much of the insight behind DCMI’s own progress.
Mikael Nilsson’s exhortation is on the mark: less talk about metadata sets and more talk about models. It is both difficult and important to get this piece right.
------
Image: DC-2006 welcome reception