Peter Murray, aka the Disruptive Library Technology Jester, posted an encapsulated history of the origins of the Dublin Core, and observed that he still is
"trying to reconcile what differences exist between RDF and the DCAM based on these postings and comments from Stu’s blog."
I'm glad that people are engaged in trying to sort this out, even as I'm unhappy that it's still unclear at this late date. That it still IS unclear is incontrovertible (look at the caliber of people trying!). I'm not very confident at this point that I can wash away the confusion, but it does seem potentially useful to reprise a part of my metadata talk that I used to give a lot.
Sharing metadata requires agreements on three topics:
- Semantics: what is the meaning we are trying to convey in metadata assertions? Meaning, of course, resides in the minds of people, not machines. The focus of the Dublin Core effort has been to promote those shared meanings... and make them sharable. The semantics bit is about agreeing about elements: author, publisher, date, etc.
- Syntax: how do you take a set of metadata assertions and pack them so that one machine can send them to another, where they can be unpacked and parsed by machine logic or displayed and read by a person, with high probability that the meaning of the assertions travels unchanged from one mind to another? RDF documents refer to serialization... the order of bits in a stream... actually putting the stuff 'on the wire.' (The careful readers and jaded among you may wonder why I changed the order of exposition from the title of this post. Best for last? No... hardest.)
- Structure: You can't do syntax reliably unless you have unambiguous structure. The sorts of things you have to specify in a well-structured metadata assertion (not an exhaustive list):
  - The boundaries of a set of assertions (what constitutes a record)
  - Cardinality: can an element be repeated, and if so, is there a limit on the number?
  - How is a name structured? What is the delimiter separating elements of a compound name? (Prince and Bono excepted, most names are compound structures, many with surprising and confounding complexity.)
  - How is nesting managed?
  - How are dates encoded? YYYY-MM-DD? DD-MM-YYYY? MM-DD-YYYY?
  - How does one identify the encoding scheme that answers the above question?
  - How does one identify a value encoding scheme (e.g. LCSH, MeSH, Dewey) from which metadata values can be chosen? Are such schemes required or optional?
  - Are metadata values specified by reference (URI) or by value (literal strings)?
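To make a few of these structural questions concrete, here is a minimal Python sketch. It is not the DCAM and not anything RDF prescribes; the names `Value`, `Record`, `MAX_OCCURS`, `add`, and `iso_date`, the cardinality limits, and the example URIs are all invented for illustration. It only shows what it looks like when record boundaries, cardinality, date encoding, scheme identification, and reference-versus-literal values are decided explicitly rather than left to each implementer.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Value:
    """One metadata value, given by reference (URI) or by value (literal)."""
    literal: Optional[str] = None
    uri: Optional[str] = None
    scheme: Optional[str] = None  # label of the encoding scheme, e.g. "LCSH"

    def __post_init__(self):
        # Exactly one of literal / uri must be supplied.
        if (self.literal is None) == (self.uri is None):
            raise ValueError("a value needs exactly one of literal or uri")


@dataclass
class Record:
    """The boundary of a set of assertions: all statements describe one resource."""
    resource: str
    statements: Dict[str, List[Value]] = field(default_factory=dict)


# Hypothetical cardinality rules: element -> maximum occurrences (None = no limit).
MAX_OCCURS = {"title": 1, "creator": None, "date": 1, "subject": None}


def add(record: Record, element: str, value: Value) -> None:
    """Add a statement, enforcing the (assumed) cardinality rules."""
    if element not in MAX_OCCURS:
        raise KeyError(f"unknown element: {element}")
    limit = MAX_OCCURS[element]
    existing = record.statements.setdefault(element, [])
    if limit is not None and len(existing) >= limit:
        raise ValueError(f"{element} may occur at most {limit} time(s)")
    existing.append(value)


def iso_date(text: str) -> Value:
    """Accept only YYYY-MM-DD and say so by naming the scheme."""
    parsed = date.fromisoformat(text)  # raises ValueError for DD-MM-YYYY etc.
    return Value(literal=parsed.isoformat(), scheme="ISO 8601")


record = Record(resource="http://example.org/doc/1")
add(record, "title", Value(literal="An Example Document"))
add(record, "creator", Value(literal="Author, First"))
add(record, "creator", Value(literal="Author, Second"))  # repeatable element
add(record, "date", iso_date("2008-02-18"))
add(record, "subject", Value(uri="http://example.org/subject/metadata", scheme="LCSH"))
```

Two communities that each wrote their own version of `MAX_OCCURS` and `iso_date` would produce records that look alike and still fail to interoperate, which is the point of agreeing on structure in advance.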
Most of these issues are not addressed in RDF. They can be, of course... but without agreements about how to do so, people tend to do them this way and that, leaving us without the ability to share data effectively. This is where the Dublin Core Abstract Model (DCAM) comes in, as it specifies how to structure these sorts of things in a way that makes the data sharable.
Is it perfect and generalizable? No... its authors, in comments on my posts, have made evident that they make no such claim. Is it the best that is available for descriptive metadata? I assert that it is, and that efforts to work towards an Uber-Metadata-Model should start with this effort and simplify or complexify as is necessary and sufficient to assure that metadata can be shared across communities.
One last point. DCAM is articulated in the vernacular of RDF, but the structure that it creates is independent of RDF. If RDF passes into the graveyard of once-or-never-mighty technologies, the abstractions it (DCAM) declares survive quite nicely. Syntax independence: a goal we strove for from day 1 of the first DC metadata workshop. It is a worthy metadata engineering principle.
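Syntax independence can be illustrated with a small Python sketch. The record layout and both wire formats below are made up for this example (they are neither RDF serializations nor anything the DCAM defines); the functions `to_json`, `from_json`, `to_xml`, and `from_xml` are hypothetical helpers. The only claim is that one abstract structure can travel through two different syntaxes and come out the same.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical abstract record: one resource plus ordered (element, value) pairs.
record = {
    "resource": "http://example.org/doc/1",
    "statements": [
        ("title", "An Example Document"),
        ("creator", "Author, First"),
        ("creator", "Author, Second"),
    ],
}


def to_json(rec: dict) -> str:
    """Serialize the abstract record using a JSON syntax."""
    return json.dumps({"resource": rec["resource"],
                       "statements": [list(pair) for pair in rec["statements"]]})


def from_json(text: str) -> dict:
    """Parse the JSON syntax back into the abstract record."""
    data = json.loads(text)
    return {"resource": data["resource"],
            "statements": [tuple(pair) for pair in data["statements"]]}


def to_xml(rec: dict) -> str:
    """Serialize the same abstract record using an XML syntax."""
    root = ET.Element("record", {"resource": rec["resource"]})
    for element, value in rec["statements"]:
        child = ET.SubElement(root, "statement", {"element": element})
        child.text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> dict:
    """Parse the XML syntax back into the abstract record."""
    root = ET.fromstring(text)
    return {"resource": root.get("resource"),
            "statements": [(child.get("element"), child.text) for child in root]}


# Two different syntaxes, one structure: both round trips give back the record.
assert from_json(to_json(record)) == record
assert from_xml(to_xml(record)) == record
```

If either syntax fell out of favor, the abstract record and the agreements behind it would be untouched; only a pair of serializer functions would need replacing.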
To sum up: Defining semantics is a political process of reaching consensus. Syntax is arranging the bits reliably so they travel comfortably between computers (RDF is a fine way to do this, but by no means the only way), and structure is the specification of the details necessary to lay out and declare metadata assertions so they can be embedded unambiguously in a syntax. A data model is the specification of this structure.
I was influenced to include semicolons in the title of this post by an article in today's NYTs, forwarded to me by Marguerite. I LIKE semicolons, even if they are stodgy.