RDF & DCAM: parallel or complementary?
Ed Summers posed an interesting question in reply to my assertion that the Dublin Core Abstract Model (DCAM) is the central jewel in the Dublin Core effort.
'It's funny--as a "library-technology-person" who has recently started dabbling in RDF and semweb technologies DublinCore seems pretty successful. It's a nice vocabulary to be able to invoke when describing resources, and it turns up in specs for FOAF, OAI-(ORE|PMH), RSS, Atom, RDFa, SKOS. The vocabulary I get--the DCAM is a tougher nut for me to crack. It hasn't been abundantly clear to me why it is needed when you have RDF already. I've summed it up to myself as the result of parallel evolution--but perhaps you could characterize it better. Maybe you already have? :-)'
It is always nice to see independent endorsement of the roughly-rightness of DC as a vocabulary, and I hope my earlier remarks in no way impugn the value of the global consensus these vocabulary terms represent. They are valuable to a great many, but from the first workshop almost 15 years ago, we recognized (even in the name) that DC needed to be extensible and interoperable. This is where the abstract model is important.
The evolution of RDF and DCAM is not parallel in any exclusive way, but rather intimately intertwined. Indeed, DC was the prototypical client for RDF, and DC mavens have from the beginning been an integral part of the RDF and Semantic Web development community. RDF was born at a meeting of four people (Bill Arms, then of CNRI, Jim Miller, then of the W3C, Dan Connolly, then and now of the W3C, and myself, representing the DC community). The W3C folks recognized that the PICS effort then underway was inadequate to the larger needs for expressing general metadata, and thought the time was ripe for the development of something more broadly useful.
PICS (Platform for Internet Content Selection) was an effort hastily conceived to fend off assertions that porn would infect every classroom unless the gubmint stepped in to protect us. Someone (TimBL? Dan? Jim?) realized that there was benefit in building a general purpose architecture to support the declaration of reusable semantic assertions. Bill knew of this, and of the DC effort, and brought us together in a meeting at the CNRI offices in Reston, Virginia. My only contribution to the meeting was to say... 'gee, that sounds swell!' Or something like that.
So, some of the Web techies in the DC community jumped in enthusiastically and soon we had an RDF camp as an alternative to simple HTML META tag attribute-value pairs. DC fed functional requirements to the RDF folks, and we figured in a year or two the whole world would be declaring metadata using RDF. Our tender naiveté makes me laugh and shake my head now. We really thought we had this one by the tail.
It didn't quite work out that way, of course. Ten years later, and RDF still struggles in the technology marketplace (hoping for lots of shocked comments to this assertion). Why is that? Basically, because RDF fulfills a second order requirement: interoperability. It is fairly straightforward to build a closed system where everyone knows what they need. This is the way most systems used to be built, of course, and one of the wondrous things about the Web is it introduces global scope as an intrinsic technological attribute. Not to say we always take full advantage.
In the metadata realm we're trying to achieve global semantic scope as well as technological scope. And we want it to be extensible. And we hoped that applications would be built independently of one another on a technological platform that would make possible interoperability without pre-coordination. If you believe TimBL, this is the future of the Web. I've wanted to believe, and still want to. If it is to happen, it requires more than RDF. It requires conventions about how we structure our metadata assertions. This is where DCAM comes in. The abstract model provides a syntax-independent (hence the abstract bit) set of conventions for expressing metadata on the web. RDF is the natural idiom for the expression of the DCAM, but it is NOT essential. You can build any arbitrary syntactical representation of the metadata according to DCAM, and a lossless transformation to any other arbitrary syntactical representation should be possible between two machines that grok both syntaxes.
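To make the lossless-transformation claim concrete, here is a toy sketch in Python. It is not the DCAM specification or any DCMI encoding: the `Description` and `Statement` classes, the two made-up syntaxes (a JSON form and a tab-separated form), and the example.org URI are all assumptions chosen for illustration. The only point is that when two syntaxes carry the same abstract structure, a record can pass from one to the other and back without losing anything.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    property_uri: str
    value: str


@dataclass(frozen=True)
class Description:
    resource_uri: str
    statements: tuple  # tuple of Statement


# Hypothetical syntax 1: a JSON object.
def to_json(desc):
    return json.dumps({
        "about": desc.resource_uri,
        "statements": [[s.property_uri, s.value] for s in desc.statements],
    })


def from_json(text):
    data = json.loads(text)
    return Description(
        data["about"],
        tuple(Statement(p, v) for p, v in data["statements"]),
    )


# Hypothetical syntax 2: resource URI on the first line, then one
# "property<TAB>value" line per statement. Assumes values contain
# no tabs or newlines.
def to_lines(desc):
    rows = [desc.resource_uri]
    rows += [f"{s.property_uri}\t{s.value}" for s in desc.statements]
    return "\n".join(rows)


def from_lines(text):
    first, *rest = text.split("\n")
    return Description(
        first,
        tuple(Statement(*row.split("\t", 1)) for row in rest),
    )


record = Description(
    "http://example.org/doc/1",
    (
        Statement("http://purl.org/dc/terms/title", "An Example Document"),
        Statement("http://purl.org/dc/terms/creator", "A. Author"),
    ),
)

# JSON -> abstract model -> tab-separated lines -> abstract model
round_tripped = from_lines(to_lines(from_json(to_json(record))))
```

Neither parser knows anything about the other syntax. Both know only the shared abstract structure, which is the role the abstract model plays in the argument above.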
So, staying 'on the tracks' is a matter of adopting those conventions (not an intrinsic part of RDF, but naturally expressible in RDF). If you happen to be using RDF, all the better, but we make no assumptions that RDF is the only appropriate syntactic rendition. If you've reached the end of this post (I'm guessing the world-wide audience for this post is... say... 9), and want more (we're down to 3 now), you should talk to Andy Powell or Mikael Nilsson or their co-authors, who did the heavy lifting on getting this thing done. The Metadata world owes them a substantial debt.
-----
Three dragons flying: Ok, the production values of the image in this post aren't exactly great... the iPhone will never win awards as a camera. My OCLC Programs and Research colleague, Karen Yoshimura, scribbled this out on a paper restaurant table cloth faster (and far more beautifully) than I can write my name. Man, I wish I could do that.
Also note that DCMI uses RDF Schema for declaring classes and properties. DCAM does not provide an alternative view on declaration of vocabulary, only a more complex view on descriptions.
Posted by: Mikael Nilsson | February 20, 2008 at 10:57 AM
@Jonathan: As Mikael pointed out below
http://weibel-lines.typepad.com/weibelines/2008/02/rdf-dcam-parall.html#comment-102674552
while there are similarities in what DCAM provides and what RDF/RDFS provides, I wasn't correct in saying that everything DCAM does, RDF/RDFS does, and there are concepts in DCAM for which there aren't direct analogies in RDF (probably the most important of which are those related to bounded descriptions, I think, on which the "application profile" notion depends).
Posted by: PeteJ | February 20, 2008 at 04:39 AM
PeteJ: Why should anyone who is not already involved with DCAM care at all about capturing "what DC metadata is"? Why not abandon that whole hog for RDF and associated metadata control frameworks? Why not put traditional DC metadata into an RDF control regime instead?
Stu thinks it's because DCAM gives us more "pieces" than RDF, but you disagree. It strikes me that we need to start identifying what these "pieces" are. Which in fact I think you've started to do here:
http://kmr.nada.kth.se/papers/SemanticWeb/TowardsAFramework.pdf
(That URL moved since last time I looked at it!)
Posted by: Jonathan Rochkind | February 19, 2008 at 02:20 PM
@jonathan: Heheh. Just to clarify, I was disagreeing with Stu's suggestions that the DCAM and RDF have very different roles, that the DCAM is completely independent of RDF or that the role of RDF is only(?) as a "syntax" for the DCAM.
And I was partly agreeing/empathising with you in your point that it is difficult to distinguish the roles of DCAM and RDF - but I was still arguing that the DCAM provides something which RDF doesn't :-)
Re "what things exactly you think DCAM provides that RDF doesn't", as I said in
http://weibel-lines.typepad.com/weibelines/2008/02/metadata-20-o-1.html#comment-10258
for me, the main thing the DCAM brings - as I think Mikael emphasises here too - is that it captures and formalises the DCMI community's understanding of "what DC metadata is".
As Mikael points out, it also introduces notions of "bounded descriptions" (which in turn provides a basis for talking about structural constraints on a class of those descriptions e.g. DCMI's "Description Set Profiles"). This "boundedness" isn't present in RDF as such - though there have been efforts in that direction, see e.g.
http://www.w3.org/Submission/CBD/
and the work on named graphs
http://www.w3.org/2004/03/trix/
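A small Python sketch may help show what "boundedness" means here. It uses plain tuples rather than any real RDF library, and the `Description` class, the example.org URI, and the two hypothetical providers are assumptions made for illustration. Two separate descriptions of the same resource are flattened into one set of triples, and the boundary between them disappears. Tagging each triple with the description it came from (loosely in the spirit of named graphs, not an implementation of them) keeps the boundary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Description:
    resource_uri: str
    statements: tuple  # tuple of (property_uri, value) pairs


def flatten(descriptions):
    """Merge all descriptions into one set of (subject, property, value) triples."""
    return {
        (d.resource_uri, prop, value)
        for d in descriptions
        for prop, value in d.statements
    }


def flatten_with_bounds(descriptions):
    """Same triples, each tagged with the index of its source description."""
    return {
        (i, d.resource_uri, prop, value)
        for i, d in enumerate(descriptions)
        for prop, value in d.statements
    }


TITLE = "http://purl.org/dc/terms/title"
CREATOR = "http://purl.org/dc/terms/creator"
DOC = "http://example.org/doc/1"

# Two bounded descriptions of the same resource, from two hypothetical providers.
record_a = Description(DOC, ((TITLE, "An Example"),))
record_b = Description(DOC, ((TITLE, "An Example"), (CREATOR, "A. Author")))

merged = flatten([record_a, record_b])
bounded = flatten_with_bounds([record_a, record_b])

# In `merged` the shared title triple collapses into one, so it is no longer
# possible to say which provider asserted what. `bounded` keeps that information.
```

Constraints of the kind an application profile expresses ("each record has exactly one title") only make sense against `bounded`, because `merged` no longer has records to count within.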
Posted by: PeteJ | February 19, 2008 at 01:19 AM
PeteJ: "I don't think I quite said that RDF and DCAM solve exactly the same problem."
I am still confused as to what things exactly you think DCAM provides that RDF doesn't? What is the reason to have DCAM, instead of just RDF?
Also, it would be helpful for you all to reveal: Do PeteJ and Stu _agree_ on this point, or disagree? Because if I'm reading what PeteJ and Stu are writing and they seem to me to contradict each other, then this makes sense if indeed PeteJ and Stu disagree. But if PeteJ and Stu think they largely agree, then one or more of PeteJ, Stu, and jrochkind are unclear on something.
Posted by: Jonathan Rochkind | February 18, 2008 at 03:22 PM
Stu, et al,
My questions and observations got to be longer than a typical comment, so I posted them at http://dltj.org/article/what-is-dublin-core/. The final paragraph is: So what is “Dublin Core”? Is it the abstract model? Is it the set of terms that can be used as predicates in RDF expressions? Is it the legacy 15-element XML-based standard for describing digital objects? Count me in among those who want more in trying to figure this out….
Posted by: Peter Murray | February 18, 2008 at 08:47 AM
Count me in as another of the interested. :-)
Mikael wrote: "Also, I believe DCAM is not very suitable as a fully general model for metadata.
DCAM fails miserably when applied to highly networked structures - it's too heavily optimized for "fat resources". I wouldn't use the DCAM for, say, a thesaurus."
Now, what about using it for bibliographic metadata? I've been assuming the answer is Yes.
Posted by: Irvin Flack | February 17, 2008 at 07:31 PM
Hey, sounds like everybody's having fun, maybe I should join :^).
To be honest, I don't think we have all the relationships figured out just yet. I don't think it's fair to comment on the DCAM as if it were the final step of the DC evolution - it really is an important but intermediary step in my view.
First and foremost, the DCAM fills the role of a formalization of DC principles. It's difficult to overestimate the importance of this part - the fuzziness of DC metadata has been a major roadblock for higher level achievements in the direction of application profiles (such as the Description Set Profile Model, http://dublincore.org/architecturewiki/DescriptionSetProfile).
The next step is to see how we best marry these principles with RDF and other efforts.
What the DCAM offers, that RDF does not, is a "description model": it offers a notion of "metadata records", defines boundaries between descriptions, talks about the difference between value representation and identification, defines how to reference vocabularies, and offers guidance for how to structure metadata in a resource-oriented context.
These constructs, in turn, enable the development of higher-level abstractions such as application profiles.
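A rough Python sketch of how such a description model might support a profile-style check. This is an assumption-laden illustration, not the DCAM or the Description Set Profile model: the field names, the `check_profile` helper, the `(min, max)` constraint format, and the example.org URIs are all hypothetical. It does show the distinction between identifying a value (`value_uri`) and representing it (`value_string`), and how a bounded description gives a constraint something to count within.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    property_uri: str
    value_uri: Optional[str] = None       # identifies the value
    value_string: Optional[str] = None    # represents the value as text
    vocabulary_uri: Optional[str] = None  # vocabulary the value is drawn from


@dataclass(frozen=True)
class Description:
    resource_uri: str
    statements: tuple  # tuple of Statement


def check_profile(description, profile):
    """Return a list of violations.

    `profile` maps a property URI to (min_count, max_count);
    max_count of None means no upper limit.
    """
    problems = []
    for prop, (lo, hi) in profile.items():
        n = sum(1 for s in description.statements if s.property_uri == prop)
        if n < lo or (hi is not None and n > hi):
            limit = "unbounded" if hi is None else hi
            problems.append(f"{prop}: found {n}, expected {lo} to {limit}")
    return problems


TITLE = "http://purl.org/dc/terms/title"
SUBJECT = "http://purl.org/dc/terms/subject"

# Hypothetical profile: exactly one title, any number of subjects.
profile = {TITLE: (1, 1), SUBJECT: (0, None)}

subject = Statement(
    SUBJECT,
    value_uri="http://example.org/topics/metadata",
    value_string="Metadata",
    vocabulary_uri="http://example.org/topics",
)
title = Statement(TITLE, value_string="An Example Document")

incomplete = Description("http://example.org/doc/1", (subject,))
complete = Description("http://example.org/doc/1", (subject, title))

violations = check_profile(incomplete, profile)  # one entry, for the missing title
no_violations = check_profile(complete, profile)  # empty list
```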
It's possible to build that structure on top of RDF - and I think that is the right approach, eventually, but we're not there yet. DCAM currently contains its own informal semantics, and this only makes things harder (as has been pointed out repeatedly by Alistair Miles) - better to take the semantic model from RDF, and build the description model on that. Watch for developments in this direction.
So, in short, DCAM contains parts that overlap with RDF, and parts that do not. The overlap should be eliminated, IMHO.
Also, I believe DCAM is not very suitable as a fully general model for metadata.
DCAM fails miserably when applied to highly networked structures - it's too heavily optimized for "fat resources". I wouldn't use the DCAM for, say, a thesaurus.
I think that's ok, as RDF solves the general issue anyway, but it's also important to keep that in mind.
Posted by: Mikael | February 17, 2008 at 02:53 PM
@Stu: I'm not sure I quite understand your suggestion that the DCAM brings to the table something which RDF lacks?
RDF _does_ have an "abstract model" - the RDF concepts of graph and triple perform a similar role to the DCAM concepts of description set and statement. And everything the DCAM does, RDF + RDFS does too (more or less, I think).
So I don't really see the role of RDF in relation to the DCAM as (only?) "the natural idiom for the expression of the DCAM" or a "syntactic rendition". I recognise that historically DCMI has kinda fostered this idea by talking about "encoding in RDF" in the same breath as "encoding in XML", but (to me) those are two rather different notions.
Rather RDF and the DCAM both provide, as Mikael puts it in a recent paper [1] "Generic, framework-level models". And indeed the DCAM "delegates" some of its semantics to the RDF model, though it probably doesn't do so as clearly as it might, and there is a proposal on the table to clarify that in the future, and to formalise that relationship. See e.g.
http://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind0712&L=dc-architecture&P=1511
The nub of the problem is, I guess, in "how generic" DCMI sees the DCAM being, and given the very broad, diverse nature of the "DC implementer" community, I guess that isn't an easy question to answer!
So, in short, I can fully understand the confusion that Jonathan, Ed & others (e.g. this point has been raised in the various discussions around the RDA work, I think) are pointing to. Especially when the DCAM is presented away from its "historical context", if you like - as, increasingly, is the case.
(Again, all a personal view, not speaking for anyone but myself, I hasten to add!)
[1] http://ariadne.cs.kuleuven.be/lomi/images/5/52/D4.7-prolearn.pdf
Posted by: PeteJ | February 17, 2008 at 05:22 AM
@Jonathan: I don't think I quite said that RDF and DCAM solve exactly the same problem. :-) To a large extent they do, but I think they seek to address different contexts & communities, and I think that difference is significant. See my follow-up comment on the previous post
http://weibel-lines.typepad.com/weibelines/2008/02/metadata-20-o-1.html#comment-102586094
and Mikael's message here
http://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind0703&L=DC-ARCHITECTURE&P=R933
Posted by: PeteJ | February 17, 2008 at 04:26 AM
And, I just can't stop talking, there are more than 9 people on the code4lib irc channel alone interested in this topic. If the problems DCAM is intended to solve are real problems--why wouldn't there be LOTS of people interested? Only because they don't yet understand how DCAM could possibly help solve them, because they don't understand ANYTHING about DCAM.
If you want to make a stab at writing an essay to rectify this situation, I think the Code4Lib Journal would likely be interested.
Posted by: Jonathan Rochkind | February 15, 2008 at 10:57 AM
Even after reading your post (which is a GREAT start), I still am not really able to answer the questions:
1. What problem is RDF meant to solve?
2. What problem is DC meant to solve?
I can tell you think 1 and 2 are not the same. But apparently others intimately involved in both (Pete in a comment on your last post?) believe that 1 and 2 in fact are the same. And me and edsu, not intimately involved, remain just confused.
I undoubtedly need to read your essay a couple more times, more carefully. But, answering those questions is the kind of overview explanation that, as far as I can tell, simply doesn't exist.
Part of the problem with dealing with this stuff in general is that because it's so abstract, we do not have the shared vocabulary to talk about it. Every concept needs to be painstakingly worked out with an explicitly defined term of art. I think that's been done internal to DCAM, which is a lot of what you were saying made it a success--but it hasn't been done in a way that can be shared with newcomers yet. Or if it has, I haven't seen it.
In fact, I'd throw out the unmeasurable hypothesis that 90% of the challenge in creating a metadata regime like DCAM is just in developing a common vocabulary for a developed shared mental model. That's not only the hard part, it's almost the WHOLE part.
Posted by: Jonathan Rochkind | February 15, 2008 at 10:52 AM