- From: Sean Luke <seanl@cs.umd.edu>
- Date: Thu, 23 Dec 1999 17:19:18 -0500 (EST)
- To: www-rdf-interest@w3.org
Here's the second in the RDF Improvement series :-) If you want to skip straight to the N-ary section, it's down below a bit. Again, I'm pretty sure this information is correct according to my understanding of the RDF model. But correct me if I get it wrong somewhere!

RDF's Data Model
----------------

RDF certainly has an odd data model. It borrows from frame languages in that it has a set of literals, a set of resources, a set of properties, and a set of statements of the form <prop, domain, range>, where prop is a property, range is a literal or a resource, and oddly, domain is *only* a resource.

Why is having this restriction strange? Obviously this form has had a long and distinguished history in semantic networks and frame languages. But these languages have also typically featured default logics, IDO, or other non-first-order logic features which make it nice to specify one of the two arguments to a binary relation to be a "slot" that is fillable only by things in an inheritance chain. But RDF doesn't have any of these features; in fact, it is directly mappable to a very simple tuple calculus.

As such, the only justification I can think of for RDF having this restriction is syntax. RDF's syntax basically boils down to this:

    [ Object Description Area:
        { [Statements with the Object filling the _domain_ position]* }
    ]*

It appears that the only way that the relation Q(x,y) can be declared in RDF is _physically_ within x's description. y cannot declare the relation. No one else can either. Only x can. Other than making the "abbreviated syntax" look pretty, I am at a loss as to why this is so. It seems arbitrary and unnecessary.

One might argue that this is for "safety". That is, if the relation Q(x,y) can only be declared by x, no one can make claims for x that it doesn't agree with. But this is pretty weak. Imagine if Q is a very popular relation "husband of" in a very popular schema.
Now, every person X can claim to be Madonna's lost husband, that is, HusbandOf(X,Madonna). However, Madonna apparently cannot claim that *anyone* is her husband! Not only is this mechanism not safe, it can, in easily constructed situations like the one above, actually *prevent* this notion of "safety" from ever occurring.

Inverted Relations
------------------

One of the apparent results of this frame-slot model is that RDF does not have inverted relations. RDF's subPropertyOf relation allows RDF writers to "extend" properties with the following inferences (in pseudo-logic):

    Q(x,y) ^ subPropertyOf(R,Q) --> R(x,y).
    subPropertyOf(R,Q) ^ subPropertyOf(Q,S) --> subPropertyOf(R,S).

The second is never expressly stated, but it seems to be inferable from the text. Interestingly, RDF does *not* have an equivalent "inverseOf" property, something like:

    Q(x,y) ^ inverseOf(R,Q) --> R(y,x).
    inverseOf(R,Q) ^ subPropertyOf(Q,S) --> inverseOf(R,S).

Of course, to hook the two, you'd need two additional inferences:

    subPropertyOf(R,Q) ^ inverseOf(Q,S) --> inverseOf(R,S).
    inverseOf(R,Q) ^ inverseOf(Q,S) --> subPropertyOf(R,S).

(I think that's all of 'em...)

I wonder why RDF doesn't have this? This is a very useful thing, with no additional computational complexity. It's free. It's obviously a useful tool for merging divergent schema. The only reason I can think of that RDF does not provide this is that it would allow, through inference, for Madonna to actually claim someone as her husband (through an inverted "wife of" relationship). And of course, we wouldn't want to do that! :-) :-)

As a side note: Additionally, there is a constraint on subPropertyOf which says that it is invalid to have cycles in the subPropertyOf chain. This seems arbitrary to me; why can't you have cycles? The definition seems obvious: if subPropertyOf(X,Y) and subPropertyOf(Y,X), then X and Y form an equivalent set.
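
The fact-level rules above can be tried out in a few lines of Python. This is only a sketch under stated assumptions, not anything RDF or SHOE provides: the function name close_facts and the property names (wifeOf, husbandOf, car, automobile) are made up for illustration, and the argument order of subPropertyOf and inverseOf is read exactly as the rules are written in this message. Only the two rules that produce new facts are implemented; the schema-level rules that combine subPropertyOf and inverseOf are left out.

```python
# Sketch only: forward-chains two of the rules from the message over binary facts.
#   Q(x,y) ^ subPropertyOf(R,Q) --> R(x,y)
#   Q(x,y) ^ inverseOf(R,Q)     --> R(y,x)
# All names here are hypothetical and chosen for illustration.

def close_facts(facts, sub_property_of, inverse_of):
    """Return every fact derivable from `facts`.

    facts:            set of (prop, x, y) triples
    sub_property_of:  set of (R, Q) pairs, read as in the message's first rule
    inverse_of:       set of (R, Q) pairs, read as in the proposed inverseOf rule
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (q, x, y) in list(derived):
            # Facts that follow from subPropertyOf keep the argument order.
            new = {(r, x, y) for (r, q2) in sub_property_of if q2 == q}
            # Facts that follow from inverseOf swap the arguments.
            new |= {(r, y, x) for (r, q2) in inverse_of if q2 == q}
            if not new <= derived:
                derived |= new
                changed = True
    return derived


# Madonna states wifeOf in her own description; husbandOf follows by inversion.
# A subPropertyOf cycle between car and automobile makes the two interchangeable.
facts = {("wifeOf", "Madonna", "X"), ("car", "a", "b")}
sub_property_of = {("car", "automobile"), ("automobile", "car")}
inverse_of = {("husbandOf", "wifeOf")}

derived = close_facts(facts, sub_property_of, inverse_of)
```

With these inputs the loop stops once no rule adds anything new, so a cycle in subPropertyOf does no harm here: it just means each property's facts are copied to the other.
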
In fact, it seems a rather nice way of creating one-to-one, onto mappings from one schema to another. You say car(X,Y), I say automobile(X,Y), and it'd be nice for my schema to map to yours saying that car(X,Y) is in fact the same basic thing as automobile(X,Y).

There is a similar constraint on subClassOf which also seems arbitrary: why disallow subclass cycles? Cycles merely state that the classes are equivalent, which again can be a useful thing when merging schema.

SHOE "Claims"
-------------

In SHOE we also have a similar syntax model:

    [ Object Description Area:
        { [Statements that the Object is making]* }
    ]*

Note the important difference here: Objects make claims. These claims can be anything. I can claim that Madonna and George Clinton are married. That's fine. But because of the syntax when parsing it, agents clearly understand that *I*, not Madonna or George Clinton, am making this claim. Which allows agents to take what I say with a grain of salt.

SHOE thus views the relation Q(X,Y) actually as a 3-ary relation _Q(C,X,Y), where C is the _claimant_. This is read as: "C says that Q(X,Y)". In RDF, C must *always* be X, which seems vestigial at best. RDF does not even permit C to be Y, much less anything else. In SHOE, C, X, and Y are independent.

N-ary Relations
---------------

The biggest consequence of RDF's frame model, and one I think really needs to be addressed, is its inability to handle N-ary relations. Wow!

Of course, in theory all things expressible in n-ary relations can be mapped to binary relations. But that's a little like saying that all languages are Turing-Complete as a justification for continuing to use COBOL. :-) Expressing n-ary-as-binary relations in RDF, or defining them in its schema, isn't fun.

SHOE started out as a binary relation model very much like RDF. But it was after one early interested party (the CIA -- hey, sue us, we're in D.C.
:-) complained about the model that we decided to move SHOE from a frame model to an n-ary relational calculus. The CIA wanted to use SHOE but to create relations that said not only that P(x,y), but that Agent 007 said that he believed P(x,y,m), where m was a certainty factor. The CIA also wanted to be able to say things like Agent 007 meets with Agent 009 on Tuesday in Prague. It seemed an obvious thing to say; unfortunately creating an intermediate object (which SHOE did, just like RDF does now) was a really ugly approach, especially since it meant that this relation was *different* from the binary relations we used (which didn't need an intermediate object).

Another one: One of the odd consequences of binary relations in RDF, plus its lack of certain basic data types (like integers), is the need for "special" collection classes, with custom numbered relational values. With an n-ary approach this special case magically goes away. In RDF you attach elements to containers with a custom infinite (!) set of relations, so in RDF you'd attach an element X as the first item with the relation rdf:_1(container,X). In SHOE you just make some relation, say, "contains", and write contains(container,X,1). No more need for infinite relational sets. Bag, etc. just go away. And why not? After all, since infinite relational sets aren't exactly easy to implement as tables :-), an RDF agent is probably going to implement this stuff internally as contains(container,X,1) anyway!

RDF has made some n-ary stabs. In Section 7.3 for example, the RDF Model and Syntax Specification made some suggestions about how to get around this deficiency. Nonetheless, non-binary relations are guaranteed to be second-class citizens in the RDF semantics. While binary relations are first-class resources in RDF, "pseudo-n-ary" relations are odd structures which cannot be referenced by a resource. While binary relations can take advantage of subPropertyOf, pseudo-n-ary relations cannot.
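
To make the intermediate-object workaround concrete, here is a rough Python sketch that takes the meeting example above as a single n-ary tuple and breaks it into binary facts hung off an intermediate node. Everything named here is an assumption for illustration: the function to_binary, the node label "_:o1", and the numbered property names (meets1, meets2, ...) are invented and are not part of RDF or SHOE.

```python
# Sketch only: contrasts one n-ary fact with its binary encoding through an
# intermediate node. All names are hypothetical and chosen for illustration.

def to_binary(name, args, node):
    """Encode name(a1, ..., an) as n binary facts through `node`.

    name(a1, a2, ..., an) becomes
        name1(a1, node), name2(node, a2), ..., namen(node, an)
    """
    first, *rest = args
    facts = [(f"{name}1", first, node)]
    facts += [(f"{name}{i}", node, arg) for i, arg in enumerate(rest, start=2)]
    return facts


# N-ary style: one relation, one tuple.
nary_fact = ("meets", "Agent007", "Agent009", "Tuesday", "Prague")

# Binary style: four facts plus an extra node that exists only to tie them together.
binary_facts = to_binary(nary_fact[0], nary_fact[1:], node="_:o1")
```

The binary form needs four separate property names and an extra node for what the n-ary form says in one tuple, and nothing in the binary facts themselves marks them as belonging to a single "meets" relation.
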
And reifying a binary relation is trivial. Add a single additional argument, and reification becomes a hairy mess. Lastly, mapping binary relations (using subPropertyOf or, perhaps in the future, inverseOf) from schema to schema is feasible. Mapping non-binary relations is presently well nigh impossible.

There are syntactic inconsistencies as well. Binary relations are expressible directly in the Basic Abbreviated Syntax. That is, to say Q(X,Y) you can do either

    <rdf:Description about="X">
      <s:Q>Y</s:Q>
    </rdf:Description>

...or you can do (Abbreviated)

    <rdf:Description about="X" s:Q="Y" />

But for an RDF-style pseudo-n-ary relation you can't fully use the Basic Abbreviated Syntax. Is that right? For a mere Q(X,Y,Z), which mapped out in RDF gets converted to Q1(X,O), Q2(O,Y), Q3(O,Z), you have to do something like:

    <rdf:Description about="X">
      <s:Q1 s:Q2="Y" s:Q3="Z" />
    </rdf:Description>

Ick! It seems that in RDF, all relations are equal, but some are more equal than others.

In SHOE, to declare Q(X,Y,Z) verbosely, a resource says:

    <relation name="Q">
      <arg pos="1" val="X" />
      <arg pos="2" val="Y" />
      <arg pos="3" val="Z" />
    </relation>

Certainly a lot cleaner! If you're inside a resource X's description area (SHOE calls them "instances"), and you're just doing a binary relation Q(X,Y), and the resource in question is in the domain, an abbreviated form can look like:

    <relation name="Q">
      <arg pos="TO" val="Y" />
    </relation>

You can do something similar if X is in the range, of course. And using something along the lines of RDF's "abbreviated" syntax, there are of course even tighter ways to describe this. You might even keep an RDF-style shorthand for binary relations (since they're so common), as long as the underlying *semantics* permit n-ary relations to be first-class citizens.

To sum up: RDF's frame-based binary relation model, with the domain position hard-set by syntax to be inside a resource description, does not provide any special benefit IMHO.
Certainly it does not take advantage of non-first-order inheritance or other features. There also does not appear to be any computational complexity benefit. And it does seem to have an awful lot of downsides in transparency, consistency, difficulty in manipulation, and arbitrary warts like a lack of inverse relations. Lastly, the present syntax which enforces this model is complicated for the common man to wrap his brain around. Revisiting it with a critical eye would do it some serious good.

Sean
Received on Thursday, 23 December 1999 17:23:41 UTC