N-ary relations, Claims vs Facts, and RDF's odd frame model from Sean Luke on 1999-12-23 (www-rdf-interest@w3.org from December 1999)

From: Sean Luke <seanl@cs.umd.edu>
Date: Thu, 23 Dec 1999 17:19:18 -0500 (EST)
To: www-rdf-interest@w3.org
Message-ID: <Pine.SO4.4.05.9912231429150.11252-100000@jifsan.cs.umd.edu>

Here's the second in the RDF Improvement series :-)  If you want to skip
straight to the N-ary section, it's down below a bit.  Again, I'm pretty
sure this information is correct according to my understanding of the RDF
model.  But correct me if I get it wrong somewhere!


RDF's Data Model
----------------

RDF certainly has an odd data model.  It borrows from frame languages in
that it has a set of literals, a set of resources, a set of properties,
and a set of statements of the form <prop, domain, range>, where prop is a
property, range is a literal or a resource, and oddly, domain is *only* a
resource.  Why is having this restriction strange?  Obviously this form
has had a long and distinguished history in semantic networks and frame
languages.  But these languages have also typically featured default
logics, IDO, or other non-first-order logic features which make it nice to
specify one of the two arguments to a binary relation to be a "slot" that
is fillable only by things in an inheritance chain.  But RDF doesn't have
any of these features; in fact, it is directly mappable to a very simple
tuple calculus.  As such, the only justification I can think of for RDF
having this restriction is syntax.
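
To make the restriction concrete, here is a minimal Python sketch of the
statement form described above.  The class names (Resource, Literal,
Statement) and the field name range_ are illustrative assumptions, not
anything defined by the RDF specification; the only point is that the
domain slot rejects literals while the range slot accepts either kind.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    uri: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Statement:
    """One <prop, domain, range> tuple (hypothetical names)."""
    prop: Resource                     # the property
    domain: Resource                   # may only be a resource
    range_: Union[Resource, Literal]   # resource or literal

    def __post_init__(self):
        # The asymmetry discussed above: a literal can never fill the domain.
        if not isinstance(self.domain, Resource):
            raise TypeError("domain must be a Resource")
        if not isinstance(self.range_, (Resource, Literal)):
            raise TypeError("range must be a Resource or a Literal")
```

Nothing in a plain tuple store needs this check, which is why the
restriction looks syntactic rather than semantic.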

RDF's syntax basically boils down to this:

	[
	Object Description Area:
		{
		[Statements with the Object filling the _domain_ position]*
		}
	]*

It appears that the only way that the relation Q(x,y) can be declared in
RDF is _physically_ within x's description. y cannot declare the relation.  
No one else can either.  Only x can.  Other than making the "abbreviated
syntax" look pretty, I am at a loss as to why this is so.  It seems
arbitrary and unnecessary.

One might argue that this is for "safety".  That is, if the relation
Q(x,y) can only be declared by x, no one can make claims for x that it
doesn't agree with.  But this is pretty weak.  Imagine if Q is a very
popular relation "husband of" in a very popular schema.  Now, every person
X can claim to be Madonna's lost husband, that is, HusbandOf(X,Madonna).
However, Madonna apparently cannot claim that *anyone* is her husband! Not
only is this mechanism not safe, it can, in easily constructed situations
like the one above, actually *prevent* this notion of "safety" from ever
occurring.

Inverted Relations
------------------

One of the apparent results of this frame-slot model is that RDF does not
have inverted relations.  RDF's subPropertyOf relation allows RDF writers
to "extend" properties with the following inferences (in pseudo-logic):

	Q(x,y) ^ subPropertyOf(R,Q) --> R(x,y).
	subPropertyOf(R,Q) ^ subPropertyOf(Q,S) --> subPropertyOf(R,S).

The second is never expressly stated, but it seems to be inferrable from
the text. Interestingly, RDF does *not* have an equivalent "inverseOf"
property, something like:

	Q(x,y) ^ inverseOf(R,Q) --> R(y,x).
	inverseOf(R,Q) ^ subPropertyOf(Q,S) --> inverseOf(R,S).

Of course, to hook the two, you'd need two additional inferences:

	subPropertyOf(R,Q) ^ inverseOf(Q,S) --> inverseOf(R,S).
	inverseOf(R,Q) ^ inverseOf(Q,S) --> subPropertyOf(R,S).
		(I think that's all of 'em...)
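
The six rules above can be run as a naive forward-chaining loop.  The
sketch below is only an illustration under stated assumptions: facts are
plain (predicate, arg1, arg2) tuples, the function name close is made up,
and the argument order of subPropertyOf and inverseOf is read exactly as
the rules above write it (a Q fact yields an R fact), with inverseOf
treated as one-directional because no symmetry rule is listed.

```python
def close(facts):
    """Apply the six rules above until no new fact appears."""
    facts = set(facts)
    while True:
        sub = {(r, q) for (p, r, q) in facts if p == "subPropertyOf"}
        inv = {(r, q) for (p, r, q) in facts if p == "inverseOf"}
        new = set()
        for (r, q) in sub:
            # Q(x,y) ^ subPropertyOf(R,Q) --> R(x,y)
            new |= {(r, x, y) for (p, x, y) in facts if p == q}
            # subPropertyOf(R,Q) ^ subPropertyOf(Q,S) --> subPropertyOf(R,S)
            new |= {("subPropertyOf", r, s) for (q2, s) in sub if q2 == q}
            # subPropertyOf(R,Q) ^ inverseOf(Q,S) --> inverseOf(R,S)
            new |= {("inverseOf", r, s) for (q2, s) in inv if q2 == q}
        for (r, q) in inv:
            # Q(x,y) ^ inverseOf(R,Q) --> R(y,x)
            new |= {(r, y, x) for (p, x, y) in facts if p == q}
            # inverseOf(R,Q) ^ subPropertyOf(Q,S) --> inverseOf(R,S)
            new |= {("inverseOf", r, s) for (q2, s) in sub if q2 == q}
            # inverseOf(R,Q) ^ inverseOf(Q,S) --> subPropertyOf(R,S)
            new |= {("subPropertyOf", r, s) for (q2, s) in inv if q2 == q}
        if new <= facts:
            return facts
        facts |= new


# Madonna states wifeOf(Madonna, X); the inverse rule yields husbandOf.
derived = close({
    ("wifeOf", "Madonna", "X"),
    ("inverseOf", "husbandOf", "wifeOf"),
})
```

The loop terminates because every derived fact is built from names already
present in the input, so only finitely many tuples exist.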

I wonder why RDF doesn't have this?  This is a very useful thing, with no
additional computational complexity.  It's free.  It's obviously a useful
tool for merging divergent schema.  The only reason I can think of that
RDF does not provide this is that it would allow, through inference, for
Madonna to actually claim someone as her husband (through an inverted
"wife of" relationship).  And of course, we wouldn't want to do that! :-)
:-)

As a side note: Additionally, there is a constraint on subPropertyOf which
says that it is invalid to have cycles in the subPropertyOf chain.  This
seems arbitrary to me; why can't you have cycles?  The definition seems
obvious: if subPropertyOf(X,Y) and subPropertyOf(Y,X), then X and Y form
an equivalent set.  In fact, it seems a rather nice way of creating
one-to-one, onto mappings from one schema to another.  You say car(X,Y), I
say automobile(X,Y), and it'd be nice for my schema to map to yours saying
that car(X,Y) is in fact the same basic thing as automobile(X,Y). There is
a similar constraint on subClassOf which also seems arbitrary: why
disallow subclass cycles?  Cycles merely state that the classes are
equivalent, which again can be a useful thing when merging schema.
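
As a rough illustration of the "cycles mean equivalence" reading, the
following Python sketch groups names that can reach each other through
subPropertyOf (or subClassOf) links.  The function name and the pair
representation are assumptions made for the example; it is not how any
RDF tool is known to behave.

```python
def equivalence_classes(pairs):
    """Group names that are mutually reachable through the given links."""
    names = {n for pair in pairs for n in pair}
    reach = {n: {n} for n in names}
    changed = True
    while changed:                      # simple transitive closure
        changed = False
        for a, b in pairs:
            before = len(reach[a])
            reach[a] |= reach[b]
            if len(reach[a]) != before:
                changed = True
    # Two names are equivalent when each one reaches the other.
    return {
        frozenset(m for m in names if m in reach[n] and n in reach[m])
        for n in names
    }


classes = equivalence_classes([
    ("car", "automobile"),
    ("automobile", "car"),      # the cycle: car and automobile are equivalent
    ("car", "vehicle"),
])
```

Here car and automobile end up in one class, while vehicle stays on its
own because nothing links back from it.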


SHOE "Claims"
-------------

In SHOE we also have a similar syntax model:

        [
        Object Description Area:
                {
                [Statements that the Object is making]*
                }
        ]*

Note the important difference here: Objects make claims.  These claims can
be anything.  I can claim that Madonna and George Clinton are married.
That's fine.  But because of the syntax when parsing it, agents clearly
understand that *I*, not Madonna or George Clinton, am making this claim,
which allows agents to take what I say with a grain of salt.

SHOE thus views the relation Q(X,Y) actually as a 3-ary relation
_Q(C,X,Y), where C is the _claimant_.  This is read as: "C says that
Q(X,Y)".  In RDF, C must *always* be X, which seems vestigial at best.  
RDF does not even permit C to be Y, much less anything else.  In SHOE, C,
X, and Y are independent.
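
A small Python sketch of the claimant idea, assuming a made-up Claim
record and helper names (none of this is SHOE's actual API): every stored
fact carries the claimant C alongside X and Y, so an agent can ask who
asserted a relation before deciding how much to trust it.

```python
from typing import NamedTuple


class Claim(NamedTuple):
    claimant: str   # C: who is making the statement
    relation: str   # Q
    x: str
    y: str


def who_says(claims, relation, x, y):
    """Return the sorted list of claimants asserting relation(x, y)."""
    return sorted({c.claimant for c in claims
                   if (c.relation, c.x, c.y) == (relation, x, y)})


def expressible_in_rdf_syntax(claim):
    """Under the reading above, RDF only allows the case C == X."""
    return claim.claimant == claim.x


claims = [
    Claim("Sean", "marriedTo", "Madonna", "GeorgeClinton"),
    Claim("Madonna", "marriedTo", "Madonna", "GeorgeClinton"),
]
```

Only the second claim would fit the C == X restriction; the first is the
third-party statement that SHOE permits and labels with its source.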


N-ary Relations
---------------

The biggest consequence of RDF's frame model, and one I think really needs
to be addressed, is its inability to handle N-ary relations. Wow! Of
course, in theory all things expressible in n-ary relations can be mapped
to binary relations.  But that's a little like saying that all languages
are Turing-Complete as a justification for continuing to use COBOL.  :-)
Expressing n-ary-as-binary relations in RDF, or defining them in its
schema, isn't fun.

SHOE started out as a binary relation model very much like RDF.  But it
was after one early interested party (the CIA -- hey, sue us, we're in
D.C. :-) complained about the model that we decided to move SHOE from a
frame model to an n-ary relational calculus.  The CIA wanted to use SHOE
but to create relations that said not only that P(x,y), but that Agent 007
said that he believed P(x,y,m), where m was a certainty factor.  The CIA
also wanted to be able to say things like Agent 007 meets with Agent 009
on Tuesday in Prague.  It seemed an obvious thing to say; unfortunately
creating an intermediate object (which SHOE did, just like RDF does now)
was a really ugly approach, especially since it meant that this relation
was *different* from the binary relations we used (which didn't need an
intermediate object).

Another one: One of the odd consequences of binary relations in RDF, plus
its lack of certain basic data types (like integers), is the need for
"special" collection classes, with custom numbered relational values. With
an n-ary approach this special case magically goes away.  In RDF you
attach elements to containers with a custom infinite (!) set of relations, so
in RDF you'd attach an element X as the first item with the relation
rdf:_1(container,X).  In SHOE you just make some relation, say,
"contains", and write contains(container,X,1).  No more need for infinite
relational sets.  Bag, etc. just go away.  And why not?  After all, since
infinite relational sets aren't exactly easy to implement as tables :-),
an RDF agent is probably going to implement this stuff internally as
contains(container,X,1) anyway!
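
The internal mapping guessed at above might look like the following
Python sketch.  It assumes statements arrive as (prop, domain, range)
string tuples and that membership properties are spelled "rdf:_1",
"rdf:_2", and so on; the function name to_contains is hypothetical.

```python
import re

# Matches the numbered membership properties rdf:_1, rdf:_2, ...
_MEMBER = re.compile(r"^rdf:_([1-9][0-9]*)$")


def to_contains(statements):
    """Turn rdf:_n(container, X) into rows of contains(container, X, n)."""
    rows = []
    for prop, container, item in statements:
        match = _MEMBER.match(prop)
        if match:                       # other properties are ignored here
            rows.append((container, item, int(match.group(1))))
    # Order by container, then by position within the container.
    return sorted(rows, key=lambda row: (row[0], row[2]))


rows = to_contains([
    ("rdf:_2", "c", "b"),
    ("rdf:_1", "c", "a"),
    ("s:Q", "c", "z"),
])
```

One three-column table replaces the open-ended family of rdf:_n
properties, and the position becomes an ordinary integer column.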

RDF has made some n-ary stabs.  In Section 7.3 for example, the RDF Model
and Syntax Specification made some suggestions about how to get around
this deficiency.  Nonetheless, non-binary relations are guaranteed to be
second-class citizens in the RDF semantics. While binary relations are
first-class resources in RDF, "pseudo-n-ary" relations are odd structures
which cannot be referenced by a resource. While binary relations can take
advantage of subPropertyOf, pseudo-n-ary relations cannot.  And reifying a
binary relation is trivial. Add a single additional argument, and
reification becomes a hairy mess. Lastly, mapping binary relations (using
subPropertyOf or, perhaps in the future, inverseOf) from schema to schema is
feasible.  Mapping non-binary relations is presently well nigh impossible.

There are syntactic inconsistencies as well.  Binary relations are
expressible directly in the Basic Abbreviated Syntax.  That is, to say
Q(X,Y) you can do either

	<rdf:Description about="X"> <s:Q>Y</s:Q> </rdf:Description>

...or you can do (Abbreviated)

	<rdf:Description about="X" s:Q="Y" />

But for an RDF-style pseudo-n-ary relation you can't fully use the Basic
Abbreviated Syntax.  Is that right?  For a mere Q(X,Y,Z), which mapped out
in RDF gets converted to Q1(X,O), Q2(O,Y), Q3(O,Z), you have to do
something like:

	<rdf:Description about="X"> <s:Q1 s:Q2="Y" s:Q3="Z" />
	</rdf:Description>

Ick!  It seems that in RDF, all relations are equal, but some are more
equal than others.
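
For reference, the Q(X,Y,Z) to Q1(X,O), Q2(O,Y), Q3(O,Z) mapping can be
sketched in a few lines of Python.  The names decompose and fresh_node,
and the "_O1"-style labels for the intermediate object, are assumptions
for illustration only; output tuples are (prop, domain, range).

```python
import itertools

_ids = itertools.count(1)


def fresh_node():
    """Invent a label for the intermediate object O (hypothetical scheme)."""
    return f"_O{next(_ids)}"


def decompose(name, args, fresh=fresh_node):
    """Map name(a1, ..., an), n >= 3, to binary statements through O."""
    if len(args) < 3:
        raise ValueError("fewer than 3 arguments: use a plain binary statement")
    node = fresh()
    # Q1 links the first argument to the intermediate object...
    statements = [(f"{name}1", args[0], node)]
    # ...and Q2..Qn hang the remaining arguments off that object.
    for position, arg in enumerate(args[1:], start=2):
        statements.append((f"{name}{position}", node, arg))
    return statements


example = decompose("Q", ("X", "Y", "Z"), fresh=lambda: "O")
```

One n-ary fact becomes n binary statements plus an invented object, and
nothing in the output records that Q1, Q2 and Q3 belong together.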

In SHOE, to declare Q(X,Y,Z) verbosely, a resource says:

	<relation name="Q">
		<arg pos="1" val="X" />
		<arg pos="2" val="Y" />
		<arg pos="3" val="Z" />
	</relation>

Certainly a lot cleaner!  If you're inside a resource X's description area
(SHOE calls them "instances"), and you're just doing a binary relation
Q(X,Y), and the resource in question is in the domain, an abbreviated form
can look like:

	<relation name="Q"> <arg pos=TO val=Y /> </relation>

You can do something similar if X is in the range, of course.  And using
something along the lines of RDF's "abbreviated" syntax, there are of
course even tighter ways to describe this.  You might even keep an
RDF-style shorthand for binary relations (since they're so common), as
long as the underlying *semantics* permit n-ary relations to be
first-class citizens.
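
As a rough sketch of how directly the verbose form maps onto an n-ary
tuple, the Python below parses the simplified <relation>/<arg> markup
used in this message (not necessarily the exact SHOE syntax) with the
standard xml.etree.ElementTree module.  The function name is made up, and
only numeric pos values are handled.

```python
import xml.etree.ElementTree as ET


def relation_to_tuple(xml_text):
    """Read <relation name="Q"><arg pos=".." val=".."/>...</relation>."""
    elem = ET.fromstring(xml_text)
    if elem.tag != "relation":
        raise ValueError("expected a <relation> element")
    # Order arguments by their declared position, not by document order.
    args = sorted(elem.findall("arg"), key=lambda a: int(a.get("pos")))
    return (elem.get("name"),) + tuple(a.get("val") for a in args)


fact = relation_to_tuple(
    '<relation name="Q">'
    '<arg pos="2" val="Y" /><arg pos="1" val="X" /><arg pos="3" val="Z" />'
    '</relation>'
)
```

The result is the single tuple ("Q", "X", "Y", "Z"), with no intermediate
object to invent or track.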

To sum up: RDF's frame-based binary relation model, with the domain
position hard-set by syntax to be inside a resource description, does not
provide any special benefit IMHO.  Certainly it does not take advantage of
non-first-order inheritance or other features.  There also does not
appear to be any computational complexity benefit.  And it does seem to
have an awful lot of downsides in transparency, consistency, difficulty in
manipulation, and arbitrary warts like a lack of inverse relations.
Lastly, the present syntax which enforces this model is complicated for
the common man to wrap his brain around.  Revisiting it with a critical
eye would do it some serious good.

Sean
Received on Thursday, 23 December 1999 17:23:41 UTC