N-ary relations, Claims vs Facts, and RDF's odd frame model from Stefan Decker on 1999-12-24 (www-rdf-interest@w3.org from December 1999)

From: Stefan Decker <stefan@DB.Stanford.EDU>
Date: Thu, 23 Dec 1999 17:19:46 -0800
To: www-rdf-interest@w3.org, seanl@cs.umd.edu
Message-Id: <4.2.0.58.19991223164738.015c8510@db.stanford.edu>
Hi Sean,

First of all, it's nice to read an RDF-related email from you!

But first, something else:
What I still miss is a clear statement from the W3C people (maybe
even Tim himself) on how the suggestions from the mailing list
will be fed back into RDF 2.0.
By setting up a new Working Group?
Or would it be fine to generate a living document out of the
discussions on the mailing list?
It would be a pity if the discussions and results were lost...

But now to your mails:

 >Inferences and RDF
 >------------------
 >
 >RDF's basic semantic design philosophy is different from SHOE's.
 >RDF seems to follow in the footsteps of frame languages and
 >semantic networks; SHOE used to be a semantic network as well, but
 >now tends to follow relational and logical languages: it provides
 >n-ary relations and a general inference mechanism.
 >
 >Many of RDF's bigger special-case warts (hard-coded "container"
 >objects, infinite sets of relations for numbering in containers,
 >"aboutEach", "subProperty", etc.) stem from its lack of any basic
 >general-purpose inferential rule mechanism.  With even a very
 >simplistic Horn clause mechanism, *all* of RDF's present
 >special-case semantics (except its aboutEachPrefix feature) can be
 >dealt with trivially.
Sure. And there are some approaches to enhance RDF with inference
capabilities. As Sergey has already pointed out, the idea is to
build inference languages on top of RDF. Metalog from W3C is one
example, although I doubt that Prolog-style top-down evaluation is
suitable for the Web. Sergey has built an inference layer on top
of RDF for SiLRI, translating RDF descriptions into general logic
programs, and I am aware of at least one other approach defining a
description logic language on top of RDF.

So, an important next step is to fix a syntax and semantics for an
RDF inference service. My suggestion is Horn-Logic based plus
well-founded negation.
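
To make the idea of an inference layer on top of RDF concrete,
here is a minimal sketch of one Horn rule evaluated bottom-up over
plain triples. It is only an illustration under assumptions: the
function names and the example facts are hypothetical, and it is
not how SiLRI or Metalog are implemented.

```python
# Facts are (subject, predicate, object) tuples; all names are illustrative.
def forward_chain(facts, rules):
    """Apply rules until no new triple is derived (naive bottom-up evaluation)."""
    known = set(facts)
    while True:
        new = set()
        for rule in rules:
            new |= rule(known) - known
        if not new:
            return known
        known |= new


def sub_property_rule(known):
    # triple(S, Q, O) <- triple(S, P, O), triple(P, subPropertyOf, Q)
    return {
        (s, q, o)
        for (s, p, o) in known
        for (p2, rel, q) in known
        if rel == "subPropertyOf" and p2 == p
    }


facts = {
    ("stefan", "authorOf", "mail42"),
    ("authorOf", "subPropertyOf", "contributorTo"),
}
derived = forward_chain(facts, [sub_property_rule])
```

The point of the sketch is that subPropertyOf needs no special
treatment in the data model: it is an ordinary rule over the one
triple relation, and further rules can be added to the list.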

 >So why is this a problem?  I'd be willing to bet some serious
 >money that as RDF progresses, the pressure to add more and more
 >special-case warts to the language will grow, as users rapidly
 >outgrow the present finite semantic usefulness of a language with
 >no general-purpose inferential mechanism.  RDF's semantics are
 >presently pretty watered down, so much so that it is difficult
 >(not impossible) to coherently argue why using RDF is better than
 >just writing an application in XML for your schema and going with
 >that.
RDF is only a small but important step on top of XML. It just
fixes the object model. The next step is to define more expressive
languages on top of that.

 >SHOE's present semantics are equivalent to Datalog without
 >negation (not even stratified negation).
Stratified negation is not suitable, because with only one
predicate (e.g. triple) every program that uses negation is
immediately non-stratified. Stable model semantics is too
expensive (exponential), but well-founded model semantics handles
this case very well (polynomial).
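
The stratification problem can be seen with a small check on the
predicate dependency graph. This is a sketch under assumptions:
rules are reduced to predicate names only, and the example rules
(bachelor, male, married) are hypothetical. A program is treated
as stratified when no predicate is negated in a rule whose head it
depends on.

```python
def is_stratified(rules):
    """rules: list of (head_predicate, [(body_predicate, is_negated), ...])."""
    depends = {}
    for head, body in rules:
        depends.setdefault(head, set()).update(pred for pred, _ in body)

    def reaches(start, target):
        # True if `start` depends on `target` through zero or more rules.
        stack, seen = [start], set()
        while stack:
            pred = stack.pop()
            if pred == target:
                return True
            if pred not in seen:
                seen.add(pred)
                stack.extend(depends.get(pred, ()))
        return False

    # Non-stratified if a negated body predicate depends on the rule's head.
    return not any(
        negated and reaches(pred, head)
        for head, body in rules
        for pred, negated in body
    )


# Distinct predicate names: bachelor depends negatively on married, no cycle.
named = [("bachelor", [("male", False), ("married", True)])]

# The same rule over one `triple` predicate: triple negatively depends on itself.
single = [("triple", [("triple", False), ("triple", True)])]
```

With distinct predicates the check succeeds; once everything is
encoded in the single triple predicate, any negated body literal
creates a negative self-dependency.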

 >No negation, no
 >procedural attachment, no numerical operations, and only a single
 >element in the consequent of the Horn clause.

There is no extra cost for adding additional elements...
Your example is expressible with a language on top of RDF.


 >The critical issue is of course how to implement such semantics
 >without getting bogged down in a morass of computational
 >complexity a-la KIF, given the potential amount of data out there.
 >Since they are mappable to Datalog, SHOE's inferential semantics
 >are at least polynomial if not better, and we believe that they
 >are limited enough to deploy on a large scale, especially in
 >domains where ontologies are relatively disjoint from one another.

I personally agree 100% with you.
Full predicate logic languages (like KIF) are not tractable. The
solution is to use a subset of FOL. A reasonable subset is
Horn logic, with Datalog as a special case. However, well-founded
negation is also polynomial and adds a lot to expressiveness.


 >Still, I can think of some further restrictions that can be
 >applied that would enforce an even more efficient inferential
 >approach. For example, a schema X might declare itself to belong
 >to one of
 >three levels: one that does not provide inferences, one that
 >disallows any inferences from outside schema including symbols
 >declared in X, and one that permits full inferential capabilities.
 >Revisions of schema (later versions) are permitted only to
 >increase the semantic level or keep it the same, but not decrease
 >it. Even more semantic expressivity might be added in this way
 >(another level which permits stratified negation, for example).

I generally agree with your approach. However, stratified negation
is not usable, because almost everything boils down to ONE
predicate symbol (e.g. triple), so almost every program that uses
negation is non-stratified.

 >Also certain relations or simple (one-level, non-recursive)
 >inferences might be declared Final, to indicate to agents that
 >they should be simply flattened when gathered rather than inferred
 >over and over.  Certainly most if not all of RDF's current
 >special-case inference mechanisms can be declared Final without a
 >significant decrease in speed of data-gathering.

subClassOf is transitive, i.e. recursive, so it cannot be Final.
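
A small sketch of why a one-level flattening is not enough for a
transitive relation. The class names A to E are hypothetical; the
rule is the usual transitivity rule, applied once versus repeated
until a fixpoint is reached.

```python
def one_pass(pairs):
    # One application of: subClassOf(A, C) <- subClassOf(A, B), subClassOf(B, C)
    return pairs | {(a, c) for (a, b) in pairs for (b2, c) in pairs if b == b2}


def closure(pairs):
    # Repeat the rule until nothing new appears (fixpoint).
    current = set(pairs)
    while True:
        nxt = one_pass(current)
        if nxt == current:
            return current
        current = nxt


chain = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")}
flattened_once = one_pass(chain)
full = closure(chain)
```

After a single pass the pair (A, E) is still missing; it only
appears when the rule is applied again to its own results.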


 >RDF certainly has an odd data model.  It borrows from frame
 >languages in that it has a set of literals, a set of resources, a
 >set of properties, and a set of statements of the form <prop,
 >domain, range>, where prop is a property, range is a literal or a
 >resource, and oddly, domain is *only* a resource.  Why is having
 >this restriction strange?
This point is not clear to me either, and other people have also
asked why a literal cannot be a subject. My guess right now is
that one would like to have statements only about identifiable
resources.

 >Obviously this form has had a long and
 >distinguished history in semantic networks and frame languages.
 >But these languages have also typically featured default logics,
 >IDO, or other non-first-order logic features which make it nice to
 >specify one of the two arguments to a binary relation to be a
 >"slot" that is fillable only by things in an inheritance chain.
 >But RDF doesn't have any of these features; in fact, it is
 >directly mappable to a very simple tuple calculus.  As such, the
 >only justification I can think of for RDF having this restriction
 >is syntax.
It is not quite that simple, because of reification.


 >RDF's syntax basically boils down to this:
 >
 >    [
 >    Object Description Area:
 >        {
 >        [Statements with the Object filling the _domain_ position]*
 >        }
 >    ]*

 >It appears that the only way that the relation Q(x,y) can be
 >declared in RDF is _physically_ within x's description. y cannot
 >declare the relation. No one else can either.  Only x can.  Other
 >than making the "abbreviated syntax" look pretty, I am at a loss
 >as to why this is so.  It seems arbitrary and unnecessary.
 >
I don't get you here. RDF just allows Object-Attribute-Value
triples. In a concrete RDF description one can state anything,
including that Bill Clinton is married to Madonna.

So what is the problem?


 >Inverted Relations
 >
 >
 >    Q(x,y) ^ inverseOf(R,Q) --> R(y,x).
 >    inverseOf(R,Q) ^ subPropertyOf(Q,S) --> inverseOf(R,S).
 >
 >Of course, to hook the two, you'd need two additional inferences:
 >
 >    subPropertyOf(R,Q) ^ inverseOf(Q,S) --> inverseOf(R,S).
 >    inverseOf(R,Q) ^ inverseOf(Q,S) --> subPropertyOf(R,S).
 >        (I think that's all of 'em...)
 >
 >I wonder why RDF doesn't have this?  This is a very useful thing,
 >with no additional computational complexity.  It's free.  It's
 >obviously a useful tool for merging divergent schema.  The only
 >reason I can think of that RDF does not provide this is that it
 >would allow, through inference, for Madonna to actually claim
 >someone as her husband (through an inverted "wife of"
 >relationship).  And of course, we wouldn't want to do that! :-)
 >:-)
I agree. Inverse relations are useful and occur very often,
and it should be possible to define them in RDFS.
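
As an illustration of the first rule quoted above,
Q(x,y) ^ inverseOf(R,Q) --> R(y,x), here is a minimal sketch over
triples. It assumes a hypothetical inverseOf property (not part of
RDFS as discussed here), and the example names are made up.

```python
def apply_inverse(triples):
    # R(y, x) <- Q(x, y), inverseOf(R, Q)
    inverses = {(r, q) for (r, rel, q) in triples if rel == "inverseOf"}
    derived = {
        (y, r, x)
        for (x, q, y) in triples
        for (r, q2) in inverses
        if q == q2
    }
    return triples | derived


triples = {
    ("childOf", "inverseOf", "parentOf"),
    ("alice", "parentOf", "bob"),
}
result = apply_inverse(triples)
```

One pass over the data is enough for this rule on its own, which
matches the remark that inverse relations come essentially for
free.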

 >SHOE "Claims"
 >-------------
 >
 >In SHOE we also have a similar syntax model:
 >
 >        [
 >        Object Description Area:
 >                {
 >                [Statements that the Object is making]*
 >                }
 >        ]*
 >
 >Note the important difference here: Objects make claims.  These
 >claims can be anything.  I can claim that Madonna and George
 >Clinton are married. That's fine.  But because of the syntax when
 >parsing it, agents clearly understand that *I*, not Madonna or
 >George Clinton, is making this claim. Which allows agents to take
 >what I say with a grain of salt.
 >SHOE thus views the relation Q(X,Y) actually as a 3-ary relation
 >_Q(C,X,Y), where C is the _claimant_.  This is read as: "C says
 >that Q(X,Y)".  In RDF, C must *always* be X, which seems vestigial
 >at best. RDF does not even permit C to be Y, much less anything
 >else.  In SHOE, C, X, and Y are independent.

This point was discussed extensively a few weeks ago on this list.
RDF has an even more flexible solution here: reification allows
one to make statements about statements, so one can say:

[[A -property->B]-claimedBy->C]

This also covers beliefs and other modal operators. However, a
database still just stores a 3-ary relation. And this is NOT
second-order logic, although some people believe it is.
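
A sketch of how the reified claim above stays within ordinary
triples. The rdf:type, rdf:Statement, rdf:subject, rdf:predicate
and rdf:object names come from the RDF reification vocabulary;
claimedBy, the node naming scheme and the helper functions are
assumptions made for illustration.

```python
import itertools

_ids = itertools.count(1)


def reify(subject, predicate, obj):
    """Return (statement_node, triples) describing the statement itself."""
    node = f"_:stmt{next(_ids)}"
    return node, {
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", subject),
        (node, "rdf:predicate", predicate),
        (node, "rdf:object", obj),
    }


def claim(claimant, subject, predicate, obj):
    # [[subject -predicate-> obj] -claimedBy-> claimant]
    node, triples = reify(subject, predicate, obj)
    return triples | {(node, "claimedBy", claimant)}


def claims_table(triples):
    # Read the claims back as flat (claimant, subject, predicate, object) rows.
    rows = set()
    for (node, prop, claimant) in triples:
        if prop == "claimedBy":
            parts = {p: o for (s, p, o) in triples if s == node}
            rows.add((claimant, parts["rdf:subject"],
                      parts["rdf:predicate"], parts["rdf:object"]))
    return rows


store = claim("C", "A", "property", "B")
rows = claims_table(store)
```

Everything in the store is still a first-order fact about an
ordinary node, which is the sense in which no second-order logic
is involved.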

 >N-ary Relations
 >---------------
 >
 >The biggest consequence of RDF's frame model, and one I think
 >really needs to be addressed, is its inability to handle N-ary
 >relations. Wow! Of course, in theory all things expressible in
 >n-ary relations can be mapped to binary relations.  But that's a
 >little like saying that all languages are Turing-Complete as a
 >justification for continuing to use COBOL.  :-) Expressing
 >n-ary-as-binary relations in RDF, or defining them in its schema,
 >isn't fun.

 >SHOE started out as a binary relation model very much like RDF.
 >But it was after one early interested party (the CIA -- hey, sue
 >us, we're in D.C. :-) complained about the model that we decided
 >to move SHOE from a frame model to a n-ary relational calculus.
 >The CIA wanted to use SHOE but to create relations that said not
 >only that P(x,y), but that Agent 007 said that he believed
 >P(x,y,m), where m was a certainty factor.  The CIA also wanted to
 >be able to say things like Agent 007 meets with Agent 009 on
 >Tuesday in Prague.  It seemed an obvious thing to say;
 >unfortunately creating an intermediate object (which SHOE did,
 >just like RDF does now) was a really ugly approach, especially
 >since it meant that this relation was *different* from the binary
 >relations we used (which didn't need an intermediate object).


I still think that one just needs a syntactic layer above RDF to
handle these things fine...
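
A minimal sketch of what such a syntactic layer could look like:
an n-ary statement is written once and expanded into binary
triples about an intermediate node, and can be read back again.
The function names, the role names (agent, partner, day, place)
and the node id are all hypothetical.

```python
def nary_to_triples(node, relation, **roles):
    """Encode relation(role1=..., role2=..., ...) as binary triples about `node`."""
    triples = {(node, "rdf:type", relation)}
    triples |= {(node, role, value) for role, value in roles.items()}
    return triples


def triples_to_nary(triples, node):
    # Inverse direction: collect the roles attached to the intermediate node.
    relation = next(o for (s, p, o) in triples if s == node and p == "rdf:type")
    roles = {p: o for (s, p, o) in triples if s == node and p != "rdf:type"}
    return relation, roles


meeting = nary_to_triples(
    "_:m1", "Meeting",
    agent="Agent007", partner="Agent009", day="Tuesday", place="Prague",
)
relation, roles = triples_to_nary(meeting, "_:m1")
```

The intermediate node still exists in the data model, but a layer
like this hides it from whoever writes or reads the statement.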

 >Another one: One of the odd consequences of binary relations in
 >RDF, plus its lack of certain basic data types (like integers)

These are only lacking temporarily, until someone at W3C finally
decides how to merge XML schema and RDF schema. (Hey guys, you
promised that a long time ago....)

 >, is
 >the need for "special" collection classes, with custom numbered
 >relational values. With an n-ary approach this special case
 >magically goes away.

Not really. And having relations with 1000 arguments is not really
fun... There are whole databases built purely on the concept of
binary relations, and they claim that there is almost NO
performance penalty (MONET at Amsterdam; if somebody is interested
I can dig out a better reference).

 > In RDF you attach elements to containers
 >with a custom infinite (!) set of relations, so in RDF you'd attach an
 >element X as the first item with the relation rdf:_1(container,X).

No, not a relation. Just a counter, which is an argument of the
general triple relation. This is even more flexible.

 >In SHOE you just make some relation, say, "contains", and write
 >contains(container,X,1).  No more need for infinite relational
 >sets.  Bag, etc. just go away.  And why not?  After all, since
 >infinite relational sets aren't exactly easy to implement as
 >tables :-), an RDF agent is probably going to implement this stuff
 >internally as contains(container,X,1) anyway!

It is handled in RDF in exactly the same way....
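
To illustrate the correspondence, here is a sketch that reads the
rdf:_n membership properties out of a triple store and produces
contains(container, X, n) rows. The helper name and the example
bag are assumptions; only rdf:_1, rdf:_2, ... and rdf:Bag come
from RDF itself.

```python
import re

# Matches the container membership properties rdf:_1, rdf:_2, ...
_MEMBER = re.compile(r"rdf:_([1-9][0-9]*)")


def contains_rows(triples):
    """Turn triple(container, rdf:_n, X) into contains(container, X, n) rows."""
    rows = set()
    for (container, prop, item) in triples:
        match = _MEMBER.fullmatch(prop)
        if match:
            rows.add((container, item, int(match.group(1))))
    return rows


bag = {
    ("bag1", "rdf:type", "rdf:Bag"),
    ("bag1", "rdf:_1", "X"),
    ("bag1", "rdf:_2", "Y"),
}
rows = contains_rows(bag)
```

In the triple view the position is just the middle argument of
the one triple relation, so no infinite family of tables is
needed on the storage side.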

 >RDF has made some n-ary stabs.  In Section 7.3 for example, the
 >RDF Model and Syntax Specification made some suggestions about how
 >to get around this deficiency.  Nonetheless, non-binary relations
 >are guaranteed to be second-class citizens in the RDF semantics.
 >While binary relations are first-class resources in RDF,
 >"pseudo-n-ary" relations are odd structures which cannot be
 >referenced by a resource.

They can, by reification.

 >While binary relations can take
 >advantage of subPropertyOf, pseudo-n-ary relations cannot.  And
 >reifying a binary relation is trivial. Add a single additional
 >argument, and reification becomes a hairy mess. Lastly, mapping
 >binary relations (using subProperty or, perhaps in the future,
 >inverseOf) from schema to schema is feasible.  Mapping non-binary
 >relations is presently well nigh impossible.
 >
 >There are syntactic inconsistencies as well.  Binary relations are
 >expressible directly in the Basic Abbreviated Syntax.

There are two things to distinguish: 1) the RDF data model and
2) the RDF serialization syntax.

The RDF syntax is a mess, including the abbreviated syntax (see
e.g. the request from Jos De Roo and my answer). A simplified
version is on its way.


 >To sum up: RDF's frame-based binary relation model, with the
 >domain position hard-set by syntax to be inside a resource
 >description, does not provide any special benefit IMHO.  Certainly
 >it does not take advantage of non-first-order inheritance or other
 >features.

What do you mean by first order inheritance?

 >There also does not appear to be any computational
 >complexity benefit.  And it does seem to have an awful lot of
 >downsides in transparency, consistency, difficulty in
 >manipulation, and arbitrary warts like a lack of inverse
 >relations. Lastly, the present syntax which enforces this model is
 >complicated for the common man to wrap his brain around.
 >Revisiting it with a critical eye would do it some serious good.

I completely agree on the syntax aspect. It's a mess that there
are several ways to express things, and it is easy to make
non-obvious errors like the one mentioned in Jos's mail. Still, I
think RDF has the potential of becoming the first widely used
Knowledge Representation Language, with the chance of creating
applications we cannot even imagine.
That's why we should try to work on ONE
accepted standard, not on several. Competing standards would
prevent things from happening (the VHS vs. Betamax syndrome...).


CU,

Stefan
Received on Thursday, 23 December 1999 20:24:03 UTC