Re: What is an RDF Query? from Peter F. Patel-Schneider on 2001-09-17 (www-rdf-rules@w3.org from September 2001)

From: Peter F. Patel-Schneider <pfps@research.bell-labs.com>
Date: Mon, 17 Sep 2001 13:00:37 -0400
To: sandro@w3.org
Cc: www-rdf-rules@w3.org
Message-Id: <20010917130037J.pfps@research.bell-labs.com>
From: Sandro Hawke <sandro@w3.org>
Subject: Re: What is an RDF Query? 
Date: Mon, 17 Sep 2001 10:56:52 -0400

> [...]

> I think we ought to seriously explore the possibility of using an RDF
> graph not only as the assertion language, but also as the query, rule-
> premise, and rule-conclusion languages.  I'm sure it wont be perfect,
> but this seems like an important idea to consider and a strawman
> against which to compare alternatives.   From here, I'd be interested
> to hear what you (Peter) think is still needed.

Yes, sure you can do this, and, in particular, you provide two ways of
interpreting such rules built in this way.  

> Since I have trouble thinking in RDF graphs sometimes, I translate
> them to a small subset of FOL for this kind of work.  Specifically, I
> use the subset with only constants, existential variables, 2-ary
> predicates, and conjunction.  (No universal variables, no negation, no
> disjunction, no equality, and no logic functions.)

Again, this sort of thing is possible, and provides good insights into the
meaning of RDF(S).  However RDFS is not captured by this subset of FOL, as
RDFS requires that rdfs:subPropertyOf be transitive.  To make this
intuition precise, you have to capture all of whatever you are considering,
as even very small changes in the formalism have large consequence.

> Using this language for assertions is obvious.  

Well, fairly obvious, see above.

> Using it for queries
> nearly obvious: you ask for a proof of the formula.  

I think that you don't want proofs, instead you want something more akin to
bindings. 

> Using it for a
> rule-premise or rule-conclusion gets a little more tricky, because we
> want to sort-of join the scopes of the existential variables in the
> two parts.  

> I think there are two fairly natural ways to do this: one
> seems to give us datalog rules and the other seems to give us Horn
> rules.

You have to be quite careful about what you are stating here, as there are
several versions of Horn rules, including:

[Binary] Horn 1: Quantifier-free formula in conjunctive normal form with
     each conjunction having at most one positive literal, [and all
     predicates binary] , with variables read universally.  This
     is not what you want, as its assertional component is strictly more
     powerful than RDFS.

[Binary] Horn 2: Conjunction of positive ground formula and Horn rules,
     [with all predicates binary].  A Horn rule is a
     quantifier-free conjunction of literals, with exactly one
     positive conjunct and at least one negative conjunct, with the
     variables being read universally.  This is closer, as the assersional
     component is closer to RDF, but you seem to want to have existentials
     in your non-rules.

[Binary] Horn 3: Conjunction of assertions and Horn rules, [with all
     predicates binary].  The assertions are positive literals,
     with free variables read existentially.  The Horn rules are as in 2.
     This is even closer, but not exactly what you seem to want.

> Style 1:  
> 
>   Simply combine the scope of variables in the premise and conclusion.
>   So if we have two RDF graphs (in FOL notation [2]):
>
>     (1) EXISTS x In(Spot, x) AND In(x, TheYard) AND Material(x, Wood)
>     (2) EXISTS x Isa(x, DogHouse)
>   
>   we can use these in a rule by turning the existential variables into
>   universal variables in a scope outside of both formulas:
> 
>     FORALL x
>        In(Spot, x) AND In(x, TheYard) AND Material(x, Wood)
>        => 
>        Isa(x, DogHouse)
> 
>   I think this is as expressive as datalog.  There is a question as
>   to what it means if a variable occurs only in the second clause.  In
>   this style, I think we simply disallow that case.

So rule conclusions are not arbitrary RDF, in other words, so you have lost
your primary goal.

> Style 2:
> 
>    We say that any variables in the second formula which are not also
>    in the first formula remain as local existential variables.
> 
>      (1) EXISTS g1,g3 Grandparent(g1,g3)
>      (2) EXISTS g1,g2,g3 Parent(g1,g2) AND Parent(g2,g3)
> 
>    would become a rule like this:
> 
>      FORALL g1,g3
>         Grandparent(g1,g3)
> 	=>
>         EXISTS g2 Parent(g1,g2) AND Parent(g2,g3)
> 
>       (This rule basically says that if two things have a grandparent
>       relationship, there is a third thing and they have a transitive
>       parent relationship through that third thing.)
> 
>    This may well not be full [binary] Horn logic, but I believe it's more
>    expressive than Style 1, since I can't figure out how to express
>    this example in Style 1.  

[Note that although binary FOL can emulate FOL, the relationship between
binary first-order Horn and full first-order Horn may be different.  I
can't remember the relationship off the top of my head.]

I also believe Style 2 is strictly more powerful than Style 1.    However,
the situation is not obvious, at least for RDFS, as RDFS may not be Horn.

> I also can't think of any [binary] Horn examples
>    that I can't express in Style 2.  (While Style 2 doesn't have the
>    logic functions which normally separate datalog and Horn logic, the
>    nested existential corresponds to Skolem functions, and in my
>    playing around that seems to be enough.  I don't have the skills
>    to approach this formally.)
> (I have implemented Style 2, and I've talked about this before [1] but
> that discussion went in a somewhat different direction.)
> 
>     -- sandro
> 
> [1] http://lists.w3.org/Archives/Public/www-rdf-logic/2001Mar/0075.html
> [2] I used an FOL ascii syntax found on this simple introduction:
>     http://www.sdsc.edu/~tbailey/teaching/cse151/lectures/chap07a.html


The above is a reasonable approach to rules in RDFS, but there are lots of
issues that have to be addressed. 

Let me make a stab at it, while still staying close to what you seem to
want.

1/ An RDF assertional knowledge base is a collection of binary atomic
   formulae, with existential formulae read existentially.  That is, an RDF
   knowledge base is 

	E x1, ..., xn   B1(t11,t12) ^ ... ^ Bm(tm1,tm2)

   for n>=0, m>0, where tij is one of the xi or some constant.

   [Note that this ignores lots of the stuff in RDF (including reification
   and containers).]

2/ An RDFS knowledge base is an RDF knowledge base plus the following
   formulae

	.... [ lots of stuff here, some Horn, some maybe not Horn, maybe even
	some not first-order ] ...

3/ An RDF rule is a formula of the form

	A x1, ..., xn E y1, ..., yo  P1(t11,t12) ^ ... ^ Pm(tm1,tm2) -> 
		       Pm+1(tm+11,tm+12) ^ ... ^ Pl(tl1,tl2)

   for n>=0, m>0, l>m, where tij is one of the xi or one of the yi or some
   constant and each xi that appears as one of the tjk for j>m must also
   appear as at least one of the tjk for j<=m. 

4/ An RDF(S) rule knowledge base is an RDF(S) assertional knowledge base
   plus a collection of RDF rules.

OK, now we have the syntax.  Note that neither rule premises nor
conclustions are precisely the same as assertions.  

However, we do not yet have semantics.  Fortunately we can just appeal to
FOL semantics.

A model of an RDF(S) rule knowledge base, KB, is precisely an FOL model of
KB.

Finally, what do we want to do with this?  Well I don't really know, but
there are several things that can be done.

1/ Is there a model of an RDF(S) rule knowledge base.  Note that this is
not a trivial question!  (Suppose that the rule conclusion introduces a
subclass loop.)

2/ Does an RDF(S) rule knowledge base entail (in the standard sense) an
RDF(S) assertional knowledge base.  

3/ Given an RDF(S) rule knowledge base, R, and a conjunction of binary
   atomic formulae
	Q = B1(t11,t12) ^ ... ^ Bm(tm1,tm2)
	where tij is either a variable or a constant,
   determine all the mappings M from the free variables of Q to constants
   in R or Q such that R entails Q/M (i.e., Q with its free variables
   mapped by M).  Note that there are several other possibilities for this
   retrieval operation.


Hopefully you see some of the complications in defining a rule system on
top of RDF(S).  If you make rules be first-order implications then some of
the issues that could be ignored in RDF(S) can no longer be ignored.  If
you make rules some other sort of beast, then you can no longer use
first-order logic as an intended meaning, and you have to make sure that
your intuitions are still valid.

Peter F. Patel-Schneider
Bell Labs Research
Received on Monday, 17 September 2001 13:01:38 UTC