Re: What is an RDF Query?

On Fri, Sep 07, 2001 at 06:22:14PM -0400, Sandro Hawke wrote:
> 
> I agree that RDF queries and RDF rule premises are basically the same
> things.  So what is an RDF query?
> 
> At a very abstract level, I think the RDF query API is something like:
> 
>     match(dataset, pattern) -> set of solutions
> 
> This vaguely matches every RDF query system I've heard of.  The
> dataset is a set of RDF statements (triples), and the pattern is a set
> of RDF statements (triples) which may have existential variable
> elements.  A solution is either (1) a mapping from the variables to
> constants or (2) a set of triples which match the pattern (that is,
> with the variable substitution done), or (3) both.  I think this is
> equivalent to a relational join.
> 
> There is a shift in complexity if we go with the interpretation of RDF
> "anonymous nodes" as existential variables.  That simplifies things by
> saying the pattern is just an RDF graph like any other, but it
> complicates things by allowing the dataset to have variables too.
> This seems to be equivalent to trying to perform unification [1]
> between the two sets as conjunctions of their triples, with the
> complication that the elements have no intrinsic ordering.  (Does that
> turn this into a much harder problem, or is there a trick to making it
> not matter?)
> 
> This makes the match seem more symmetric, but it's still being able to
> match all the triples in the second argument which constitutes
> "success".   

Ambiguities will arise if anonymous nodes do double duty as variables
and unlabeled addresses in a graph (or structure if you like to think
in C terms). I think the options come down as follows:
- anonymous nodes as variables in dataset and pattern.
- anonymous nodes as variables only in pattern.
- use something else for variables.


- anonymous nodes as variables in dataset and pattern:

Modeling a query as a series of statements with anonymous nodes for
the variables is tempting, as we don't have to invent any new node
types, but I believe it contradicts the M&S [3], which gives the example:

  "http://www.w3.org/Home/Lassila has creator something and something
   has name Ora Lassila and email lassila@w3.org"

Using this interpretation of anonymous nodes eliminates our ability to
create exactly one node in a graph without naming it. For instance,
the following:

<r:Description about="http://...bus_218">
   <b:scheduledStop>
      <r:Description>
         <b:city>Boston</b:city>
         <b:time>14:59EST</b:time>
         <b:terminal>Z</b:terminal>
      </r:Description>
   </b:scheduledStop>
</r:Description>

would not say "bus 218 has a stop in Boston at 14:59" but instead
"I am talking about all of 218's stops in Boston at 14:59." This
statement would not be useful to a trip planner that didn't have
an external assertion of the existence of this scheduled stop.

It also doesn't say anything about the node set you've selected with
the set of assertions containing a variable. We'll need something
outside of (or above) the model to delineate the selection from the
assertions.

Another problem is that there is no way in RDF/XML to assert multiple
statements with a common anonymous node as the object. This limits the
realm of expressible queries. For instance, this algae query that
looks for members of groups that I trust would be inexpressible:

(ask '((http://...memberOf ?id          ?group)
       (http://...trusts   http://...me ?group))
 collect '(?id ?group))
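
To make the join in that query concrete, here is a minimal Python sketch
of how such a two-pattern query could be evaluated against a set of
triples. It is not how algae is implemented. The ex: names, the sample
data, and the convention that a term starting with "?" is a variable are
all assumptions made for illustration. Triples are written in the same
(predicate subject object) order as the query above.

```python
def is_var(term):
    # Assumed convention: any term starting with "?" is a variable.
    return term.startswith("?")


def match_triple(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or check it against an earlier binding.
            if extended.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return extended


def match(dataset, patterns):
    """Return every binding that satisfies all patterns at once."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in dataset:
                extended = match_triple(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Hypothetical data in (predicate, subject, object) order.
dataset = {
    ("ex:memberOf", "ex:alice", "ex:staff"),
    ("ex:memberOf", "ex:bob", "ex:visitors"),
    ("ex:trusts", "ex:me", "ex:staff"),
}
query = [
    ("ex:memberOf", "?id", "?group"),
    ("ex:trusts", "ex:me", "?group"),
]
solutions = match(dataset, query)
```

The shared ?group variable is what ties the two patterns together: only
bindings where both patterns agree on its value survive.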


- anonymous nodes as variables only in pattern:

This seems to mostly work - I can't think of a reason to assert the
existence of an anonymous node in a query.

The downside is that you can't make assertions about variables used
in a query. If the same terms show up in the dataset, they identify
something different. This solution also has the cost that queries must
be rigorously sequestered from the dataset or the query will assert
the very statements you are querying. This would be true of statements
in the query that don't have any variables at all (I don't know that
these would exist, though).
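
A small Python sketch of why the sequestering matters, under the
assumption that a store is just a set of triples and that a query with no
variables succeeds when all of its triples are present. The ex: names and
the holds() helper are hypothetical.

```python
def holds(store, ground_query):
    # A variable-free query succeeds if every one of its triples is in the store.
    return ground_query <= store


# Hypothetical data in (predicate, subject, object) order.
dataset = {("ex:trusts", "ex:me", "ex:staff")}
query = {("ex:trusts", "ex:me", "ex:visitors")}

# Kept apart, the query correctly fails against the dataset.
separate_answer = holds(dataset, query)

# Loaded into the same store, the query asserts its own triple and then
# trivially finds it.
merged_answer = holds(dataset | query, query)
```

Here separate_answer is False and merged_answer is True, even though
nobody ever asserted that I trust ex:visitors.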

This also suffers the RDF/XML anonymous node expressibility problem
described above.


- use something else for variables.

None of the query engines I have played with encode queries in
RDF. This frees them up to use whatever they want to encode
variables. The problem is, naturally, that there is little
interoperability. This limits not only the ability to use the same
query in different environments, but also the ability to make formal
assertions about queries and rules.

One could reify the statements in a query and define a new node type
for variables. Following is an example of coding the above algae query
as a series of q:Constraints, which are subtypes of r:Statement. It is
only slightly more verbose...

<r:Description>
   <q:hasTerm>
      <q:Constraint ID="1">
         <s:Predicate r:resource="http://...memberOf" />
         <s:Subject>
            <q:Variable r:ID="?id" />
         </s:Subject>
         <s:Object>
            <q:Variable r:ID="?group" />
         </s:Object>
      </q:Constraint>
   </q:hasTerm>
   <q:hasTerm>
      <q:Constraint ID="2">
         <s:Predicate r:resource="http://...trusts" />
         <s:Subject r:resource="http://...me" />
         <s:Object>
            <q:Variable r:ID="?group" />
         </s:Object>
      </q:Constraint>
   </q:hasTerm>
</r:Description>

The cool thing about this model is that it never asserts
  ?id -----------http://...memberOf-> ?group
  http://...me --http://...trusts---> ?group
so it's safe to encounter in the dataset. This also means that one could
make statements about the query which would probably be crucial in a lot
of trust systems. Just an idea, have at.
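
A rough Python sketch of the same idea, to show that the reified form
describes the query's triples without asserting them. The reify()
helper, the _:query and _:c node names, and the ex: URIs are hypothetical;
the q: and s: property names follow the RDF/XML above. Marking each
variable as a q:Variable is left out to keep it short.

```python
def reify(query, query_node="_:query"):
    """Describe each query triple as a q:Constraint instead of asserting it."""
    triples = set()
    for n, (pred, subj, obj) in enumerate(query, start=1):
        node = f"_:c{n}"  # hypothetical label for the constraint node
        triples.add(("q:hasTerm", query_node, node))
        triples.add(("r:type", node, "q:Constraint"))
        triples.add(("s:Predicate", node, pred))
        triples.add(("s:Subject", node, subj))
        triples.add(("s:Object", node, obj))
    return triples


# The query from above, in (predicate, subject, object) order.
query = [
    ("ex:memberOf", "?id", "?group"),
    ("ex:trusts", "ex:me", "?group"),
]
reified = reify(query)

# None of the original query triples appear in the reified description.
asserted = [t for t in query if t in reified]
```

Since asserted comes out empty, the reified triples can sit in the
dataset without claiming that anyone trusts or belongs to anything, and
_:query is an ordinary node that other statements can point at.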

> [1] http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?unification
> [2] http://www.daml.org/tools/wishlist.html#diff
[3] http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/
-- 
-eric

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

Received on Saturday, 8 September 2001 14:08:24 UTC