- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Sat, 8 Sep 2001 14:08:23 -0400
- To: Sandro Hawke <sandro@w3.org>
- Cc: www-rdf-rules@w3.org
On Fri, Sep 07, 2001 at 06:22:14PM -0400, Sandro Hawke wrote:
>
> I agree that RDF queries and RDF rule premises are basically the same
> things.  So what is an RDF query?
>
> At a very abstract level, I think the RDF query API is something like:
>
>     match(dataset, pattern) -> set of solutions
>
> This vaguely matches every RDF query system I've heard of.  The
> dataset is a set of RDF statements (triples), and the pattern is a set
> of RDF statements (triples) which may have existential variable
> elements.  A solution is either (1) a mapping from the variables to
> constants or (2) a set of triples which match the pattern (that is,
> with the variable substitution done), or (3) both.  I think this is
> equivalent to a relational join.
>
> There is a shift in complexity if we go with the interpretation of RDF
> "anonymous nodes" as existential variables.  That simplifies things by
> saying the pattern is just an RDF graph like any other, but it
> complicates things by allowing the dataset to have variables too.
> This seems to be equivalent to trying to perform unification [1]
> between the two sets as conjunctions of their triples, with the
> complication that the elements have no intrinsic ordering.  (Does that
> turn this into a much harder problem, or is there a trick to making it
> not matter?)
>
> This makes the match seem more symmetric, but it's still being able to
> match all the triples in the second argument which constitutes
> "success".

Ambiguities will arise if anonymous nodes do double duty as variables
and as unlabeled addresses in a graph (or structure, if you like to
think in C terms). I think the options come down as follows:

- anonymous nodes as variables in dataset and pattern.
- anonymous nodes as variables only in pattern.
- use something else for variables.
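For concreteness, here is a minimal Python sketch of the abstract match(dataset, pattern) call from the quoted text, returning solutions of kind (1), a mapping from variables to constants. It assumes variables are written as strings starting with "?" and that triples are ordered (predicate, subject, object) as in the algae query later in this message. The names alice, bob, editors and spammers are made-up sample data, and the naive nested loop is only an illustration, not how any particular query engine works.

```python
def is_var(term):
    """Assumption for this sketch: variables are strings starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


def unify(pattern_triple, data_triple, binding):
    """Extend binding so pattern_triple equals data_triple, or return None."""
    new_binding = dict(binding)
    for p, d in zip(pattern_triple, data_triple):
        if is_var(p):
            # Bind the variable, or check it against its earlier binding.
            if new_binding.setdefault(p, d) != d:
                return None
        elif p != d:
            return None
    return new_binding


def match(dataset, pattern):
    """Return every variable binding under which all pattern triples are in dataset."""
    solutions = [{}]
    for pattern_triple in pattern:
        extended = []
        for binding in solutions:
            for data_triple in dataset:
                result = unify(pattern_triple, data_triple, binding)
                if result is not None:
                    extended.append(result)
        solutions = extended
    return solutions


# Hypothetical sample data; triples are (predicate, subject, object).
dataset = [
    ("http://...memberOf", "alice", "editors"),
    ("http://...memberOf", "bob", "spammers"),
    ("http://...trusts", "http://...me", "editors"),
]
pattern = [
    ("http://...memberOf", "?id", "?group"),
    ("http://...trusts", "http://...me", "?group"),
]
solutions = match(dataset, pattern)
print(solutions)
```

Because ?group appears in both pattern triples, the second triple filters the bindings produced by the first, which is the relational-join behaviour the quoted text mentions.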
- anonymous nodes as variables in dataset and pattern:

Modeling a query as a series of statements with anonymous nodes for
the variables is tempting, as we don't have to invent any new node
types, but I believe it conflicts with the M&S [3], which gives the
example:

  "http://www.w3.org/Home/Lassila has creator something and something
   has name Ora Lassila and email lassila@w3.org"

Using this interpretation of anonymous nodes eliminates our ability to
create exactly one node in a graph without naming it. For instance,
the following:

  <r:Description about="http://...bus_218">
    <b:scheduledStop>
      <r:Description>
        <b:city>Boston</b:city>
        <b:time>14:59EST</b:time>
        <b:terminal>Z</b:terminal>
      </r:Description>
    </b:scheduledStop>
  </r:Description>

would not say "bus 218 has a stop in Boston at 14:59" but instead "I
am talking about all of 218's stops in Boston at 14:59." This
statement would not be useful to a trip planner that didn't have an
external assertion of the existence of this scheduled stop. It also
doesn't say anything about the node set you've selected with the set
of assertions containing a variable. We'll need something outside of
(or above) the model to delineate the selection from the assertions.

Another problem is that there is no way in RDF/XML to assert multiple
statements with a common anonymous node as the object. This limits the
realm of expressible queries. For instance, this algae query, which
looks for members of groups that I trust, would be inexpressible:

  (ask '((http://...memberOf ?id ?group)
         (http://...trusts http://...me ?group))
   collect '(?id ?group))

- anonymous nodes as variables only in pattern:

This seems to mostly work - I can't think of a reason to assert the
existence of an anonymous node in a query. The downside is that you
can't make assertions about variables used in a query. If the same
terms show up in the dataset, they identify something different.
This solution also has the cost that queries must be rigorously
sequestered from the dataset, or the query will assert the very
statements you are querying. This would be true of statements in the
query that don't have any variables at all (I don't know that these
would exist, though). This also suffers from the RDF/XML anonymous
node expressibility problem described above.

- use something else for variables:

None of the query engines I have played with encode queries in RDF.
This frees them up to use whatever they want to encode variables. The
problem is, naturally, that there is little interoperability. This
limits not only the ability to use the same query in different
environments, but also the ability to make formal assertions about
queries and rules.

One could reify the statements in a query and define a new node type
for variables. Following is an example of coding the above algae query
as a series of q:Constraints, which are subtypes of r:Statement. It is
only slightly more verbose...

  <r:Description>
    <q:hasTerm>
      <q:Constraint ID="1">
        <s:Predicate r:resource="http://...memberOf" />
        <s:Subject>
          <q:Variable r:ID="?id" />
        </s:Subject>
        <s:Object>
          <q:Variable r:ID="?group" />
        </s:Object>
      </q:Constraint>
    </q:hasTerm>
    <q:hasTerm>
      <q:Constraint ID="2">
        <s:Predicate r:resource="http://...trusts" />
        <s:Subject r:resource="http://...me" />
        <s:Object>
          <q:Variable r:ID="?group" />
        </s:Object>
      </q:Constraint>
    </q:hasTerm>
  </r:Description>

The cool thing about this model is that it never asserts

  ?id          --http://...memberOf--> ?group
  http://...me --http://...trusts----> ?group

so it's safe to encounter in the dataset. This also means that one
could make statements about the query, which would probably be crucial
in a lot of trust systems.

Just an idea, have at.
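To illustrate the reified encoding above, here is a small Python sketch. It stores the reified query as plain (subject, predicate, object) triples, checks that the direct memberOf statement is never asserted, and rebuilds the query pattern from the constraints. The node labels _:query, _:c1 and _:c2, the r:type triples, and the helper extract_pattern are assumptions made for this example; they are not part of any existing vocabulary or tool.

```python
# Reified query as (subject, predicate, object) triples.
# "_:query", "_:c1" and "_:c2" are hypothetical labels for the unnamed nodes.
reified = [
    ("_:query", "q:hasTerm", "_:c1"),
    ("_:c1", "r:type", "q:Constraint"),
    ("_:c1", "s:Predicate", "http://...memberOf"),
    ("_:c1", "s:Subject", "?id"),
    ("_:c1", "s:Object", "?group"),
    ("_:query", "q:hasTerm", "_:c2"),
    ("_:c2", "r:type", "q:Constraint"),
    ("_:c2", "s:Predicate", "http://...trusts"),
    ("_:c2", "s:Subject", "http://...me"),
    ("_:c2", "s:Object", "?group"),
    ("?id", "r:type", "q:Variable"),
    ("?group", "r:type", "q:Variable"),
]


def extract_pattern(triples, query_node):
    """Rebuild (predicate, subject, object) constraints from the reified form."""

    def value(node, prop):
        # In this sketch each constraint has exactly one value per property.
        return next(o for s, p, o in triples if s == node and p == prop)

    terms = [o for s, p, o in triples if s == query_node and p == "q:hasTerm"]
    return [
        (value(t, "s:Predicate"), value(t, "s:Subject"), value(t, "s:Object"))
        for t in terms
    ]


pattern = extract_pattern(reified, "_:query")

# The statement the query asks about is described, not asserted.
directly_asserted = ("?id", "http://...memberOf", "?group") in reified
print(pattern)
print(directly_asserted)
```

Since the query only exists as constraint descriptions, it can sit in the same store as the data it is run against, and other statements can point at _:query itself.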
> [1] http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?unification
> [2] http://www.daml.org/tools/wishlist.html#diff

[3] http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/

--
-eric (eric@w3.org)
Feel free to forward this message to any list for any purpose other
than email address distribution.
Received on Saturday, 8 September 2001 14:08:24 UTC