Re: SPARQL semantics: open issues for basic query patterns

I'd like to add some comments to Enrico's proposal to explain the  
differences and similarity with Pat proposal in <http://lists.w3.org/ 
Archives/Public/public-rdf-dawg/2005OctDec/0340>.

For the sake of clarity I wont discuss the "told bnodes" issue;  
anyway, as Pat pointed out it can be added "for free" (it may impact  
syntax and protocol, though).

Pattern Solutions
-----------------
Pat's and our definitions disagree on the restriction to be imposed  
to the domain of pattern solutions. In Pat's definition the range is  
restricted to terms appearing in the dataset (active domain), in our  
definition this restriction is in place for bnode names only.

For simple entailment the two definitions coincide; the difference  
shows up with RDF(S) entailment. It boils down to whether axiomatic  
and "implicit" triples involving RDF(S) vocabulary show up in  
variable bindings (see <http://lists.w3.org/Archives/Public/public- 
rdf-dawg/2005OctDec/0388> thread for more info). I'd like to add  
that, if the more restrictive approach is going to be chosen, systems  
that automatically perform an RDF(S) completion of the graphs should  
make extra checks on the results before returning answers to be sure  
that they're not returning bindings with RDF(S) vocabulary terms not  
present in the original graph.

Basic Graph Matching
--------------------

Pat proposal boils down to two candidate approaches:

2.5 Basic Graph Matching.

A basic graph pattern P matches on graph G with solution S when S is  
a pattern solution on P to G such that

	a) G (simply) entails S(P)

	b) G (simply) entails (G union S(M(P)))
            where M is a 1:1 mapping on blank nodes which replaces  
each bnode in P with a new blank node which does not occur in P or G

In the definition I removed any reference to told (or persistent)  
bnodes. As already pointed out in the mailing list, the option a) has  
some counterintuitive implications that, in my opinion, make it  
unsuitable for a proper formalisation of SPARQL. Consider the  
following example:

Data:
  _:a :p _:b
  _:b :url <http://example.org>

Query:
  SELECT * { ?x ?p ?y }

Answer set with option a):
  ?x/_:a ?p/:p ?y/_:b
  ?x/_:b ?p/:p ?y/_:a
  ?x/_:b ?p/:url ?y/<http://example.org>
  ?x/_:a ?p/:url ?y/<http://example.org>
  ?x/_:b ?p/:url ?y/_:a
  ?x/_:a ?p/:url ?y/_:b

The problem is that, even if the bnode names are taken from the  
graph, their names in the left and right hand sides of the entailment  
don't matter.

Option b) doesn't have this problem because G is considered together  
with the BGP on the right hand side. Roughly speaking, in option b)  
reusing a bnode in the "wrong" place would create an additional co- 
reference not present in the original graph.

I don't think we have really two options, since I don't see why  
anybody would consider option a) behaviour the "right" one.

Note that the fact that this has nothing to do with the told bnodes  
issue, which is somewhat orthogonal. Aside, with the b) approach told  
bnodes come for free.

Now, our definition is slightly different (see Enrico's <http:// 
lists.w3.org/Archives/Public/public-rdf-dawg/2006JanMar/0000>), being  
(in our opinion) the simpler:

	G (simply) entails S(G RDFmerge P)

Apparently, it's different from Pat's

	G (simply) entails (G union S(M(P)))

but in fact they're the same. To see it, consider that (G union S(M 
(P))) is (syntactically) equal to S(G union M(P)) because G doesn't  
contain any variable. Moreover, (G union M(P)) is equivalent (up to  
bnodes renaming) to (G RDFmerge P) because of the definition of RDF  
merge.

If we're not interested in bnode persistency, then the RDF merge as  
defined in [RDF-MT] is enough (there's a minor detail in the fact  
that P contains variable nodes). Our definition of RDF merge takes  
into account that bnodes in the dataset shouldn't be changed.

I hope that this email clarifies our position on the BGP matching issue.

Cheers,
--sergio

Received on Monday, 2 January 2006 15:34:34 UTC