- From: Sergio Tessaris <tessaris@inf.unibz.it>
- Date: Mon, 2 Jan 2006 16:34:11 +0100
- To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
I'd like to add some comments to Enrico's proposal to explain the
differences and similarity with Pat proposal in <http://lists.w3.org/
Archives/Public/public-rdf-dawg/2005OctDec/0340>.
For the sake of clarity I wont discuss the "told bnodes" issue;
anyway, as Pat pointed out it can be added "for free" (it may impact
syntax and protocol, though).
Pattern Solutions
-----------------
Pat's and our definitions disagree on the restriction to be imposed
to the domain of pattern solutions. In Pat's definition the range is
restricted to terms appearing in the dataset (active domain), in our
definition this restriction is in place for bnode names only.
For simple entailment the two definitions coincide; the difference
shows up with RDF(S) entailment. It boils down to whether axiomatic
and "implicit" triples involving RDF(S) vocabulary show up in
variable bindings (see <http://lists.w3.org/Archives/Public/public-
rdf-dawg/2005OctDec/0388> thread for more info). I'd like to add
that, if the more restrictive approach is going to be chosen, systems
that automatically perform an RDF(S) completion of the graphs should
make extra checks on the results before returning answers to be sure
that they're not returning bindings with RDF(S) vocabulary terms not
present in the original graph.
Basic Graph Matching
--------------------
Pat proposal boils down to two candidate approaches:
2.5 Basic Graph Matching.
A basic graph pattern P matches on graph G with solution S when S is
a pattern solution on P to G such that
a) G (simply) entails S(P)
b) G (simply) entails (G union S(M(P)))
where M is a 1:1 mapping on blank nodes which replaces
each bnode in P with a new blank node which does not occur in P or G
In the definition I removed any reference to told (or persistent)
bnodes. As already pointed out in the mailing list, the option a) has
some counterintuitive implications that, in my opinion, make it
unsuitable for a proper formalisation of SPARQL. Consider the
following example:
Data:
_:a :p _:b
_:b :url <http://example.org>
Query:
SELECT * { ?x ?p ?y }
Answer set with option a):
?x/_:a ?p/:p ?y/_:b
?x/_:b ?p/:p ?y/_:a
?x/_:b ?p/:url ?y/<http://example.org>
?x/_:a ?p/:url ?y/<http://example.org>
?x/_:b ?p/:url ?y/_:a
?x/_:a ?p/:url ?y/_:b
The problem is that, even if the bnode names are taken from the
graph, their names in the left and right hand sides of the entailment
don't matter.
Option b) doesn't have this problem because G is considered together
with the BGP on the right hand side. Roughly speaking, in option b)
reusing a bnode in the "wrong" place would create an additional co-
reference not present in the original graph.
I don't think we have really two options, since I don't see why
anybody would consider option a) behaviour the "right" one.
Note that the fact that this has nothing to do with the told bnodes
issue, which is somewhat orthogonal. Aside, with the b) approach told
bnodes come for free.
Now, our definition is slightly different (see Enrico's <http://
lists.w3.org/Archives/Public/public-rdf-dawg/2006JanMar/0000>), being
(in our opinion) the simpler:
G (simply) entails S(G RDFmerge P)
Apparently, it's different from Pat's
G (simply) entails (G union S(M(P)))
but in fact they're the same. To see it, consider that (G union S(M
(P))) is (syntactically) equal to S(G union M(P)) because G doesn't
contain any variable. Moreover, (G union M(P)) is equivalent (up to
bnodes renaming) to (G RDFmerge P) because of the definition of RDF
merge.
If we're not interested in bnode persistency, then the RDF merge as
defined in [RDF-MT] is enough (there's a minor detail in the fact
that P contains variable nodes). Our definition of RDF merge takes
into account that bnodes in the dataset shouldn't be changed.
I hope that this email clarifies our position on the BGP matching issue.
Cheers,
--sergio
Received on Monday, 2 January 2006 15:34:34 UTC