Re: SPARQL semantics: open issues for basic query patterns from Pat Hayes on 2006-01-02 (public-rdf-dawg@w3.org from January to March 2006)

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 2 Jan 2006 14:25:31 -0600
To: Sergio Tessaris <tessaris@inf.unibz.it>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <p0623090abfdf31c08631@[10.100.0.9]>
>I'd like to add some comments to Enrico's proposal to explain the 
>differences and similarity with Pat proposal in 
><http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/0340>.
>
>For the sake of clarity I wont discuss the "told bnodes" issue; 
>anyway, as Pat pointed out it can be added "for free" (it may impact 
>syntax and protocol, though).

Agreed. Lets ignore told bnodes, which makes the definitions slightly simpler.

>
>Pattern Solutions
>-----------------
>Pat's and our definitions disagree on the restriction to be imposed 
>to the domain of pattern solutions. In Pat's definition the range is 
>restricted to terms appearing in the dataset (active domain), in our 
>definition this restriction is in place for bnode names only.
>
>For simple entailment the two definitions coincide; the difference 
>shows up with RDF(S) entailment. It boils down to whether axiomatic 
>and "implicit" triples involving RDF(S) vocabulary show up in 
>variable bindings (see 
><http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/0388> 
>thread for more info). I'd like to add that, if the more restrictive 
>approach is going to be chosen, systems that automatically perform 
>an RDF(S) completion of the graphs should make extra checks on the 
>results before returning answers to be sure that they're not 
>returning bindings with RDF(S) vocabulary terms not present in the 
>original graph.

Well, "extra" here really does depend on how this is implemented. If 
the server actually generates an RDF closure and then matches against 
it, then in order to achieve an exact correspondence to RDF 
entailment, it would have to apply a filter, true. But any sensible 
server, IMO, would implement things its favorite way and then 
advertise itself as providing the answers that it in fact does 
provide, by specifying that its target graphs are closed in some 
appropriate way, eg they contain all RDF consequences. If this is, in 
tautological cases, a few more than would be sanctioned by the strict 
RDF-entailment specification, no harm is done. This behavior would 
not violate the specs as written, note.

>
>Basic Graph Matching
>--------------------
>
>Pat proposal boils down to two candidate approaches:
>
>2.5 Basic Graph Matching.
>
>A basic graph pattern P matches on graph G with solution S when S is 
>a pattern solution on P to G such that
>
>	a) G (simply) entails S(P)
>
>	b) G (simply) entails (G union S(M(P)))
>            where M is a 1:1 mapping on blank nodes which replaces 
>each bnode in P with a new blank node which does not occur in P or G
>
>In the definition I removed any reference to told (or persistent) 
>bnodes. As already pointed out in the mailing list, the option a) 
>has some counterintuitive implications that, in my opinion, make it 
>unsuitable for a proper formalisation of SPARQL.
>Consider the following example:
>
>Data:
>  _:a :p _:b
>  _:b :url <http://example.org>
>
>Query:
>  SELECT * { ?x ?p ?y }
>
>Answer set with option a):
>  ?x/_:a ?p/:p ?y/_:b
>  ?x/_:b ?p/:p ?y/_:a
>  ?x/_:b ?p/:url ?y/<http://example.org>
>  ?x/_:a ?p/:url ?y/<http://example.org>
>  ?x/_:b ?p/:url ?y/_:a
>  ?x/_:a ?p/:url ?y/_:b
>
>The problem is that, even if the bnode names are taken from the 
>graph, their names in the left and right hand sides of the 
>entailment don't matter.

Sigh. Yes, you are right. Simply specifying that the bnodes in the 
answer occur in the graph solves the 'infinitely many answers' 
problem, but it still allows a great deal of redundancy, probably an 
unacceptable amount, in the answer set.

An alternative way to eliminate this would be directly: to treat this 
as a necessary condition on answer bindings, but also to impose a(n 
optional) condition of nonredundancy on answer sets. The weakest form 
would be to allow servers to not provide redundant answers, rather 
than require them to eliminate all redundancies, which could get 
impossibly onerous for large graphs. (An answer binding B is 
redundant if B(Q) has an instance in the answer set.)

Or, we could follow the path you suggest below. I have some problems 
with that, however.

>Option b) doesn't have this problem because G is considered together 
>with the BGP on the right hand side. Roughly speaking, in option b) 
>reusing a bnode in the "wrong" place would create an additional 
>co-reference not present in the original graph.
>
>I don't think we have really two options, since I don't see why 
>anybody would consider option a) behaviour the "right" one.
>
>Note that the fact that this has nothing to do with the told bnodes 
>issue, which is somewhat orthogonal. Aside, with the b) approach 
>told bnodes come for free.
>
>Now, our definition is slightly different (see Enrico's 
><http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JanMar/0000>), 
>being (in our opinion) the simpler:
>
>	G (simply) entails S(G RDFmerge P)
>
>Apparently, it's different from Pat's
>
>	G (simply) entails (G union S(M(P)))
>
>but in fact they're the same.

Agreed. The choice is therefore pedagogical, to find the simplest way 
to say it. Unfortunately, all ways of saying it require some standard 
terms to be re-defined, or re-used in new circumstances.  My way 
requires that the merging operation be 'taken apart' into the 
bnode-separation mapping M and the unioning of the results, since the 
variable binding has to be applied in the middle. In order that 
readers fully understand this, some exegesis would probably be 
necessary. Your way requires that we re-define all the graph 
operations on graph patterns, since RDF merge at present isn't 
defined for patterns (and your merge is order-sensitive, which the 
standard one isn't). Either way, readers have to think about this 
stuff a lot harder than I think it is wise to ask them to.

>To see it, consider that (G union S(M(P))) is (syntactically) equal 
>to S(G union M(P))

Well, it would be if that were well-defined, but unfortunately it 
isn't. I agree, all these are tiny pedantic points, but we need to 
get them right.

>because G doesn't contain any variable. Moreover, (G union M(P)) is 
>equivalent (up to bnodes renaming) to (G RDFmerge P) because of the 
>definition of RDF merge.
>
>If we're not interested in bnode persistency, then the RDF merge as 
>defined in [RDF-MT] is enough (there's a minor detail in the fact 
>that P contains variable nodes)

Minor, maybe ...

>. Our definition of RDF merge takes into account that bnodes in the 
>dataset shouldn't be changed.

... but that is not so minor. Again, we have to go back and alter a 
definition in an earlier spec, which I am very nervous about doing.

>
>I hope that this email clarifies our position on the BGP matching issue.

Yes, thanks, and you make some good points.

Pat

>
>Cheers,
>--sergio


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Monday, 2 January 2006 20:25:47 UTC