- From: Pat Hayes <phayes@ihmc.us>
- Date: Mon, 2 Jan 2006 14:25:31 -0600
- To: Sergio Tessaris <tessaris@inf.unibz.it>
- Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
>I'd like to add some comments to Enrico's proposal to explain the >differences and similarity with Pat proposal in ><http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/0340>. > >For the sake of clarity I wont discuss the "told bnodes" issue; >anyway, as Pat pointed out it can be added "for free" (it may impact >syntax and protocol, though). Agreed. Lets ignore told bnodes, which makes the definitions slightly simpler. > >Pattern Solutions >----------------- >Pat's and our definitions disagree on the restriction to be imposed >to the domain of pattern solutions. In Pat's definition the range is >restricted to terms appearing in the dataset (active domain), in our >definition this restriction is in place for bnode names only. > >For simple entailment the two definitions coincide; the difference >shows up with RDF(S) entailment. It boils down to whether axiomatic >and "implicit" triples involving RDF(S) vocabulary show up in >variable bindings (see ><http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/0388> >thread for more info). I'd like to add that, if the more restrictive >approach is going to be chosen, systems that automatically perform >an RDF(S) completion of the graphs should make extra checks on the >results before returning answers to be sure that they're not >returning bindings with RDF(S) vocabulary terms not present in the >original graph. Well, "extra" here really does depend on how this is implemented. If the server actually generates an RDF closure and then matches against it, then in order to achieve an exact correspondence to RDF entailment, it would have to apply a filter, true. But any sensible server, IMO, would implement things its favorite way and then advertise itself as providing the answers that it in fact does provide, by specifying that its target graphs are closed in some appropriate way, eg they contain all RDF consequences. If this is, in tautological cases, a few more than would be sanctioned by the strict RDF-entailment specification, no harm is done. This behavior would not violate the specs as written, note. > >Basic Graph Matching >-------------------- > >Pat proposal boils down to two candidate approaches: > >2.5 Basic Graph Matching. > >A basic graph pattern P matches on graph G with solution S when S is >a pattern solution on P to G such that > > a) G (simply) entails S(P) > > b) G (simply) entails (G union S(M(P))) > where M is a 1:1 mapping on blank nodes which replaces >each bnode in P with a new blank node which does not occur in P or G > >In the definition I removed any reference to told (or persistent) >bnodes. As already pointed out in the mailing list, the option a) >has some counterintuitive implications that, in my opinion, make it >unsuitable for a proper formalisation of SPARQL. >Consider the following example: > >Data: > _:a :p _:b > _:b :url <http://example.org> > >Query: > SELECT * { ?x ?p ?y } > >Answer set with option a): > ?x/_:a ?p/:p ?y/_:b > ?x/_:b ?p/:p ?y/_:a > ?x/_:b ?p/:url ?y/<http://example.org> > ?x/_:a ?p/:url ?y/<http://example.org> > ?x/_:b ?p/:url ?y/_:a > ?x/_:a ?p/:url ?y/_:b > >The problem is that, even if the bnode names are taken from the >graph, their names in the left and right hand sides of the >entailment don't matter. Sigh. Yes, you are right. Simply specifying that the bnodes in the answer occur in the graph solves the 'infinitely many answers' problem, but it still allows a great deal of redundancy, probably an unacceptable amount, in the answer set. An alternative way to eliminate this would be directly: to treat this as a necessary condition on answer bindings, but also to impose a(n optional) condition of nonredundancy on answer sets. The weakest form would be to allow servers to not provide redundant answers, rather than require them to eliminate all redundancies, which could get impossibly onerous for large graphs. (An answer binding B is redundant if B(Q) has an instance in the answer set.) Or, we could follow the path you suggest below. I have some problems with that, however. >Option b) doesn't have this problem because G is considered together >with the BGP on the right hand side. Roughly speaking, in option b) >reusing a bnode in the "wrong" place would create an additional >co-reference not present in the original graph. > >I don't think we have really two options, since I don't see why >anybody would consider option a) behaviour the "right" one. > >Note that the fact that this has nothing to do with the told bnodes >issue, which is somewhat orthogonal. Aside, with the b) approach >told bnodes come for free. > >Now, our definition is slightly different (see Enrico's ><http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JanMar/0000>), >being (in our opinion) the simpler: > > G (simply) entails S(G RDFmerge P) > >Apparently, it's different from Pat's > > G (simply) entails (G union S(M(P))) > >but in fact they're the same. Agreed. The choice is therefore pedagogical, to find the simplest way to say it. Unfortunately, all ways of saying it require some standard terms to be re-defined, or re-used in new circumstances. My way requires that the merging operation be 'taken apart' into the bnode-separation mapping M and the unioning of the results, since the variable binding has to be applied in the middle. In order that readers fully understand this, some exegesis would probably be necessary. Your way requires that we re-define all the graph operations on graph patterns, since RDF merge at present isn't defined for patterns (and your merge is order-sensitive, which the standard one isn't). Either way, readers have to think about this stuff a lot harder than I think it is wise to ask them to. >To see it, consider that (G union S(M(P))) is (syntactically) equal >to S(G union M(P)) Well, it would be if that were well-defined, but unfortunately it isn't. I agree, all these are tiny pedantic points, but we need to get them right. >because G doesn't contain any variable. Moreover, (G union M(P)) is >equivalent (up to bnodes renaming) to (G RDFmerge P) because of the >definition of RDF merge. > >If we're not interested in bnode persistency, then the RDF merge as >defined in [RDF-MT] is enough (there's a minor detail in the fact >that P contains variable nodes) Minor, maybe ... >. Our definition of RDF merge takes into account that bnodes in the >dataset shouldn't be changed. ... but that is not so minor. Again, we have to go back and alter a definition in an earlier spec, which I am very nervous about doing. > >I hope that this email clarifies our position on the BGP matching issue. Yes, thanks, and you make some good points. Pat > >Cheers, >--sergio -- --------------------------------------------------------------------- IHMC (850)434 8903 or (650)494 3973 home 40 South Alcaniz St. (850)202 4416 office Pensacola (850)202 4440 fax FL 32502 (850)291 0667 cell phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes
Received on Monday, 2 January 2006 20:25:47 UTC