- From: Pat Hayes <phayes@ihmc.us>
- Date: Tue, 17 Jan 2006 10:42:45 -0600
- To: Enrico Franconi <franconi@inf.unibz.it>
- Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
>On 15 Jan 2006, at 19:01, Seaborne, Andy wrote: > >>Enrico Franconi wrote: >> >>>The new proposal of Pat >>><http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JanMar/0061.html> >>>does not work for any approach where bnodes >>>are implicit, and this happens not only for >>>OWL-Lite entailment, as we already pointed out >>>in >>><http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JanMar/0064.html>, >>>but also for RDF entailment. For example, >>>given the graph >>> :john :age "25"^^xsd:decimal . >>>the query >>> ASK { _:b rdf:type rdf:XMLLiteral } >>>should clearly return YES if RDF entailment is considered. I agree, and see below. >>>However, >>>according to the latest proposal from Pat the >>>answer to the query would be NO regardless of >>>the type of entailment considered. The reason >>>is that Pat considers bnodes in the query >>>restricted to be bound only to URIs and bnodes >>>explicitly appearing in the graph. This restriction is to terms occurring in the scoping graph, not in the original graph. For the SPARQL case of simple entailment, the scoping graph is defined to be equivalent to the dataset graph, but for RDF or RDFS entailment, the appropriate adjustment would be to define the scoping graph G' to be the RDF or RDFS closure of the data set graph G. The only general semantic condition is that the scoping graph must be entailment-equivalent to, i.e. entail and be entailed by, the dataset graph, using whatever is the appropriate notion of entailment. The only purpose of including this graph in the definition, bear in mind, is to be a 'context' which restricts and organizes the answer bnode bindings, and to minimize redundancy. But in any case, if an adjustment to the definition of scoping is required, it can be made appropriately, perhaps by relaxing the condition that bnodes bound to variables must actually occur in G'. I don't think this strict condition is going to work for OWL in any case, certainly not for OWL/RDF. Such adjustments are purely pragmatic, and do not reflect any underlying semantic change. The main point regarding bnodes in answer patterns is an operational one. If an engine is capable of determining that an existential pattern is entailed, then it is surely capable of supplying a bnode binding to a variable in the place of the existential. To not do so would be gratuitously misleading as a query answer, since to supply such a binding is an almost cost-free operation. It seems to me that this principle should apply to queries relative to any language which uses bnodes/existentials. To refuse to supply an existential name as a binding, and report 'no answer', when an existential is entailed, is at least as misleading as supplying an existential when there is a real name in the database. This seems to me to be so misleading, in fact, that we should forbid it. The suggested 'principle' is that if an query pattern Q containing a bnode succeeds, then the similar query pattern with a new query variable in place of the bnode must also succeed. The justification being that the engine can generate a new bnodeID to substitute for the variable, under the exact same circumstances that it can establish that the existential is entailed. (The work needed to establish entailment, in the 'awkward' cases, so much exceeds the work required to generate a new bnodeID that the latter seems hardly worth arguing about.) SPARQL satisfies this trivially, but I suggest that we impose it as a requirement on all extensions. >>This hinges on the different definitions of 'pattern solution'. >>The scoping >>graph proposal maps variables and bNodes; the ordered-merge proposal maps >>just variables. Each has consequences. > >The main consequence of Pat's latest proposal, >as pointed out by FUB, is that SPARQL can never >be extended - not even to RDF entailment - as >you can see from the example above. "Never be" is a very strong claim, and in this case false. The notion of 'scoping graph' used in the proposal can be used as a parameter to adjust the definition to suit. The scoping graph provides the scope for the answer bindings. For more elaborate forms of entailment which allow entailments of existentials, this scope should be allowed to include terms which represent the extra objects whose existence is entailed. >Moreover, if you want to map both variables and >bnodes, you better disallow bnodes in the query >and consider only variables in the query: there >wouldn't be any noticeable difference. The fact that bnodes and variables are almost identical in function, so that bnodes in query patterns are 'blank variables', seems to me to be a feature rather than a bug. This corresponds exactly to the basic semantical picture in which a query is a sentence on the RHS of a sequent, i.e. a goal to be proved: a sentence being entailed rather than a sentence being asserted. >>If bNodes are bound by pattern solutions, then the algebra works out as per >>the current document. > >And bnodes would behave *exactly* as variables, >and therefore when considering the extension of >SPARQL with just RDF entailment, the semantics >would not be compatible, and the behaviour would >be wrong. All, and I really do mean ALL, of these debates are concerned with syntactic devices whose only purpose is to restrict the answer set in order to avoid redundancy. This is a significant matter in query language design, but it has nothing to do with semantics. The only genuinely semantic constraint is that answer bindings should make the query pattern into something that is appropriately entailed by the dataset graph. All the rest of this entire discussion has been concerned with tweaking this basic, clear, semantic idea so as to restrict the answer set by eliminating irrelevant answers which are entailed, but which add no new information, i.e. are redundant. My own view is that the costs and risks of trying to build appropriate such restrictions into a single definition of 'answer', which is supposed to survive unchanged through all possible entailment modes, are too high to justify this building-in, and that what we should do in stating the general extendable SPARQL conditions is, as suggested in http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JanMar/0016, to separate out the logically necessary requirement on answers which are universally agreed, clearly semantically motivated, and have a simple 'entail' parameter, so can be used without any change all the way from RDF to higher-order logic from the conditions on answer sets which are there only to remove or minimize redundancy: and that we should document and explain this distinction between the different roles and purposes of the different parts of the definition, as a guide to future work. Who knows, some enterprising person might invent a clever hash-coding algorithm which can completely eliminate redundancy in answer sets in, say, log-n time in all practical cases, and put it in the public domain. If so, great, IMO: why should we define such a possibility out of consideration? (Which we do with the current definitions.) The point being that redundancy elimination is itself easy to state 'semantically' as well, but (we all recognize) this easy definition defines a condition which is too onerous a computational burden to impose on developers and users right now. But this is not an implementational/pragmatic issue, not a semantic one: the semantic definition of 'less redundant' is about as clear and simple as it can get. So, let us not impose it or try to build it into the definitions, but instead recommend (using 'should/should not' language) that it be approached as closely as reasonably, and leave it to implementers to make incremental improvements in how close they can get. The thing we are straining to get defined here is really just a specification of a particular implementational strategy for keeping redundancy to a minimum, and has nothing to do with semantics. The absolute worst that can happen, under this proposal, is that two conforming engines will produce answer sets one of which will have some redundancies in it relative to the other. The work involved in checking whether two answer sets are equivalent under this condition is not vastly greater than that involved in checking whether they are the same under the current definition, since we do not specify a canonical ordering. None of the redundant answers will be wrong or semantically misleading: all the bnodes will be properly scoped. This will also completely meet Peter's 'interoperability' wishes, if only in the ideal case, since entailment-minimal answer sets for logically equivalent datasets will always be graph-equivalent, for any notion of entailment. Users who are able and willing to check for redundancies in answer sets will be able to to do so, and those who do will get to the same answer set from any conforming engine. (Note, with our current definitions, in any of their incarnations, this is false. Completely redundancy-free sets of answer bindings will not in general be conforming answer sets, even when the dataset is lean.) In practice, any serious SPARQL query service will not generate gratuitously redundant answers, even if it could 'legally' do so, since there are no Web-evolutionary pressures which lead in that direction: such pathological behavior would be to both the server's and the client's disadvantage. The only objection to this idea that I have heard seems to be based on the feeling that there should be a single defined, official, unique set of answers. That idea makes sense for traditional data querying, but traditionally this kind of redundancy didn't come up. When, as here, both queries and datasets can legally have internal redundancy and so be logically equivalent to different queries and graphs, it seems silly to be sweating blood over how to ensure that answer sets can have just the 'right' amount of redundancy, and getting this defined so precisely that it gives a unique answer set for any graph, query and form of entailment. Whatever we do, we are almost bound to get it slightly wrong I have no confidence that any of our suggested definitions will go on working appropriately for OWL and we don't, I submit, need to get it exactly right. Worse, in advanced cases, we don't even know what really counts as 'exactly right', because the appropriate answer isn't determined by theoretical or semantic considerations, but by pragmatic ones. RDFS tautologies are certainly RDFS-entailed by the empty graph, for sure. So, should queries on the empty graph produce tautological bindings when used with RDFS entailment? Maybe, maybe not: I can see arguments both ways. If we leave the definition of 'redundant' open enough to allow conforming engines to make the choice, then I see that as a small success for SPARQL, rather than a failure. (BTW, this is an argument for the 'scoping graph' idea, as it provides an extra adjustment parameter.) So, I suggest that as a compromise that we define SPARQL answers using the 'bnode-binding' modification to the Baden-Baden definitions for the SPARQL case of simple entailment, but also state the more general, purely entailment-based, definitions as conditions which must be satisfied by any SPARQL extension to other entailment modes, and perhaps also add some informative remarks about appropriate strategies for keeping redundancy to a minimum. Seems to me that this keeps just the right amount of flexibility open for SPARQL-N (N>1) compliance, while keeping our current design deterministic and constrained, as Bijan wants it to be. >>If bNodes are not bound at all, as in the ordered-merge proposal, then there >>is a change to the algebra. > >Unless you forbid the use of bnodes in queries, >which wouldn't be at all a loss, since the >behaviour you want for bnodes is the behaviour >we have for variables. > >>(...) >> >>The algebra works if a bNode in occurring in two or more BGPs in a graph >>pattern is bound by the pattern solution [*]. > >Exactly: you expect bnodes to behave like >variables, which is semantically wrong for any >type of entailment which is not just the reading >of the syntax (like simple entailment). I disagree that it is semantically wrong: on the contrary, thinking strictly about semantics, I cannot see that any other design makes semantic sense. Semantically, a query variable is a request for information about things that exist, and a bnode asserts existence. If we allow bnodes as bindings, then this gives information about the existence of a thing. This is exactly the same information provided by a YES to an existential query. Syntactic restrictions on bindings to terms in a graph are not by any stretch of the imagination properly called semantic restrictions. The great merit of this bnode/variable scoping idea, it seems to me, is its simplicity, which includes its semantic simplicity. Semantically, bnodes - existentials - are the logically clear idea. Variables have no actual logical semantics at all, unless they are thought of as "labelled existentials". So the best way to think about it logically is that variables are being treated as bnodes (with trackable bindings), rather then vice versa. Pat > >>This is a conservative approach and, like any extension to the level of >>entailment, more solutions might become >>possible when relaxed. It would allow >>all entailments for queries consisting of a >>single BGP. It would leave open the >>possibility of the rdfD1 use case above as an >>extension. > >This is acceptable only if bnodes are forbidden in queries. > >cheers >--e. -- --------------------------------------------------------------------- IHMC (850)434 8903 or (650)494 3973 home 40 South Alcaniz St. (850)202 4416 office Pensacola (850)202 4440 fax FL 32502 (850)291 0667 cell phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes
Received on Tuesday, 17 January 2006 16:43:04 UTC