- From: Pat Hayes <phayes@ihmc.us>
- Date: Mon, 2 Jan 2006 13:19:14 -0600
- To: Enrico Franconi <franconi@inf.unibz.it>
- Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
>Hi all. > >From the latest DAWG minutes ><http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/0384>: > >""" PROPOSED: that http://www.w3.org/2001/sw/DataAccess/rq23/ 1.596 >(section 2, @@NO TOLD BNODES options) addresses rdfSemantics, and that >it's sufficient to postpone owlDisjunction, contingent on acceptance >(including silence) by UMD and Free University of Bozen- Bolzano in a >week (by 27 Dec). ACTION PatH, DanC to respond to PFPS, Horrocks, >et. al. """ > >FUB still wants to clarify the following points: > >a) in the document only 'simple entailment' is used. We want a >parametric entailment, with simple, rdf, rdfs explicit at least, and >owl-dl and owl possible. My understanding of the situation is this. SPARQL is required to be an actual language, not a family of languages. Given this simple, but very basic, point, then if SPARQL itself is to support 'parametric entailment', then SPARQL will presumably need to provide for ways of communicating the value of this 'parameter' in a query: or failing this, at the very least, to specify some universally agreed-upon conventions by which queries and servers can agree on which entailment is intended when a query is answered. This is conceptually simple, but it is not simple in practice as it (1) involves changes to every part of the spec, including the entire corpus of test cases and probably the protocol (2) exceeds the WG charter (3) would eliminate from consideration any kind of non-named entailment (eg simple entailment with datatyped literals, or RDF entailment with a unique name assumption, or OWL plus rules), but we know that such 'unnamed extensions' are already in use, and moreover IMO should be encouraged rather then discouraged; and finally (4) would delay the deployment of SPARQL by months, perhaps years. In the meantime, the use of simple entailment in the spec keeps to a single basic language, supports almost all actual practical uses of SPARQL which are immediately planned, allows for simple implementations to be rapidly deployed, allows for many extensions by treating source graphs as 'virtual closures', provides a simple and useful basis for experimentation with alternative entailment styles, and provides a clear path for future extensions to SPARQL which replace simple entailment by more sophisticated entailments. On balance, therefore, it seems best to keep this release of SPARQL simple by specifying one fixed kind of entailment: and the only rational choice, under those circumstances, is simple entailment. >The argument here is that due to the infinite >closure of RDF graphs (due to rdf:1, rdf:2, etc; or to the >reification), this document would not even allow to have >implementations that comply with the original RDF MT! Of course it would. The closures defined in the RDF MT document are infinite, but this is trivial to correct, and Herman ter Horst has already published the relevant finite-closure theorems for RDF and RDFS entailment. And in any case, the spec does not require that the target graph be finite. >Moreover, there >are explicit requests about this in the SWBP WG, for example ><http://lists.w3.org/Archives/Public/public-swbp-wg/2005Dec/0072>. > >b) we want back the ability to label bnodes in a query as "told- >bnodes", The WG, after considerable discussion, decided not to do this. As a party to the discussions (and one of those arguing for the ability), allow me to explain this decision. First, leave aside for the moment the complexities of how best to define the various ordered-merge or partial-merge operations; let me just observe that they are, indeed, complex. They require the spec to go into a level of syntactic micro-management which approaches the level of detail used when specifying an algorithm, rather than a 'clean' semantic specification; and they depend on one another, so that changing any one definition, even slightly, requires changes to be made elsewhere. They are fragile. Second observation: the net effect of all this complexity is simply to create three distinct categories of identifier (IRIs, told-bnodes and normal bnodes) where we previously had two. The only practical function of the new category of identifier is that it retains its meaning beyond the scope of the graph in which it occurs, and is otherwise exactly like a bnode. But now think about this logically: this is *exactly* what a logical constant, in our case an IRI, does. In fact, we can think of names as being 'globally' quantified existential variables: there is no semantic distinction. Why, then, should we not simply use IRIs? Then a query service can, in effect, skolemize its graphs when answering queries, by substituting new IRIs for graph bnodes, as a way to 'indicate' that further queries are acceptable concerning the entity in question. Or not: the choice is up to the server, and the spec can leave it entirely open and allow implementations flexibility in this regard. Bnodes supplied as bindings cannot be used as 'told bnodes' in subsequent queries; IRIs, of course, can. This simple convention has some arguments against it, but it has overwhelming arguments in its favor. It entirely does away with the need to provide a SPARQL syntax for 'told bnodes', and more significantly still it completely does away with the impenetrable, unintuitive elaboration of definitional support that is needed to make the very idea of 'told bnodes' coherent; and, best of all, it makes perfect sense and fits within the assumptions and conventions already in use. In fact, seen from this perspective, one can view the entire 'told bnode' idea as a kind of "optimization" which would allow a server to not actually perform the skolemization in some cases. But this "optimization" seems trivial, and can only be bought at a very high cost. Even though I have a theoretician's fondness for the 'G entails (G union B(M(Q)) )' trick, in my experience it has to be carefully explained to anyone who tries to read it; which is a very bad sign, when writing a spec intended for broad publication. Whereas 'G entails B(Q)' is simple, clear, intuitive and connects directly with terminology used in earlier specifications. > in order to allow, e.g., for the use case "Publishing on the >Web" in <http://lists.w3.org/Archives/Public/public-rdf-dawg/ >2005JulSep/0430>; also in the SWBP WG there are several requests to allow >for this, for example ><http://lists.w3.org/Archives/Public/public-swbp-wg/2005Nov/0176>. > >c) we have to solve the problem about querying empty graphs: >see ><http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/0386> >and ><http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/0388>. >The argument here is that now we have embraced seriously the notion of >entailment, and therefore the RDF MT has to be taken seriously; please >note that still the use of simple entailment allows to satisfy Pat's >argument. I would defend the null-graph/null-answer behavior for any kind of entailment, in a query protocol. The purpose of querying a graph is to discover what information is in the graph. There is no information in an empty graph. Hence the global requirement that answer bindings must be identifiers which actually occur in the graph. The current definitions require that a ground tautology be answered 'yes' but that an isomorphic query with a variable fail, when using non-simple entailment, e.g with rdfs entailment against the empty graph, ASK { rdfs:Class rdf:type rdfs:Class } should succeed, but SELECT * {?x rdf:type rdfs:Class } will fail (have no answers). But care is needed: this query really is against the empty graph. The RDFS closure (as defined in the RDF MT document) of the empty graph would contain all the tautologies, including this triple, so that SELECT [rdfs-closure *] {?x rdf:type rdfs:Class } will succeed even with simple entailment, with quite a lot of answers, all in the rdf: and rdfs: namespaces, including ?x/rdfs:Class. So, what answers you get depends on what the server claims to have in its graph. Options include a source graph G (in our case, empty), some kind of closure of G, some kind of closure of G but without all the tautologies, a filtered closure of G which omits all triples which share no vocabulary with G, perhaps a filtered closure of G further filtered to remove everything other than triples whose property is rdfs:subClassOf - i.e., a complete subclass hierarchy of G - etc., etc.. No doubt the choices on offer will be determined more by pragmatic concerns of utility than theoretical/logical tidiness, which IMO is exactly as it should be. >d) There should be a mention to the possibility to have systems that >satisfy the "interoperability" requirement (by PFPS): >- answers to equivalent graphs should be the same. I have already responded to this. One has to be clear what "equivalent" means. Using the definition cited by PFPS, equivalence refers to isomorphism under renaming of bnodeIDs, and then answers are also the same, up to renaming of bnodeIDs. So we do satisfy this requirement, as I understand it. Note that with this notion of equivalence, a non-lean graph will not be equivalent to its lean proper subgraph. Pat >Please note that we can leave the implementors free to choose any >method he/she desires to satisfy this (optional) requirement, for >example using the notion of "minimality" as we were proposing some >time ago. With Pat we agreed that there may be alternative ways to >achieve this requirement, so we don't want to fix this. > >cheers >--e. -- --------------------------------------------------------------------- IHMC (850)434 8903 or (650)494 3973 home 40 South Alcaniz St. (850)202 4416 office Pensacola (850)202 4440 fax FL 32502 (850)291 0667 cell phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes
Received on Monday, 2 January 2006 19:19:26 UTC