Re: SPARQL semantics: open issues for basic query patterns

>Hi all.
>
>From the latest DAWG minutes
><http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/0384>:
>
>""" PROPOSED: that http://www.w3.org/2001/sw/DataAccess/rq23/ 1.596
>(section 2, @@NO TOLD BNODES options) addresses rdfSemantics, and that
>it's sufficient to postpone owlDisjunction, contingent on acceptance
>(including silence) by UMD and Free University of Bozen-Bolzano in a
>week (by 27 Dec). ACTION PatH, DanC to respond to PFPS, Horrocks,
>et. al.  """
>
>FUB still wants to clarify the following points:
>
>a) in the document only 'simple entailment' is used. We want a
>parametric entailment, with simple, rdf, rdfs explicit at least, and
>owl-dl and owl possible.

My understanding of the situation is this. SPARQL is required to be 
an actual language, not a family of languages. Given this simple but 
very basic point, if SPARQL itself is to support 'parametric 
entailment', then SPARQL will presumably need to provide ways of 
communicating the value of this 'parameter' in a query; or, failing 
this, at the very least to specify some universally agreed-upon 
conventions by which queries and servers can agree on which 
entailment is intended when a query is answered. This is conceptually 
simple, but it is not simple in practice, as it (1) involves changes 
to every part of the spec, including the entire corpus of test cases 
and probably the protocol; (2) exceeds the WG charter; (3) would 
eliminate from consideration any kind of non-named entailment (e.g. 
simple entailment with datatyped literals, or RDF entailment with a 
unique name assumption, or OWL plus rules), even though we know that 
such 'unnamed extensions' are already in use, and moreover IMO should 
be encouraged rather than discouraged; and finally (4) would delay 
the deployment of SPARQL by months, perhaps years. In the meantime, the 
use of simple entailment in the spec keeps to a single basic 
language, supports almost all actual practical uses of SPARQL which 
are immediately planned, allows for simple implementations to be 
rapidly deployed, allows for many extensions by treating source 
graphs as 'virtual closures', provides a simple and useful basis for 
experimentation with alternative entailment styles, and provides a 
clear path for future extensions to SPARQL which replace simple 
entailment by more sophisticated entailments. On balance, therefore, 
it seems best to keep this release of SPARQL simple by specifying one 
fixed kind of entailment: and the only rational choice, under those 
circumstances, is simple entailment.

>The argument here is that due to the infinite
>closure of RDF graphs (due to rdf:1, rdf:2, etc; or to the
>reification), this document would not even allow to have
>implementations that comply with the original RDF MT!

Of course it would. The closures defined in the RDF MT document are 
infinite, but this is trivial to correct, and Herman ter Horst has 
already published the relevant finite-closure theorems for RDF and 
RDFS entailment. And in any case, the spec does not require that the 
target graph be finite.

>Moreover, there
>are explicit requests about this in the SWBP WG, for example
><http://lists.w3.org/Archives/Public/public-swbp-wg/2005Dec/0072>.
>
>b) we want back the ability to label bnodes in a query as
>"told-bnodes",

The WG, after considerable discussion, decided not to do this. As a 
party to the discussions (and one of those arguing for the ability), 
allow me to explain this decision. First, leave aside for the moment 
the complexities of how best to define the various ordered-merge or 
partial-merge operations; let me just observe that they are, indeed, 
complex. They require the spec to go into a level of syntactic 
micro-management which approaches the level of detail used when 
specifying an algorithm, rather than a 'clean' semantic 
specification; and they depend on one another, so that changing any 
one definition, even slightly, requires changes to be made elsewhere. 
They are fragile.

Second observation: the net effect of all this 
complexity is simply to create three distinct categories of 
identifier (IRIs, told-bnodes and normal bnodes) where we previously 
had two. The only practical function of the new category of 
identifier is that it retains its meaning beyond the scope of the 
graph in which it occurs, and is otherwise exactly like a bnode. But 
now think about this logically: this is *exactly* what a logical 
constant, in our case an IRI, does. In fact, we can think of names as 
being 'globally' quantified existential variables: there is no 
semantic distinction. Why, then, should we not simply use IRIs? Then 
a query service can, in effect, skolemize its graphs when answering 
queries, by substituting new IRIs for graph bnodes, as a way to 
'indicate' that further queries are acceptable concerning the entity 
in question. Or not: the choice is up to the server, and the spec can 
leave it entirely open and allow implementations flexibility in this 
regard. Bnodes supplied as bindings cannot be used as 'told bnodes' 
in subsequent queries; IRIs, of course, can. This simple convention 
has some arguments against it, but it has overwhelming arguments in 
its favor. It entirely does away with the need to provide a SPARQL 
syntax for 'told bnodes', and more significantly still it completely 
does away with the impenetrable, unintuitive elaboration of 
definitional support that is needed to make the very idea of 'told 
bnodes' coherent; and, best of all, it makes perfect sense and fits 
within the assumptions and conventions already in use. In fact, seen 
from this perspective, one can view the entire 'told bnode' idea as a 
kind of "optimization" which would allow a server to not actually 
perform the skolemization in some cases. But this "optimization" 
seems trivial, and can only be bought at a very high cost. Even 
though I have a theoretician's fondness for the 'G entails (G union 
B(M(Q)) )' trick, in my experience it has to be carefully explained 
to anyone who tries to read it; which is a very bad sign, when 
writing a spec intended for broad publication. Whereas 'G entails 
B(Q)' is simple, clear, intuitive and connects directly with 
terminology used in earlier specifications.
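
To make the skolemization idea concrete, here is a minimal sketch in 
Python of what a server might do before answering queries. It is only 
an illustration under stated assumptions: triples are plain tuples of 
strings, bnodes are written "_:label", and the function name 
skolemize and the base IRI http://example.org/genid/ are hypothetical 
choices, not anything defined by the spec.

```python
import itertools


def is_bnode(term):
    # Assumed convention for this sketch: bnode IDs are written "_:label".
    return term.startswith("_:")


def skolemize(graph, base="http://example.org/genid/"):
    """Replace every bnode in `graph` with a freshly minted IRI.

    Returns the skolemized graph together with the bnode-to-IRI mapping,
    which the server would keep if it wants to relate later queries that
    mention a minted IRI back to the original node.
    """
    counter = itertools.count(1)
    mapping = {}

    def sk(term):
        if not is_bnode(term):
            return term
        if term not in mapping:
            mapping[term] = f"{base}{next(counter)}"
        return mapping[term]

    # Iterate in sorted order so the minted IRIs are reproducible.
    skolemized = {(sk(s), p, sk(o)) for s, p, o in sorted(graph)}
    return skolemized, mapping


# Hypothetical example data.
graph = {
    ("_:a", "foaf:name", '"Alice"'),
    ("_:a", "foaf:knows", "_:b"),
}
skolemized, mapping = skolemize(graph)
```

Bindings drawn from the skolemized graph are ordinary IRIs, so a 
client can use them in a follow-up query with no special syntax. A 
server that prefers not to expose its bnodes in this way simply skips 
the step.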

>  in order to allow, e.g., for the use case "Publishing on the Web" in
><http://lists.w3.org/Archives/Public/public-rdf-dawg/2005JulSep/0430>;
>also in the SWBP WG there are several requests to allow
>for this, for example
><http://lists.w3.org/Archives/Public/public-swbp-wg/2005Nov/0176>.
>
>c) we have to solve the problem about querying empty graphs:
>see
><http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/0386>
>and
><http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/0388>.
>The argument here is that now we have embraced seriously the notion of
>entailment, and therefore the RDF MT has to be taken seriously; please
>note that still the use of simple entailment allows to satisfy Pat's
>argument.

I would defend the null-graph/null-answer behavior for any kind of 
entailment, in a query protocol. The purpose of querying a graph is 
to discover what information is in the graph. There is no information 
in an empty graph. Hence the global requirement that answer bindings 
must be identifiers which actually occur in the graph.

The current definitions require that a ground tautology be answered 
'yes' but that an isomorphic query with a variable fail, when using 
non-simple entailment. E.g., with rdfs entailment against the empty 
graph,

ASK { rdfs:Class rdf:type rdfs:Class }

should succeed, but

SELECT * {?x rdf:type rdfs:Class }

will fail (have no answers). But care is needed: this query really is 
against the empty graph. The RDFS closure (as defined in the RDF MT 
document) of the empty graph would contain all the tautologies, 
including this triple, so that

SELECT [rdfs-closure *] {?x rdf:type rdfs:Class }

will succeed even with simple entailment, with quite a lot of 
answers, all in the rdf: and rdfs: namespaces, including 
?x/rdfs:Class. So, what answers you get depends on what the server 
claims to have in its graph. Options include a source graph G (in our 
case, empty), some kind of closure of G, some kind of closure of G 
but without all the tautologies, a filtered closure of G which omits 
all triples which share no vocabulary with G, perhaps a filtered 
closure of G further filtered to remove everything other than triples 
whose property is rdfs:subClassOf - i.e., a complete subclass 
hierarchy of G - etc., etc. No doubt the choices on offer will be 
determined more by pragmatic concerns of utility than 
theoretical/logical tidiness, which IMO is exactly as it should be.
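
The point that the answers depend on what the server claims to have 
in its graph can be illustrated with a small Python sketch of plain 
pattern matching (no inference at all). The function match and the 
three-triple closure_fragment are assumptions made for illustration; 
the fragment is a hand-picked stand-in for a few of the triples an 
RDFS closure of the empty graph would contain, not the closure 
itself.

```python
def match(pattern, graph):
    """Return all bindings of a single triple pattern against `graph`.

    Terms starting with '?' are variables. Matching is by direct lookup
    in the graph, so bindings are always terms that occur in the graph.
    """
    results = []
    for triple in graph:
        binding = {}
        for p_term, g_term in zip(pattern, triple):
            if p_term.startswith("?"):
                # A repeated variable must bind to the same term each time.
                if binding.get(p_term, g_term) != g_term:
                    break
                binding[p_term] = g_term
            elif p_term != g_term:
                break
        else:
            results.append(binding)
    return results


pattern = ("?x", "rdf:type", "rdfs:Class")

# Case 1: the server offers the source graph itself, which is empty.
empty_graph = set()

# Case 2: the server offers (part of) a closure; hand-picked fragment.
closure_fragment = {
    ("rdfs:Class", "rdf:type", "rdfs:Class"),
    ("rdfs:Resource", "rdf:type", "rdfs:Class"),
    ("rdf:Property", "rdf:type", "rdfs:Class"),
}

answers_empty = match(pattern, empty_graph)
answers_closure = match(pattern, closure_fragment)
```

Against the empty graph the pattern has no answers; against the 
closure fragment the same pattern, with the same simple matching, 
binds ?x to rdfs:Class among others.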

>d) There should be a mention to the possibility to have systems that
>satisfy the "interoperability" requirement (by PFPS):
>- answers to equivalent graphs should be the same.

I have already responded to this. One has to be clear what 
"equivalent" means. Using the definition cited by PFPS, equivalence 
refers to isomorphism under renaming of bnodeIDs, and then answers 
are also the same, up to renaming of bnodeIDs. So we do satisfy this 
requirement, as I understand it.

Note that with this notion of equivalence, a non-lean graph will not 
be equivalent to its lean proper subgraph.
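
A small Python sketch may help make this notion of equivalence 
concrete. It is a brute-force check, suitable only for tiny graphs, 
and the names isomorphic, non_lean, renamed and lean are hypothetical; 
bnodes are assumed to be written "_:label".

```python
from itertools import permutations


def bnodes(graph):
    # Collect the bnode IDs ("_:label" by assumption) used in the graph.
    return sorted({t for triple in graph for t in triple if t.startswith("_:")})


def isomorphic(g1, g2):
    """True if some one-to-one renaming of bnode IDs turns g1 into g2."""
    b1, b2 = bnodes(g1), bnodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for perm in permutations(b2):
        rename = dict(zip(b1, perm))
        renamed_g1 = {tuple(rename.get(t, t) for t in triple) for triple in g1}
        if renamed_g1 == set(g2):
            return True
    return False


# A non-lean graph: the bnode triple is an instance of the ground one.
non_lean = {("ex:a", "ex:p", "ex:b"), ("ex:a", "ex:p", "_:x")}
# The same graph with its bnode ID renamed.
renamed = {("ex:a", "ex:p", "ex:b"), ("ex:a", "ex:p", "_:y")}
# The lean proper subgraph of non_lean.
lean = {("ex:a", "ex:p", "ex:b")}

same_up_to_renaming = isomorphic(non_lean, renamed)
same_as_lean = isomorphic(non_lean, lean)
```

Here non_lean and renamed are equivalent in this sense, while 
non_lean and lean are not, even though each simply entails the other.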

Pat

>Please note that we can leave the implementors free to choose any
>method he/she desires to satisfy this (optional) requirement, for
>example using the notion of "minimality" as we were proposing some
>time ago. With Pat we agreed that there may be alternative ways to
>achieve this requirement, so we don't want to fix this.
>
>cheers
>--e.


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes

Received on Monday, 2 January 2006 19:19:26 UTC