- From: Pat Hayes <phayes@ihmc.us>
- Date: Mon, 2 Jan 2006 13:19:14 -0600
- To: Enrico Franconi <franconi@inf.unibz.it>
- Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
>Hi all.
>
>From the latest DAWG minutes
><http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/0384>:
>
>""" PROPOSED: that http://www.w3.org/2001/sw/DataAccess/rq23/ 1.596
>(section 2, @@NO TOLD BNODES options) addresses rdfSemantics, and that
>it's sufficient to postpone owlDisjunction, contingent on acceptance
>(including silence) by UMD and Free University of Bozen-Bolzano in a
>week (by 27 Dec). ACTION PatH, DanC to respond to PFPS, Horrocks,
>et. al. """
>
>FUB still wants to clarify the following points:
>
>a) in the document only 'simple entailment' is used. We want a
>parametric entailment, with simple, rdf, rdfs explicit at least, and
>owl-dl and owl possible.
My understanding of the situation is this. SPARQL is required to be
an actual language, not a family of languages. Given this simple, but
very basic, point, then if SPARQL itself is to support 'parametric
entailment', then SPARQL will presumably need to provide for ways of
communicating the value of this 'parameter' in a query: or failing
this, at the very least, to specify some universally agreed-upon
conventions by which queries and servers can agree on which
entailment is intended when a query is answered. This is conceptually
simple, but it is not simple in practice, as it (1) involves changes
to every part of the spec, including the entire corpus of test cases
and probably the protocol; (2) exceeds the WG charter; (3) would
eliminate from consideration any kind of non-named entailment (e.g.
simple entailment with datatyped literals, or RDF entailment with a
unique name assumption, or OWL plus rules), but we know that such
'unnamed extensions' are already in use, and moreover IMO should be
encouraged rather than discouraged; and finally (4) would delay the
deployment of SPARQL by months, perhaps years. In the meantime, the
use of simple entailment in the spec keeps to a single basic
language, supports almost all actual practical uses of SPARQL which
are immediately planned, allows for simple implementations to be
rapidly deployed, allows for many extensions by treating source
graphs as 'virtual closures', provides a simple and useful basis for
experimentation with alternative entailment styles, and provides a
clear path for future extensions to SPARQL which replace simple
entailment by more sophisticated entailments. On balance, therefore,
it seems best to keep this release of SPARQL simple by specifying one
fixed kind of entailment: and the only rational choice, under those
circumstances, is simple entailment.
>The argument here is that due to the infinite
>closure of RDF graphs (due to rdf:1, rdf:2, etc; or to the
>reification), this document would not even allow to have
>implementations that comply with the original RDF MT!
Of course it would. The closures defined in the RDF MT document are
infinite, but this is trivial to correct, and Herman ter Horst has
already published the relevant finite-closure theorems for RDF and
RDFS entailment. And in any case, the spec does not require that the
target graph be finite.
>Moreover, there
>are explicit requests about this in the SWBP WG, for example
><http://lists.w3.org/Archives/Public/public-swbp-wg/2005Dec/0072>.
>
>b) we want back the ability to label bnodes in a query as
>"told-bnodes",
The WG, after considerable discussion, decided not to do this. As a
party to the discussions (and one of those arguing for the ability),
allow me to explain this decision. First, leave aside for the moment
the complexities of how best to define the various ordered-merge or
partial-merge operations; let me just observe that they are, indeed,
complex. They require the spec to go into a level of syntactic
micro-management which approaches the level of detail used when
specifying an algorithm, rather than a 'clean' semantic
specification; and they depend on one another, so that changing any
one definition, even slightly, requires changes to be made elsewhere.
They are fragile. Second observation: the net effect of all this
complexity is simply to create three distinct categories of
identifier (IRIs, told-bnodes and normal bnodes) where we previously
had two. The only practical function of the new category of
identifier is that it retains its meaning beyond the scope of the
graph in which it occurs, and is otherwise exactly like a bnode. But
now think about this logically: this is *exactly* what a logical
constant, in our case an IRI, does. In fact, we can think of names as
being 'globally' quantified existential variables: there is no
semantic distinction. Why, then, should we not simply use IRIs? Then
a query service can, in effect, skolemize its graphs when answering
queries, by substituting new IRIs for graph bnodes, as a way to
'indicate' that further queries are acceptable concerning the entity
in question. Or not: the choice is up to the server, and the spec can
leave it entirely open and allow implementations flexibility in this
regard. Bnodes supplied as bindings cannot be used as 'told bnodes'
in subsequent queries; IRIs, of course, can. This simple convention
has some arguments against it, but it has overwhelming arguments in
its favor. It entirely does away with the need to provide a SPARQL
syntax for 'told bnodes', and more significantly still it completely
does away with the impenetrable, unintuitive elaboration of
definitional support that is needed to make the very idea of 'told
bnodes' coherent; and, best of all, it makes perfect sense and fits
within the assumptions and conventions already in use. In fact, seen
from this perspective, one can view the entire 'told bnode' idea as a
kind of "optimization" which would allow a server to not actually
perform the skolemization in some cases. But this "optimization"
seems trivial, and can only be bought at a very high cost. Even
though I have a theoretician's fondness for the 'G entails (G union
B(M(Q)) )' trick, in my experience it has to be carefully explained
to anyone who tries to read it; which is a very bad sign, when
writing a spec intended for broad publication. Whereas 'G entails
B(Q)' is simple, clear, intuitive and connects directly with
terminology used in earlier specifications.
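
To make the skolemization convention above concrete, here is a minimal
sketch in Python. It is only an illustration under stated assumptions,
not anything the spec defines: triples are plain tuples of strings,
bnodes are written "_:label", and SKOLEM_BASE is a hypothetical
namespace that a server would choose for itself.

```python
import itertools

# Hypothetical namespace; a real server would mint IRIs under its own authority.
SKOLEM_BASE = "http://example.org/.well-known/genid/"


def is_bnode(term):
    """Assumption of this sketch: bnodes are strings starting with '_:'."""
    return term.startswith("_:")


def skolemize(graph, base=SKOLEM_BASE):
    """Return a copy of the graph with each bnode replaced by a fresh IRI.

    The same bnode always maps to the same IRI, so the structure of the
    graph is preserved; only the scope of the identifier changes.
    """
    counter = itertools.count(1)
    mapping = {}

    def rename(term):
        if not is_bnode(term):
            return term
        if term not in mapping:
            mapping[term] = f"{base}{next(counter)}"
        return mapping[term]

    skolemized = set()
    # Sorting only makes the numbering reproducible from run to run.
    for s, p, o in sorted(graph):
        skolemized.add((rename(s), p, rename(o)))
    return skolemized, mapping


graph = {
    ("_:a", "ex:name", '"Pat"'),
    ("_:a", "ex:knows", "_:b"),
    ("ex:e", "ex:knows", "_:a"),
}
skolemized, mapping = skolemize(graph)
```

An IRI returned in a binding from the skolemized graph can be used
verbatim in a follow-up query, which is the function 'told bnodes' were
meant to serve. A server that prefers not to expose such identifiers
simply answers from the original graph.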
> in order to allow, e.g., for the use case "Publishing on the
>Web" in
><http://lists.w3.org/Archives/Public/public-rdf-dawg/2005JulSep/0430>;
>also in the SWBP WG there are several requests to allow
>for this, for example
><http://lists.w3.org/Archives/Public/public-swbp-wg/2005Nov/0176>.
>
>c) we have to solve the problem about querying empty graphs:
>see
><http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/0386>
>and
><http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/0388>.
>The argument here is that now we have embraced seriously the notion of
>entailment, and therefore the RDF MT has to be taken seriously; please
>note that still the use of simple entailment allows to satisfy Pat's
>argument.
I would defend the null-graph/null-answer behavior for any kind of
entailment, in a query protocol. The purpose of querying a graph is
to discover what information is in the graph. There is no information
in an empty graph. Hence the global requirement that answer bindings
must be identifiers which actually occur in the graph.
The current definitions require that a ground tautology be answered
'yes' but that an isomorphic query with a variable fail, when using
non-simple entailment. E.g., with RDFS entailment against the empty
graph,
ASK { rdfs:Class rdf:type rdfs:Class }
should succeed, but
SELECT * {?x rdf:type rdfs:Class }
will fail (have no answers). But care is needed: this query really is
against the empty graph. The RDFS closure (as defined in the RDF MT
document) of the empty graph would contain all the tautologies,
including this triple, so that
SELECT [rdfs-closure *] {?x rdf:type rdfs:Class }
will succeed even with simple entailment, with quite a lot of
answers, all in the rdf: and rdfs: namespaces, including
?x/rdfs:Class. So, what answers you get depends on what the server
claims to have in its graph. Options include a source graph G (in our
case, empty), some kind of closure of G, some kind of closure of G
but without all the tautologies, a filtered closure of G which omits
all triples which share no vocabulary with G, perhaps a filtered
closure of G further filtered to remove everything other than triples
whose property is rdfs:subClassOf - i.e., a complete subclass
hierarchy of G - etc., etc. No doubt the choices on offer will be
determined more by pragmatic concerns of utility than
theoretical/logical tidiness, which IMO is exactly as it should be.
>d) There should be a mention to the possibility to have systems that
>satisfy the "interoperability" requirement (by PFPS):
>- answers to equivalent graphs should be the same.
I have already responded to this. One has to be clear what
"equivalent" means. Using the definition cited by PFPS, equivalence
refers to isomorphism under renaming of bnodeIDs, and then answers
are also the same, up to renaming of bnodeIDs. So we do satisfy this
requirement, as I understand it.
Note that with this notion of equivalence, a non-lean graph will not
be equivalent to its lean proper subgraph.
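
As a rough illustration of this notion of equivalence, the Python
sketch below tests two small graphs for isomorphism under renaming of
bnodeIDs by brute force. It assumes triples are tuples of strings and
bnodes are written "_:label". The function name and the example graphs
are made up for this sketch, and the approach is only sensible for
graphs with a handful of bnodes.

```python
from itertools import permutations


def bnodes(graph):
    """Collect the bnode labels used in subject or object position."""
    return sorted(
        {t for (s, _p, o) in graph for t in (s, o) if t.startswith("_:")}
    )


def equivalent(g1, g2):
    """True if some bijective renaming of bnodes turns g1 into g2."""
    b1, b2 = bnodes(g1), bnodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    # Try every one-to-one assignment of g1's bnodes to g2's bnodes.
    for perm in permutations(b2):
        m = dict(zip(b1, perm))
        renamed = {(m.get(s, s), p, m.get(o, o)) for (s, p, o) in g1}
        if renamed == g2:
            return True
    return False


g = {("ex:a", "ex:p", "_:x")}
h = {("ex:a", "ex:p", "_:y")}

# The bnode triple is redundant given the ground one, so this graph is
# not lean; dropping it leaves the lean proper subgraph.
non_lean = {("ex:a", "ex:p", "ex:b"), ("ex:a", "ex:p", "_:x")}
lean = {("ex:a", "ex:p", "ex:b")}

same_up_to_renaming = equivalent(g, h)
non_lean_vs_lean = equivalent(non_lean, lean)
```

Here g and h count as equivalent, while non_lean and lean do not, even
though each of the latter two simply entails the other.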
Pat
>Please note that we can leave the implementors free to choose any
>method he/she desires to satisfy this (optional) requirement, for
>example using the notion of "minimality" as we were proposing some
>time ago. With Pat we agreed that there may be alternative ways to
>achieve this requirement, so we don't want to fix this.
>
>cheers
>--e.
--
---------------------------------------------------------------------
IHMC (850)434 8903 or (650)494 3973 home
40 South Alcaniz St. (850)202 4416 office
Pensacola (850)202 4440 fax
FL 32502 (850)291 0667 cell
phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes
Received on Monday, 2 January 2006 19:19:26 UTC