- From: Pat Hayes <phayes@ihmc.us>
- Date: Tue, 17 Jan 2006 10:42:45 -0600
- To: Enrico Franconi <franconi@inf.unibz.it>
- Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
>On 15 Jan 2006, at 19:01, Seaborne, Andy wrote:
>
>>Enrico Franconi wrote:
>>
>>>The new proposal of Pat
>>><http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JanMar/0061.html>
>>>does not work for any approach where bnodes
>>>are implicit, and this happens not only for
>>>OWL-Lite entailment, as we already pointed out
>>>in
>>><http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JanMar/0064.html>,
>>>but also for RDF entailment. For example,
>>>given the graph
>>> :john :age "25"^^xsd:decimal .
>>>the query
>>> ASK { _:b rdf:type rdf:XMLLiteral }
>>>should clearly return YES if RDF entailment is considered.
I agree, and see below.
>>>However,
>>>according to the latest proposal from Pat the
>>>answer to the query would be NO regardless of
>>>the type of entailment considered. The reason
>>>is that Pat considers bnodes in the query
>>>restricted to be bound only to URIs and bnodes
>>>explicitly appearing in the graph.
This restriction is to terms occurring in the
scoping graph, not in the original graph. For the
SPARQL case of simple entailment, the scoping
graph is defined to be equivalent to the dataset
graph, but for RDF or RDFS entailment, the
appropriate adjustment would be to define the
scoping graph G' to be the RDF or RDFS closure of
the data set graph G. The only general semantic
condition is that the scoping graph must be
entailment-equivalent to, i.e. entail and be
entailed by, the dataset graph, using whatever is
the appropriate notion of entailment. The only
purpose of including this graph in the
definition, bear in mind, is to be a 'context'
which restricts and organizes the answer bnode
bindings, and to minimize redundancy. But in any
case, if an adjustment to the definition of
scoping is required, it can be made
appropriately, perhaps by relaxing the condition
that bnodes bound to variables must actually
occur in G'. I don't think this strict condition
is going to work for OWL in any case, certainly
not for OWL/RDF. Such adjustments are purely
pragmatic, and do not reflect any underlying
semantic change.
The main point regarding bnodes in answer
patterns is an operational one. If an engine is
capable of determining that an existential
pattern is entailed, then it is surely capable of
supplying a bnode binding to a variable in the
place of the existential. To not do so would be
gratuitously misleading as a query answer, since
to supply such a binding is an almost cost-free
operation. It seems to me that this principle
should apply to queries relative to any language
which uses bnodes/existentials. To refuse to
supply an existential name as a binding, and
report 'no answer', when an existential is
entailed, is at least as misleading as supplying
an existential when there is a real name in the
database. This seems to me to be so misleading,
in fact, that we should forbid it.
The suggested 'principle' is that if an query
pattern Q containing a bnode succeeds, then the
similar query pattern with a new query variable
in place of the bnode must also succeed. The
justification being that the engine can generate
a new bnodeID to substitute for the variable,
under the exact same circumstances that it can
establish that the existential is entailed. (The
work needed to establish entailment, in the
'awkward' cases, so much exceeds the work
required to generate a new bnodeID that the
latter seems hardly worth arguing about.) SPARQL
satisfies this trivially, but I suggest that we
impose it as a requirement on all extensions.
>>This hinges on the different definitions of 'pattern solution'.
>>The scoping
>>graph proposal maps variables and bNodes; the ordered-merge proposal maps
>>just variables. Each has consequences.
>
>The main consequence of Pat's latest proposal,
>as pointed out by FUB, is that SPARQL can never
>be extended - not even to RDF entailment - as
>you can see from the example above.
"Never be" is a very strong claim, and in this
case false. The notion of 'scoping graph' used in
the proposal can be used as a parameter to adjust
the definition to suit. The scoping graph
provides the scope for the answer bindings. For
more elaborate forms of entailment which allow
entailments of existentials, this scope should be
allowed to include terms which represent the
extra objects whose existence is entailed.
>Moreover, if you want to map both variables and
>bnodes, you better disallow bnodes in the query
>and consider only variables in the query: there
>wouldn't be any noticeable difference.
The fact that bnodes and variables are almost
identical in function, so that bnodes in query
patterns are 'blank variables', seems to me to be
a feature rather than a bug. This corresponds
exactly to the basic semantical picture in which
a query is a sentence on the RHS of a sequent,
i.e. a goal to be proved: a sentence being
entailed rather than a sentence being asserted.
>>If bNodes are bound by pattern solutions, then the algebra works out as per
>>the current document.
>
>And bnodes would behave *exactly* as variables,
>and therefore when considering the extension of
>SPARQL with just RDF entailment, the semantics
>would not be compatible, and the behaviour would
>be wrong.
All, and I really do mean ALL, of these debates
are concerned with syntactic devices whose only
purpose is to restrict the answer set in order to
avoid redundancy. This is a significant matter in
query language design, but it has nothing to do
with semantics. The only genuinely semantic
constraint is that answer bindings should make
the query pattern into something that is
appropriately entailed by the dataset graph. All
the rest of this entire discussion has been
concerned with tweaking this basic, clear,
semantic idea so as to restrict the answer set by
eliminating irrelevant answers which are
entailed, but which add no new information, i.e.
are redundant.
My own view is that the costs and risks of trying
to build appropriate such restrictions into a
single definition of 'answer', which is supposed
to survive unchanged through all possible
entailment modes, are too high to justify this
building-in, and that what we should do in
stating the general extendable SPARQL conditions
is, as suggested in
http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JanMar/0016,
to separate out the logically necessary
requirement on answers which are universally
agreed, clearly semantically motivated, and have
a simple 'entail' parameter, so can be used
without any change all the way from RDF to
higher-order logic from the conditions on
answer sets which are there only to remove or
minimize redundancy: and that we should document
and explain this distinction between the
different roles and purposes of the different
parts of the definition, as a guide to future
work. Who knows, some enterprising person might
invent a clever hash-coding algorithm which can
completely eliminate redundancy in answer sets
in, say, log-n time in all practical cases, and
put it in the public domain. If so, great, IMO:
why should we define such a possibility out of
consideration? (Which we do with the current
definitions.) The point being that redundancy
elimination is itself easy to state
'semantically' as well, but (we all recognize)
this easy definition defines a condition which is
too onerous a computational burden to impose on
developers and users right now. But this is not
an implementational/pragmatic issue, not a
semantic one: the semantic definition of 'less
redundant' is about as clear and simple as it can
get. So, let us not impose it or try to build it
into the definitions, but instead recommend
(using 'should/should not' language) that it be
approached as closely as reasonably, and leave it
to implementers to make incremental improvements
in how close they can get. The thing we are
straining to get defined here is really just a
specification of a particular implementational
strategy for keeping redundancy to a minimum, and
has nothing to do with semantics.
The absolute worst that can happen, under this
proposal, is that two conforming engines will
produce answer sets one of which will have some
redundancies in it relative to the other. The
work involved in checking whether two answer sets
are equivalent under this condition is not vastly
greater than that involved in checking whether
they are the same under the current definition,
since we do not specify a canonical ordering.
None of the redundant answers will be wrong or
semantically misleading: all the bnodes will be
properly scoped. This will also completely meet
Peter's 'interoperability' wishes, if only in the
ideal case, since entailment-minimal answer sets
for logically equivalent datasets will always be
graph-equivalent, for any notion of entailment.
Users who are able and willing to check for
redundancies in answer sets will be able to to do
so, and those who do will get to the same answer
set from any conforming engine. (Note, with our
current definitions, in any of their
incarnations, this is false. Completely
redundancy-free sets of answer bindings will not
in general be conforming answer sets, even when
the dataset is lean.) In practice, any serious
SPARQL query service will not generate
gratuitously redundant answers, even if it could
'legally' do so, since there are no
Web-evolutionary pressures which lead in that
direction: such pathological behavior would be to
both the server's and the client's disadvantage.
The only objection to this idea that I have heard
seems to be based on the feeling that there
should be a single defined, official, unique set
of answers. That idea makes sense for traditional
data querying, but traditionally this kind of
redundancy didn't come up. When, as here, both
queries and datasets can legally have internal
redundancy and so be logically equivalent to
different queries and graphs, it seems silly to
be sweating blood over how to ensure that answer
sets can have just the 'right' amount of
redundancy, and getting this defined so precisely
that it gives a unique answer set for any graph,
query and form of entailment. Whatever we do, we
are almost bound to get it slightly wrong I
have no confidence that any of our suggested
definitions will go on working appropriately for
OWL and we don't, I submit, need to get it
exactly right. Worse, in advanced cases, we don't
even know what really counts as 'exactly right',
because the appropriate answer isn't determined
by theoretical or semantic considerations, but by
pragmatic ones. RDFS tautologies are certainly
RDFS-entailed by the empty graph, for sure. So,
should queries on the empty graph produce
tautological bindings when used with RDFS
entailment? Maybe, maybe not: I can see arguments
both ways. If we leave the definition of
'redundant' open enough to allow conforming
engines to make the choice, then I see that as a
small success for SPARQL, rather than a failure.
(BTW, this is an argument for the 'scoping graph'
idea, as it provides an extra adjustment
parameter.)
So, I suggest that as a compromise that we define
SPARQL answers using the 'bnode-binding'
modification to the Baden-Baden definitions for
the SPARQL case of simple entailment, but also
state the more general, purely entailment-based,
definitions as conditions which must be satisfied
by any SPARQL extension to other entailment
modes, and perhaps also add some informative
remarks about appropriate strategies for keeping
redundancy to a minimum. Seems to me that this
keeps just the right amount of flexibility open
for SPARQL-N (N>1) compliance, while keeping our
current design deterministic and constrained, as
Bijan wants it to be.
>>If bNodes are not bound at all, as in the ordered-merge proposal, then there
>>is a change to the algebra.
>
>Unless you forbid the use of bnodes in queries,
>which wouldn't be at all a loss, since the
>behaviour you want for bnodes is the behaviour
>we have for variables.
>
>>(...)
>>
>>The algebra works if a bNode in occurring in two or more BGPs in a graph
>>pattern is bound by the pattern solution [*].
>
>Exactly: you expect bnodes to behave like
>variables, which is semantically wrong for any
>type of entailment which is not just the reading
>of the syntax (like simple entailment).
I disagree that it is semantically wrong: on the
contrary, thinking strictly about semantics, I
cannot see that any other design makes semantic
sense. Semantically, a query variable is a
request for information about things that exist,
and a bnode asserts existence. If we allow bnodes
as bindings, then this gives information about
the existence of a thing. This is exactly the
same information provided by a YES to an
existential query. Syntactic restrictions on
bindings to terms in a graph are not by any
stretch of the imagination properly called
semantic restrictions.
The great merit of this bnode/variable scoping
idea, it seems to me, is its simplicity, which
includes its semantic simplicity. Semantically,
bnodes - existentials - are the logically clear
idea. Variables have no actual logical semantics
at all, unless they are thought of as "labelled
existentials". So the best way to think about it
logically is that variables are being treated as
bnodes (with trackable bindings), rather then
vice versa.
Pat
>
>>This is a conservative approach and, like any extension to the level of
>>entailment, more solutions might become
>>possible when relaxed. It would allow
>>all entailments for queries consisting of a
>>single BGP. It would leave open the
>>possibility of the rdfD1 use case above as an
>>extension.
>
>This is acceptable only if bnodes are forbidden in queries.
>
>cheers
>--e.
--
---------------------------------------------------------------------
IHMC (850)434 8903 or (650)494 3973 home
40 South Alcaniz St. (850)202 4416 office
Pensacola (850)202 4440 fax
FL 32502 (850)291 0667 cell
phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes
Received on Tuesday, 17 January 2006 16:43:04 UTC