W3C home > Mailing lists > Public > public-rdf-dawg@w3.org > January to March 2006

Re: SPARQL semantics: open issues for basic query patterns

From: Pat Hayes <phayes@ihmc.us>
Date: Tue, 3 Jan 2006 11:32:34 -0600
Message-Id: <p06230901bfe055c4ab27@[10.100.0.9]>
To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Cc: Dan Connolly <connolly@w3.org>, Enrico Franconi <franconi@inf.unibz.it>, Sergio Tessaris <tessaris@inf.unibz.it>

Guys, a thought occurs to me regarding the 
'basic' definitions and how to simplify them.

Now we have got rid of told bnodes, the only 
reason for introducing any complexity into the 
'naive definition' of answer binding B, viz

G (simply) entails B(Q)

is to get rid of the inevitable redundancy that 
this produces when it is read as an 'iff' 
definition of answer binding, arising from the 
fact that if B is an answer then any C which is 
like B but inserts a different bnode instead of 
some binding in B, is also an answer, by this 
naive criterion. So, if ?x/foo is an answer then 
we get infinitely many answers of the form 
?x/_:1, ?x/_:2, etc.. So, we put in this 
condition that only bnodes from G may occur, 
which itself kind of smells bad because it 
requires answer-binding bnodes to be meaningful 
outside their scope: but then we still get a 
large finite amount of redundancy, as noted by 
Sergio 
http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JanMar/0001.html, 
so we have to tweak the definition with the (G 
entails (G union...)) trick, and then we get all 
embroiled in exactly how to specify the union, 
which is subtle and delicate, and so on. Sigh.

All of this definitional twisting arises because 
we feel a need to somehow get rid of silly 
redundancies in the answers. So, here's an 
alternative proposal. We don't get rid of them by 
tweaking the definition: we leave the definition 
naive, think of it as a necessary condition on 
answer bindings, and we add a remark about answer 
sets, that servers are not obliged to deliver 
'redundant' answers. (Not that they are required 
to be non-redundant, but that they are free to 
omit redundancy.) And we say explicitly that if A 
and B are two answers such that A(Q) is an 
instance of B(Q) (in the traditional RDF sense of 
instance, ie mapping bnodes to terms) then B is 
redundant. So if there are answers ?x/foo and 
?x/_:75, then the second is redundant: and if 
there are answers ?x/_:22 and ?x/_:47, then one 
of them is redundant provided that the other is 
given as an answer.

This completely cuts through the definitional 
maze, since we can keep the direct, naive, 
unqualified definition of 'answer binding', and 
smile happily at the observation that there are 
infinitely many answers: yes, there are, but all 
but finitely many of them are redundant. It also 
means that we can say with a straight face that 
bnodes in answers are completely scoped to the 
answer, and have no connection to bnodeIDs in the 
graph. It also means that redundancy of answers 
can be locally checked just by looking at the 
answer bindings themselves, not by checking the 
entire graph topology (as with the "G entails (G 
union...)" trick.) It also, BTW, allows future 
extensions of SPARQL which use non-simple 
entailment to also use stronger notions of 
'redundant', e.g. to decide that, say, RDFS 
tautologies are also redundant, and to omit those 
bindings from answer sets. Or not: but the point 
is that the flexibility is there, without needing 
to mess around with issues like whether or not 
this is 'real' RDFS entailment. We have a way to 
separate the bounds on answers into lower bounds 
­ which are always some kind of entailment ­ and 
upper bounds, which are some kind of 
redundancy-avoidance criterion. Within those 
bounds, servers must deliver all the answers.

Now, this does ALLOW a dumb server to be highly 
redundant in the answers it gives, like in 
Sergio's

_:a :p _:b
_:b :url <http://example.org>

example in 
http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JanMar/0001.html. 
But OK, so it does: redundant answers are not 
actually wrong, after all, only redundant: and we 
have already considered the idea of requiring 
non-redundancy, and rejected it as too high a 
standard. I suggest that it might be good to 
allow a very simple-minded conforming SPARQL 
server to be messily redundant, if such a thing 
could be implemented very easily. Normal 
commercial pressures will ensure that smarter 
SPARQL engines will get written, but we don't 
need to mandate that dumb ones should not be 
conformant.

If y'all like this idea, I can do a run through 
http://www.w3.org/2001/sw/DataAccess/rq23/ to 
tweak the wordings to fit it. It should, if 
anything, get simpler.

Pat


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Wednesday, 4 January 2006 19:08:50 GMT

This archive was generated by hypermail 2.3.1 : Tuesday, 26 March 2013 16:15:25 GMT