Draft response to: Re: major technical: blank nodes from Pat Hayes on 2006-01-26 (public-rdf-dawg@w3.org from January to March 2006)

From: Pat Hayes <phayes@ihmc.us>
Date: Thu, 26 Jan 2006 16:50:59 -0600
To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <p06230916bffeefad07aa@[10.100.0.23]>
<<After volunteering for this I noticed that Dan 
had already responded to this message with an 
[OK?], so this might now be redundant. But here 
goes anyway.>>

Fred, greetings.

You make several points about blank nodes in 
SPARQL queries, and we will respond to them in 
sequence. Your first point:

>Blank nodes of the form _:a and [ ] do not add anything to the language.
>Everything that can be expressed with such blank nodes can be expressed
>with variables.

is correct. The language has a syntactic 
redundancy. Some members of the working group 
agree with your conclusion. We considered 
prohibiting blank nodes in queries, but this 
would impose an extra syntactic burden on someone 
wishing to form query patterns by editing query 
variables into RDF. We also considered not having 
unselected variables and requiring what are now 
unselected variables to be replaced by blank 
nodes, but again this imposes a burden on users 
while providing no extra utility. In neither case 
did the conceptual simplification seem worth the 
operational burden on users.

There is, however, a deeper reason for 
distinguishing query blank nodes from query 
variables, which addresses your next point:

>What is the difference semantically between
>_:a and ?a ? 

Extending SPARQL to richer entailment modes can 
make them semantically different. When simple 
entailment is replaced by OWL entailment in the 
SPARQL basic definitions, it is possible for an 
existential to be OWL-entailed by a graph which 
contains no token which would be a binder for a 
query variable: OWL supports 'genuinely 
existential' entailments. For one of many 
possible examples, if the OWL graph asserts that 
:a is in a restriction class on :p to :c with 
cardinality one, this entails the assertion

:a :p _:x .
_:x rdf:type :c

but provides no term to bind the query variable ?x to in the query pattern

:a :p ?x .
?x rdf:type :c

so the query

SELECT ?y WHERE { ?y :p _:u . _:u rdf:type :c }

would succeed with ?y bound to :a, but the corresponding query

SELECT ?y WHERE { ?y :p ?u . ?u rdf:type :c }

might rationally be said to fail, in both cases 
using OWL entailment.  Admittedly, this case is 
controversial. One could argue that even in the 
second case, it would be sensible to require that 
the query engine provide a blank node identifier 
as an answer binding. But the working group felt 
that it would be prudent to leave the option open 
for future designers of OWL versions of SPARQL, 
which motivates keeping the blank-node/variable 
distinction in the syntax.
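
To make the contrast concrete, here is a minimal 
sketch in Python. It is a toy model, not an OWL 
reasoner: the EXISTENTIALS set and both function 
names are hypothetical, and an entailed existential 
is simply recorded as a (subject, property, class) 
tuple. It assumes the reading in which the engine 
does not invent a blank node identifier for the 
witness.

```python
# Toy stand-in for an OWL-entailed existential:
# ":a has some :p-value of type :c". Hypothetical encoding, not real OWL.
EXISTENTIALS = {(":a", ":p", ":c")}

# Explicit triples of the target graph. No named :p-value of :a appears.
EXPLICIT = set()


def subjects_with_existential_witness(prop, cls):
    """Pattern { ?y prop _:u . _:u rdf:type cls }.

    Only the existence of some _:u is required, so an entailed
    existential is enough to make ?y an answer.
    """
    named = {s for (s, p, o) in EXPLICIT
             if p == prop and (o, "rdf:type", cls) in EXPLICIT}
    entailed = {s for (s, p, c) in EXISTENTIALS if p == prop and c == cls}
    return named | entailed


def bindings_with_named_witness(prop, cls):
    """Pattern { ?y prop ?u . ?u rdf:type cls }.

    ?u must be bound to a term that actually occurs in the graph,
    so an entailed existential alone yields no answer.
    """
    return {(s, o) for (s, p, o) in EXPLICIT
            if p == prop and (o, "rdf:type", cls) in EXPLICIT}


print(subjects_with_existential_witness(":p", ":c"))  # {':a'}
print(bindings_with_named_witness(":p", ":c"))        # set()
```

The first call reports :a because the existential 
claim is satisfied; the second reports nothing 
because there is no term to bind ?u to.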

Your next point is best addressed by discussing blank node scopes.

>  The only difference I can see is that _:a can not be
>placed in the SELECT list (and there does not appear to be any
>motivation for this).  Thus if the user, in the course of writing a
>query, later decided he wants to receive the value of the blank node,
>he must rewrite the query with a variable in place of the blank node.
>The user might as well just write the query without blank nodes from
>the beginning.

There really is no such thing in SPARQL as the 
'value' of a query blank node. Blank node 
identifiers in queries are scoped to the query, 
and indicate an existential assertion.

In the course of checking the simple entailment 
relationship between the target graph and the 
pattern instance, such a blank node must be 
'mapped' to some term in the target graph, to be 
sure, but this mapping is distinct from the 
variable-to-binding instance mapping: it does not 
identify that term in any sense; rather, the 
presence of the mapped term simply confirms the 
truth of the existential claim made by the 
presence of the blank node. This also gets to 
your next point:

>In addition, the term "blank node" creates a false analogy with RDF.
>An RDF blank node is a node in a graph with no IRI.  A SPARQL blank node
>is not a node at all, it is actually a variable that cannot be named in
>the SELECT list. 

We disagree. It is exactly an RDF blank node, and 
the analogy is not false. Do not think of a query 
bnode as a 'blank variable': think instead of the 
entire query basic graph pattern as an RDF graph 
with some 'named holes' in it, the query 
variables. The query answer is a vector of pieces 
of RDF syntax which, when syntactically 
substituted for the variables, produces (an 
appropriate lexicalization of) an RDF graph 
which is simply entailed by the target graph[*]. 
All of this is purely syntactic, but the 
entailment relationship between this instance and 
the target graph, which is what makes the answer a 
genuine answer, is semantic. Blank nodes in the query 
pattern are genuine RDF blank nodes in the 
entailed instance, and the entailment 
relationship holds between two RDF graphs.

Simple entailment is indeed so simple that it can 
be defined in terms of a mapping from blank nodes 
to RDF terms: A simply entails B just when B has 
an RDF instance (gotten by mapping from blank 
nodes to terms) which is a subgraph of A. So, to 
check the required relationship between a target 
graph A and a basic graph pattern C, we need an 
instance mapping M on the variables in C and then 
another N on the blank nodes in M(C) such that 
N(M(C)) is a subgraph of A. In this simple case, 
then, this is equivalent to asking for a single 
mapping on variables and blank nodes which 
produces an instance [N+M](C) which is a subgraph 
of A, then ignoring part of it.  But there is a 
real conceptual distinction, which is reflected 
in the definitions, between the two parts of this 
composite mapping; and when simple entailment is 
replaced by more advanced forms of entailment, 
the distinction can become operationally 
important.
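
The two-stage check for simple entailment can be 
sketched as follows, again in Python and by brute 
force. The helper names (is_var, is_bnode, 
substitute, answers) and the string encoding of 
terms are assumptions made for illustration, and 
the sketch assumes that blank node labels in the 
pattern do not also occur in the target graph. 
Note that M is reported as the answer while N only 
has to exist.

```python
from itertools import product


def is_var(term):
    return term.startswith("?")


def is_bnode(term):
    return term.startswith("_:")


def substitute(graph, mapping):
    """Apply a term mapping to every triple of a graph (a set of 3-tuples)."""
    return {tuple(mapping.get(t, t) for t in triple) for triple in graph}


def answers(target, pattern):
    """Yield each variable mapping M for which some blank-node mapping N
    makes N(M(pattern)) a subgraph of target."""
    terms = sorted({t for triple in target for t in triple})
    variables = sorted({t for tr in pattern for t in tr if is_var(t)})
    bnodes = sorted({t for tr in pattern for t in tr if is_bnode(t)})
    for values in product(terms, repeat=len(variables)):
        m = dict(zip(variables, values))
        instance = substitute(pattern, m)  # M(C): blank nodes still present
        # N is searched for but never returned: it only confirms
        # the existential claim made by the blank nodes.
        if any(substitute(instance, dict(zip(bnodes, witness))) <= target
               for witness in product(terms, repeat=len(bnodes))):
            yield m


target = {(":a", ":p", ":b"), (":b", "rdf:type", ":c")}
with_bnode = {("?y", ":p", "_:u"), ("_:u", "rdf:type", ":c")}
with_var = {("?y", ":p", "?u"), ("?u", "rdf:type", ":c")}

print(list(answers(target, with_bnode)))  # [{'?y': ':a'}]
print(list(answers(target, with_var)))    # [{'?u': ':b', '?y': ':a'}]
```

Under simple entailment both patterns succeed on 
this graph; the only visible difference is that 
the term matched by _:u is not part of the answer.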

Pat

[*] (In fact, it is simply entailed by a 'scoping 
graph' which is graph-equivalent to the target 
graph under a blank node substitution, but this 
complication is just to allow blank nodes to be 
scoped separately in the answer document.)

-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Thursday, 26 January 2006 22:51:07 UTC