Re: Draft response to: Re: major technical: blank nodes from Souripriya Das on 2006-01-27 (public-rdf-dawg@w3.org from January to March 2006)

From: Souripriya Das <souripriya.das@oracle.com>
Date: Thu, 26 Jan 2006 19:55:29 -0500
To: Pat Hayes <phayes@ihmc.us>
CC: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <43D96F81.6010306@oracle.com>

Pat,

This is an excellent response. I would like to point out two things, however.

Pat Hayes wrote:

>
> <<After volunteering for this I noticed that Dan had already responded 
> to this message with an [OK?], so this might now be redundant. But 
> here goes anyway.>>
>
> Fred, greetings.
>
> You make several points about blank nodes in SPARQL queries, and we 
> will respond to them in sequence. Your first point:
>
>> Blank nodes of the form _:a and [ ] do not add anything to the language.
>> Everything that can be expressed with such blank nodes can be expressed
>> with variables.
>
>
> is correct. The language has a syntactic redundancy. Some members of 
> the working group agree with your conclusion. We considered 
> prohibiting blank nodes in queries, but this would impose an extra 
> syntactic burden on someone wishing to form query patterns by editing 
> query variables into RDF. We also considered not having unselected 
> variables and requiring what are now unselected variables to be 
> replaced by blank nodes, but again this imposes a burden on users 
> while providing no extra utility. In neither case did the conceptual 
> simplification seem worth the operational burden on users.
>
> There is however a deeper reason for distinguishing query blank nodes 
> from query variables, which addresses your next point:
>
>> What is the difference semantically between
>> _:a and ?a ? 
>
>
> Extending SPARQL to richer entailment modes can make them semantically 
> different. When simple entailment is replaced by OWL entailment in the 
> SPARQL basic definitions, it is possible for an existential to be 
> OWL-entailed by a graph which contains no token which would be a 
> binder for a query variable: OWL supports 'genuinely existential' 
> entailments. For one of many possible examples, if the OWL asserts 
> that :a is in a restriction class of :p to :c with cardinality one, 
> this entails the assertion
>
> :a :p _:x .
> _:x rdf:type :c
>
> but provides no term to bind the query variable ?x to in the query 
> pattern
>
> :a :p ?x .
> ?x rdf:type :c
>
> so the query
>
> SELECT ?y WHERE { ?y :p _:u . _:u rdf:type :c }
>
> would succeed with ?y bound to :a, but the corresponding query
>
> SELECT ?y WHERE { ?y :p ?u . ?u rdf:type :c }
>
> might rationally be said to fail; all when using OWL entailment.  
> Admittedly, this case is controversial. One could argue that even in 
> the second case, it would be sensible to require that the query engine 
> provide a blank node identifier as an answer binding. But the working 
> group felt that it would be prudent to leave the option open for 
> future designers of OWL versions of SPARQL, which motivates keeping 
> the blank-node/variable distinction in the syntax.
>
One could argue as follows: the entailed OWL graph (as shown above) 
does include two triples that contain a blank node (represented via some 
label, shown as _:x here). So, for the second query above, why shouldn't 
one generate a solution that binds the query variable ?u to a 
blank node (represented via some label, say _:x1)?

Are we 'failing' the second query in order to limit the values of the 
variables in the solution to the scoping set of the original (i.e., 
non-entailed) graph?
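
To make the question concrete, here is a minimal sketch in Python. It 
assumes terms are plain strings, that "?" marks a query variable, and 
that the entailed triples are written out by hand (no OWL reasoner is 
involved). The class name :R and the label _:x1 are hypothetical. The 
only difference between the two runs is the set of terms a variable is 
allowed to bind to.

```python
from itertools import product

def is_var(term):
    # Convention assumed for this sketch: "?" marks a query variable.
    return term.startswith("?")

def match(pattern, graph, allowed_terms):
    """Return every binding of the pattern's variables, drawn from
    allowed_terms, whose instance is a subgraph of graph."""
    variables = sorted({t for triple in pattern for t in triple if is_var(t)})
    solutions = []
    for values in product(sorted(allowed_terms), repeat=len(variables)):
        binding = dict(zip(variables, values))
        instance = {tuple(binding.get(t, t) for t in triple) for triple in pattern}
        if instance <= graph:
            solutions.append(binding)
    return solutions

# Hypothetical stand-in for the original graph: ":a is in restriction class :R".
original = {(":a", "rdf:type", ":R")}
# Hand-written consequences, standing in for what OWL entailment would add.
entailed = original | {(":a", ":p", "_:x1"), ("_:x1", "rdf:type", ":c")}

pattern = [("?y", ":p", "?u"), ("?u", "rdf:type", ":c")]

terms_original = {t for triple in original for t in triple}
terms_entailed = {t for triple in entailed for t in triple}

# Bindings limited to terms of the original graph: no solution.
restricted = match(pattern, entailed, terms_original)
# Bindings may also use terms introduced by entailment: ?u binds to _:x1.
unrestricted = match(pattern, entailed, terms_entailed)
```

Under these assumptions the restricted run returns no solution, while 
the unrestricted run binds ?y to :a and ?u to _:x1.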

> Your next point is best addressed by discussing blank node scopes.
>
>>  The only difference I can see is that _:a can not be
>> placed in the SELECT list (and there does not appear to be any
>> motivation for this).  Thus if the user, in the course of writing a
>> query, later decided he wants to receive the value of the blank node,
>> he must rewrite the query with a variable in place of the blank node.
>> The user might as well just write the query without blank nodes from
>> the beginning.
>
>
> There really is no such thing in SPARQL as the 'value' of a query 
> blank node. Blank node identifiers in queries are scoped to the query, 
> and indicate an existential assertion.
>
> In the course of checking the simple entailment relationship between 
> the target graph and the pattern instance such a blank node must be 
> 'mapped' to some term in the target graph, to be sure, but this 
> mapping is distinct from the variable-to-binding instance mapping: it 
> does not identify that term in any sense; rather, the presence of the 
> mapped term simply confirms the truth of the existential claim made by 
> the presence of the blank node. This also gets to your next point:
>
>> In addition, the term "blank node" creates a false analogy with RDF.
>> An RDF blank node is a node in a graph with no IRI.  A SPARQL blank node
>> is not a node at all, it is actually a variable that cannot be named in
>> the SELECT list. 
>
>
> We disagree. It is exactly an RDF blank node, and the analogy is not 
> false. Do not think of a query bnode as a 'blank variable': think 
> instead of the entire query basic graph pattern as an RDF graph with 
> some 'named holes' in it, the query variables. The query answer is a 
> vector of pieces of RDF syntax which, when syntactically substituted 
> for the variables, produces (an appropriate lexicalization of) an RDF 
> graph which is simply entailed by the target graph[*]. 

What if the pattern contains a blank node in the predicate position? 
Then the entailed instance is not a valid RDF graph under the current 
RDF restriction that predicates cannot be blank nodes. If we are 
allowing this in SPARQL, maybe we should state so explicitly.
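
As a small illustration of the concern, the following Python sketch 
flags triples of a pattern instance whose predicate is a blank node. 
It assumes triples are tuples of strings and that "_:" marks a blank 
node; the function name and the sample data are hypothetical.

```python
def is_blank(term):
    # Convention assumed for this sketch: "_:" marks a blank node.
    return term.startswith("_:")

def blank_predicates(triples):
    """Return the triples whose predicate position holds a blank node."""
    return [t for t in triples if is_blank(t[1])]

# A pattern instance with one blank node as predicate and one as object.
pattern_instance = [(":a", "_:p", ":b"), (":a", ":q", "_:x")]
offending = blank_predicates(pattern_instance)
```

Only the first triple is flagged; a blank node in the object position 
is fine in RDF.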

> All of this is purely syntactic, but the entailment relationship 
> between this instance and the target graph, that makes the answer a 
> genuine answer, is semantic. Blank nodes in the query pattern are 
> genuine RDF blank nodes in the entailed instance, and the entailment 
> relationship holds between two RDF graphs.
>
> Simple entailment is indeed so simple that it can be defined in terms 
> of a mapping from blank nodes to RDF terms: A simply entails B just 
> when B has an RDF instance (gotten by mapping from blank nodes to 
> terms) which is a subgraph of A. So, to check the required 
> relationship between a target graph A and a basic graph pattern C, we 
> need an instance mapping M on the variables in C and then another N on 
> the blank nodes in M(C) such that N(M(C)) is a subgraph of A. In this 
> simple case, then, this is equivalent to asking for a single mapping 
> on variables and blank nodes which produces an instance [N+M](C) which 
> is a subgraph of A, then ignoring part of it.  But there is a real 
> conceptual distinction, which is reflected in the definitions, between 
> the two parts of this composite mapping; and when simple entailment is 
> replaced by more advanced forms of entailment, the distinction can 
> become operationally important.
>
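
The two-stage check described above, an instance mapping M on the 
variables followed by a mapping N on the blank nodes such that N(M(C)) 
is a subgraph of A, can be sketched in Python as follows. This is a 
brute-force illustration under simple entailment only. It assumes 
terms are plain strings, "?" marks variables, "_:" marks blank nodes, 
and the blank node labels used in the pattern do not occur in the 
target graph. All function names are hypothetical.

```python
from itertools import product

def is_var(term):
    return term.startswith("?")

def is_blank(term):
    return term.startswith("_:")

def instantiate(mapping, triples):
    """Apply a term mapping to every position of every triple."""
    return {tuple(mapping.get(t, t) for t in triple) for triple in triples}

def answers(pattern, target):
    """Return each variable mapping M for which some blank node
    mapping N makes N(M(pattern)) a subgraph of target."""
    terms = sorted({t for triple in target for t in triple})
    variables = sorted({t for tr in pattern for t in tr if is_var(t)})
    blanks = sorted({t for tr in pattern for t in tr if is_blank(t)})
    found = []
    for values in product(terms, repeat=len(variables)):
        M = dict(zip(variables, values))
        instance = instantiate(M, pattern)  # M(C): still contains blank nodes
        # N is only a witness that the existential claim holds; it is
        # not reported as part of the answer.
        if any(instantiate(dict(zip(blanks, bs)), instance) <= target
               for bs in product(terms, repeat=len(blanks))):
            found.append(M)
    return found

A = {(":a", ":p", ":b"), (":b", "rdf:type", ":c")}
C = [("?y", ":p", "_:u"), ("_:u", "rdf:type", ":c")]
result = answers(C, A)
```

Here the blank node _:u is mapped to :b in order to confirm the 
entailment, but only the binding of ?y to :a appears in the result.
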
> Pat
>
> [*] (In fact, it is simply entailed by a 'scoping graph' which is 
> graph-equivalent to the target graph under a blank node substitution, 
> but this complication is just to allow blank nodes to be scoped 
> separately in the answer document.)
>
> Pat
Received on Friday, 27 January 2006 00:57:07 UTC