Re: Draft response to: Re: major technical: blank nodes from Pat Hayes on 2006-01-27 (public-rdf-dawg@w3.org from January to March 2006)

From: Pat Hayes <phayes@ihmc.us>
Date: Fri, 27 Jan 2006 12:39:08 -0600
To: Souripriya Das <souripriya.das@oracle.com>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>, "Seaborne, Andy" <andy.seaborne@hp.com>
Message-Id: <p0623091cc0001561dea3@[10.100.0.23]>
>Pat,
>
>This is an excellent response

It's only a draft at present :-)

>. I would like to point two things however.
>
>Pat Hayes wrote:
>>>What is the difference semantically between
>>>_:a and ?a ?
>>
>>
>>Extending SPARQL to richer entailment modes can make them 
>>semantically different. When simple entailment is replaced by OWL 
>>entailment in the SPARQL basic definitions, it is possible for an 
>>existential to be OWL-entailed by a graph which contains no token 
>>which would be a binder for a query variable: OWL supports 
>>'genuinely existential' entailments. For one of many possible 
>>examples, if the OWL asserts that :a is in a restriction class of 
>>:p to :c with cardinality one, this entails the assertion
>>
>>:a :p _:x .
>>_:x rdf:type :c
>>
>>but provides no term to bind the query variable ?x to in the query pattern
>>
>>:a :p ?x .
>>?x rdf:type :c
>>
>>so the query
>>
>>SELECT ?y WHERE { ?y :p _:u . _:u rdf:type :c }
>>
>>would succeed with ?y bound to :a, but the corresponding query
>>
>>SELECT ?y WHERE { ?y :p ?u . ?u rdf:type :c }
>>
>>might rationally be said to fail; all when using OWL entailment. 
>>Admittedly, this case is controversial. One could argue that even 
>>in the second case, it would be sensible to require that the query 
>>engine provide a blank node identifier as an answer binding. But 
>>the working group felt that it would be prudent to leave the option 
>>open for future designers of OWL versions of SPARQL, which 
>>motivates keeping the blank-node/variable distinction in the syntax.
>>
>One could argue as follows:  The entailed OWL graph (as shown above) 
>does include two triples that contain a blank-node (represented via 
>some label, shown as _:x here). So, for the second query above, why 
>shouldn't one generate a solution that substitutes the query 
>variable ?u to a blank-node (represented via some label, say _:x1)?

Well, indeed. I tend to agree with this - in fact, I believe that we 
should adopt as a general principle that if ASK succeeds with a blank 
node, then the corresponding SELECT ?x with the same pattern but with 
a variable should also succeed, possibly binding ?x to a blank node 
ID. But FUB, who have the local expertise for OWL-DL querying, 
disagree; and certainly, it would be rather daunting to require OWL 
answering engines to create an inferred graph with ALL the possible 
existentials in it. I think it's more a matter of preferred style than 
anything else: if one thinks of query variables as acting like those 
in SQL, then it's natural to think of them as binding to an ID 
actually in a dataset.

>Are we 'failing' the second query to limit the values for the 
>variables in the solution to the scoping set of original (i.e., 
>non-entailed) graph?

We would be if we did, but that is why the fully general definition 
doesn't have that restriction in it. It only restricts to a 'scoping 
set B' which isn't further specified in general, only for basic 
SPARQL. This allows B to have some extra stock of bnodeIDs when 
required for things like the OWL case.
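
To make the role of B concrete, here is a rough Python sketch. It is 
only an illustration, not anything from the draft: the function and 
variable names are made up, and it assumes the OWL-entailed graph has 
been written out as explicit triples (which a real OWL engine need not 
do). Query variables may bind only to terms in the scoping set, while 
query blank nodes are existentials that may map to any term and are 
never reported.

```python
def is_var(term):
    return term.startswith("?")

def is_bnode(term):
    return term.startswith("_:")

def solutions(pattern, graph, scoping_set):
    """Yield variable bindings under which `pattern` matches `graph`.

    Variables (?x) may bind only to terms in `scoping_set`.
    Pattern blank nodes (_:x) may map to any graph term; that
    mapping is used for matching but left out of the answers.
    """
    def match(triples, binding):
        if not triples:
            yield binding
            return
        first, rest = triples[0], triples[1:]
        for triple in graph:
            new = dict(binding)
            ok = True
            for p, g in zip(first, triple):
                if is_var(p) or is_bnode(p):
                    if p in new:
                        ok = new[p] == g
                    elif is_var(p) and g not in scoping_set:
                        ok = False
                    else:
                        new[p] = g
                else:
                    ok = p == g
                if not ok:
                    break
            if ok:
                yield from match(rest, new)

    seen = []
    for b in match(list(pattern), {}):
        answer = {k: v for k, v in b.items() if is_var(k)}
        if answer not in seen:
            seen.append(answer)
            yield answer

# Hypothetical stand-in for what OWL entailment adds: an existential
# written out with the blank node label _:x.
entailed = [(":a", ":p", "_:x"), ("_:x", "rdf:type", ":c")]

# Scoping sets: terms of the original graph only, versus the same
# terms plus an extra stock of bnode IDs.
B_basic = {":a", ":p", ":c", "rdf:type"}
B_extended = B_basic | {"_:x"}

bnode_query = [("?y", ":p", "_:u"), ("_:u", "rdf:type", ":c")]
var_query = [("?y", ":p", "?u"), ("?u", "rdf:type", ":c")]

print(list(solutions(bnode_query, entailed, B_basic)))
print(list(solutions(var_query, entailed, B_basic)))
print(list(solutions(var_query, entailed, B_extended)))
```

With B restricted to the original terms, the blank-node query still 
answers with ?y bound to :a, but the variable query has no answers. 
Once B includes the bnode ID, the variable query succeeds too, with ?u 
bound to _:x. The sketch returns every variable in the pattern rather 
than projecting to the SELECT list.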

>>Your next point is best addressed by discussing blank node scopes.
>>
>>>  The only difference I can see is that _:a can not be
>>>placed in the SELECT list (and there does not appear to be any
>>>motivation for this).  Thus if the user, in the course of writing a
>>>query, later decided he wants to receive the value of the blank node,
>>>he must rewrite the query with a variable in place of the blank node.
>>>The user might as well just write the query without blank nodes from
>>>the beginning.
>>
>>
>>There really is no such thing in SPARQL as the 'value' of a query 
>>blank node. Blank node identifiers in queries are scoped to the 
>>query, and indicate an existential assertion.
>>
>>In the course of checking the simple entailment relationship 
>>between the target graph and the pattern instance such a blank node 
>>must be 'mapped' to some term in the target graph, to be sure, but 
>>this mapping is distinct from the variable-to-binding instance 
>>mapping: it does not identify that term in any sense; rather, the 
>>presence of the mapped term simply confirms the truth of the 
>>existential claim made by the presence of the blank node. This also 
>>gets to your next point:
>>
>>>In addition, the term "blank node" creates a false analogy with RDF.
>>>An RDF blank node is a node in a graph with no IRI.  A SPARQL blank node
>>>is not a node at all, it is actually a variable that cannot be named in
>>>the SELECT list.
>>
>>
>>We disagree. It is exactly an RDF blank node, and the analogy is 
>>not false. Do not think of a query bnode as a 'blank variable': 
>>think instead of the entire query basic graph pattern as an RDF 
>>graph with some 'named holes' in it, the query variables. The query 
>>answer is a vector of pieces of RDF syntax which, when 
>>syntactically substituted for the variables, produces (an 
>>appropriate lexicalization of) an RDF graph which is simply 
>>entailed by the target graph[*].
>
>What if the pattern contains a blank-node in the predicate position? 
>Then the entailed instance is not a valid RDF graph according to 
>current restrictions in RDF which say predicates cannot be 
>blank-nodes. If we are allowing this in SPARQL, maybe we should 
>state this explicitly.

Yes, I think we should. Such a query cannot succeed at present; the 
freedom is there only to allow for future RDF loosenings, like 
allowing a literal in the subject position.

This was only added recently (at my suggestion), and I see now that 
it could be confusing; and unlike the literal-subject case, there 
isn't any prior W3C discussion we can refer to. Hmm, maybe we should 
quietly remove that bit of extra syntactic freedom, after all.
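
To spell out why such a query cannot succeed at present, here is a 
small Python sketch. It is a simplification under stated assumptions: 
terms are plain strings, a leading "_:" marks a blank node, a leading 
double quote marks a literal, and the helper names are hypothetical. 
A pattern instance can only be entailed if it is a legal RDF graph to 
begin with, and current RDF requires the predicate to be an IRI and 
the subject not to be a literal.

```python
def kind(term):
    """Classify a term by its (assumed) lexical form."""
    if term.startswith("_:"):
        return "bnode"
    if term.startswith('"'):
        return "literal"
    return "iri"

def is_valid_rdf_triple(s, p, o):
    # Current RDF: subject is an IRI or blank node, predicate is an
    # IRI, object may be anything.
    return kind(s) in ("iri", "bnode") and kind(p) == "iri"

def is_rdf_graph(pattern_instance):
    """True if every triple of the instance is legal in current RDF."""
    return all(is_valid_rdf_triple(*t) for t in pattern_instance)

print(is_rdf_graph([(":a", ":p", "_:b")]))     # bnode as object
print(is_rdf_graph([(":a", "_:p", ":b")]))     # bnode as predicate
print(is_rdf_graph([('"lit"', ":p", ":b")]))   # literal as subject
```

Only the first instance passes; the other two are the cases that a 
future loosening of RDF would have to admit.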

Pat
-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Friday, 27 January 2006 18:39:22 UTC