Re: Coments on first working draft of SPARQL from Seaborne, Andy on 2004-11-07 (public-rdf-dawg-comments@w3.org from November 2004)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Sun, 07 Nov 2004 17:16:51 +0000
To: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>
Cc: public-rdf-dawg-comments@w3.org
Message-ID: <418E5883.4010306@hp.com>
Peter F. Patel-Schneider wrote:
> From: "Seaborne, Andy" <andy.seaborne@hp.com>
> Subject: RE: Coments on first working draft of SPARQL
> Date: Mon, 25 Oct 2004 13:44:59 +0100
> 
> 
>>Peter,
>>
>>Thank you very much for the comments:
>>
>>Changes where mentioned are started in v1.121.  As a working draft
>>document, there will be quite a few changes to come.  Some of the
>>matters arising can't be completely finished until other documents are
>>ready.
>>
>>	Andy
>>
>>-------- Original Message --------
>>
>>>From: Peter F. Patel-Schneider <>
>>>Date: 13 October 2004 17:09
>>>
>>>I took a quick look at
>>>
>>>    SPARQL RDF query language
>>>    http://www.w3.org/TR/rdf-sparql-query/
>>>
>>>For a first working draft it is quite good.
> 
> 
> [Agreements elided.]
> 
> 
>>>Nevertheless, I have a number of things that I think need bringing up.
>>>Now for some more substantive issues:
>>>
>>>
>>>SPARQL allows bnodes in triple patterns and in constraints.  This leads
>>>to a number of thorny issues.
>>>
>>>How are blank nodes handled in constraints?  For example, what does
>>>	_:a < 30  (where _:a is a blank node)
>>>evaluate to?
>>
>>The exact evaluation will depend on the constraint function but in this
>>example it would evaluate to an error and hence lead to the rejection of
>>potential solutions where a bNode is compared.
> 
> 
> I don't see any language in the working draft supporting this.  Remember, a
> blank node is not a variable.

There is very little in the working draft on this.  It is an area being
worked on; there is more in the editors' draft, but that is still unfinished.
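
To make the intended behaviour concrete, here is a minimal Python sketch of
"an error rejects the potential solution".  It is only an illustration: the
names BNode, ConstraintError, less_than and filter_solutions are made up for
this message and are not taken from the working draft or from any toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BNode:
    """Hypothetical blank node: it has a label but no value to compare."""
    label: str


class ConstraintError(Exception):
    """Raised when a constraint cannot be evaluated for a binding."""


def less_than(term, limit):
    # In this sketch only plain numbers are comparable; anything else,
    # including a blank node, is an evaluation error.
    if isinstance(term, bool) or not isinstance(term, (int, float)):
        raise ConstraintError(f"cannot compare {term!r} with {limit!r}")
    return term < limit


def filter_solutions(solutions, var, limit):
    """Keep solutions where var < limit; an error rejects the solution."""
    kept = []
    for solution in solutions:
        try:
            if less_than(solution[var], limit):
                kept.append(solution)
        except ConstraintError:
            continue  # evaluation error: the potential solution is dropped
    return kept


solutions = [{"age": 25}, {"age": 42}, {"age": BNode("a")}]
print(filter_solutions(solutions, "age", 30))
```

The solution that binds the variable to a bNode is dropped in the same way as
one whose value fails the test; the query as a whole does not fail.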

> 
> 
>>The working draft had very little in this area and the editors' version
>>has added some material, especially the use of (a subset of) the
>>XQuery/XPath functions and operators.
>>
>>I have also 
> 
> 	       ^^^???

Typo - sorry.

> 
> 
>>>How are blank nodes handled in triple patterns?  For example, does the
>>>triple pattern
>>>	( ?x ex:r _:v )
>>>match the RDF graph
>>>	ex:a ex:r _:a .
>>>	ex:a ex:r _:b .
>>
>>Your comments suggest that a section devoted to the details around
>>bNodes would be helpful.  This has been started in the editors working
>>draft.
>>
>>The query syntax does not allow bNodes in queries. bNodes can not be put
>>in query requests and that needs to be explained somewhere.
> 
> 
> The working draft has explicit wording to the contrary.

These definitions are not the syntax of the language.  At this point the
document is setting up terminology for working with patterns in queries.
Graph patterns can be combined to produce other patterns, so allowing
bNodes in the definitions helps if the combined patterns are thought of as
subqueries.  More below.

If you have suggestions for improving the approach taken in the document, 
please let me know.

> 
>   <p class="defn"><b>Definition:</b> RDF Term<br />
>   <br />
>   An <span class="definedTerm">RDF Term</span> is anything that can
>   occur in the <a href=
>   "http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-data-model">
>   RDF data model</a>.<br />
>   let RDF-U be the set of all <a href=
>   "http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#dfn-URI-reference">
>   RDF URI References</a><br />
>   let RDF-L be the set of all <a href=
>   "http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#dfn-literal">
>   RDF Literals</a><br />
>   let RDF-B be the set of all <a href=
>   "http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#dfn-blank-node">
>   bNodes</a><br />
>   <br />
>   The set of RDF Terms, RDF-T, is RDF-U union RDF-L union RDF-B</p>
> 
> 
>   <p class="defn"><b>Definition:</b> <a id="defn_TriplePattern"
>   name="defn_TriplePattern">Triple Pattern</a><br />
>   <br />
>   The set of <span class="definedTerm">triple patterns</span>
>   is<br />
>   &nbsp;&nbsp;&nbsp; (RDF-U union RDF-B union V) x (RDF-U union V)
>   x (RDF-T union V)</p>
> 
> So both subjects and objects in Triple Patterns can be blank nodes.
> 
> The SPARQL grammar appears to agree with these definitions.  Of course, that
> grammar is not very well written, as it makes literals include URIs.

'Literal' in the grammar isn't an RDF literal - it's a constant term.  We 
will change the wording.  Constants are URIs, RDF plain literals, typed 
RDF literals and the convenience forms for xsd:integers and xsd:doubles.

[39]  	Literal  	 ::=  	URI | NumericLiteral | TextLiteral

(TextLiterals include typed RDF literals - that could be better named)


The production for a TriplePattern is:

[16]  	TriplePattern  	 ::=  	'(' VarOrURI VarOrURI VarOrLiteral  ')'
[17]  	VarOrURI  	 ::=  	<VAR> | URI

so RDF literals are not allowed as subjects.  In the syntax of the 
language, bNodes can't appear.
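
The difference between the abstract definition and the concrete syntax can be
sketched in a few lines of Python.  This is only an illustration: the term
classes are hypothetical, and I am assuming here that VarOrLiteral means
"a variable or a Literal as in production [39]".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str


@dataclass(frozen=True)
class BNode:
    label: str


def in_abstract_definition(s, p, o):
    """(RDF-U union RDF-B union V) x (RDF-U union V) x (RDF-T union V)."""
    return (isinstance(s, (URI, BNode, Var))
            and isinstance(p, (URI, Var))
            and isinstance(o, (URI, Literal, BNode, Var)))


def in_concrete_syntax(s, p, o):
    """'(' VarOrURI VarOrURI VarOrLiteral ')' (VarOrLiteral is assumed)."""
    return (isinstance(s, (Var, URI))
            and isinstance(p, (Var, URI))
            and isinstance(o, (Var, URI, Literal)))


pattern = (Var("x"), URI("ex:r"), BNode("v"))
print(in_abstract_definition(*pattern))  # allowed by the definition
print(in_concrete_syntax(*pattern))      # not writable in the query syntax
```

A pattern with an RDF literal as subject fails both checks, while a pattern
with a bNode passes only the abstract one.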

> 
> 
>>>In general, what is the status of blank nodes in SPARQL?  For example,
>>>which definition of subgraph does SPARQL use - the standard one from
>>>graph theory or the expansive one used in RDF semantics in the presence
>>>of bnode relabelling?
>>>
>>>Even if bnodes do not appear in a query, how are multiple matches that
>>>differ only with respect to bnodes handled?
>>>
>>>
>>>These issues are all a consequence of the following issue:
>>>
>>>SPARQL appears to depend on an unsanctioned extension of RDF, namely
>>>that bnodes in an RDF graph have identity that can be taken out of the
>>>graph and transmitted elsewhere.  Is this the case?  If so, how is this
>>>extension going to work?  If not, how can bnodes be handled reasonably
>>>in SPARQL?
>>
>>In the case where results are serialized, XML or RDF/XML forms, there
>>are merely labels (c.f. like bNodes ids in RDF/XML) that are document
>>scoped. 
> 
> 
> Then this should be stated early and often.  As well, the examples should
> use different lexical forms for the bnodes.
> 
> 
>>The working group is currently actively designing the result
>>serialization formats.  They only enable one bNode in a serialized
>>result form to be distinguished from another in the same serialized
>>result.  
> 
> 
>>They can not be used to get back to the original bNode in the
>>graph - that would have to be done by reusing a graph pattern that found
>>it.
> 
> 
>>In the local case (no serialization of results, directly working with
>>the graph), the query processor can be working directly with the graph
>>and can return programming language objects that the graph
>>implementation uses for bNodes.  bNodes do not leave the graph; the
>>programming system has whatever mechanisms it uses to pass references
>>around just like literals and URIs in RDF APIs.
> 
> 
> Huh?  How does this work?  Is this really going to be part of the SPARQL spec?
> If so, it exposes a part of RDF that I had safely thought was hidden.  

Which part of RDF did you think was hidden?  Many RDF toolkits do allow 
access to bNodes - for example, the ability to add properties when 
creating an RDF graph.

It's not a matter for DAWG to define how RDF APIs work.  When used 
remotely, SPARQL queries are serialized and results come back in encoded 
form and there is no mechanism for maintaining bNodes across the network - 
just a way to give a document scoped id so that within the document, 
bNodes can be differentiated.
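
As a rough sketch of what "document scoped" means, assuming a hypothetical
serialize function and BNode class (the actual result formats are still
being designed, so nothing here reflects them):

```python
import itertools


class BNode:
    """Stand-in for a toolkit's blank node object; identity is the object."""


def serialize(solutions):
    """Replace each distinct BNode with a label local to this document."""
    labels = {}                  # fresh mapping for every result document
    counter = itertools.count()
    rows = []
    for solution in solutions:
        row = {}
        for var, term in solution.items():
            if isinstance(term, BNode):
                if term not in labels:
                    labels[term] = f"_:b{next(counter)}"
                row[var] = labels[term]
            else:
                row[var] = term
        rows.append(row)
    return rows


a, b = BNode(), BNode()
print(serialize([{"x": a}, {"x": b}, {"x": a}]))  # a and b get distinct labels
print(serialize([{"x": b}]))                      # labels start again here
```

Within one document the labels tell the two bNodes apart.  Across documents
the same label can stand for different bNodes, so a label can not be sent
back to the server to find the original node.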

If used locally, how the implementation returns results from a query is an 
implementation decision and is not going to be defined by DAWG.  Some 
systems will return whatever graph object the query happens to find - then 
this object can be used for further (non-query) API operations such as 
adding properties.

An example of local use might be:

results = queryExecute(
         "SELECT ?person WHERE ( ?person foaf:mbox <mailto:joe> )") ;
for ( solution in results)
{
    x = solution.get("person") ;
    x.addProperty(foaf.name,"Joe") ;
}

The return type of solution.get will be whatever the RDF toolkit uses to
implement nodes in the graph.
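
The same idea as a self-contained Python sketch over a toy in-memory graph.
The Graph and Node classes and the subjects method are invented for this
illustration; subjects stands in for executing a query with a single triple
pattern and is not the API of any real toolkit.

```python
class Node:
    """Toy graph node; a blank node is simply a Node with no URI."""

    def __init__(self, graph, uri=None):
        self.graph = graph
        self.uri = uri

    def add_property(self, predicate, value):
        self.graph.triples.append((self, predicate, value))


class Graph:
    def __init__(self):
        self.triples = []

    def subjects(self, predicate, obj):
        """Stand-in for running the pattern ( ?s predicate obj )."""
        return [s for (s, p, o) in self.triples if p == predicate and o == obj]


FOAF_MBOX = "http://xmlns.com/foaf/0.1/mbox"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

graph = Graph()
person = Node(graph)  # a blank node: no URI
person.add_property(FOAF_MBOX, "mailto:joe")

# The "query" hands back the graph's own node objects, so the result can
# be used directly for further (non-query) operations on the graph.
for found in graph.subjects(FOAF_MBOX, "mailto:joe"):
    found.add_property(FOAF_NAME, "Joe")
```

Nothing leaves the graph here: the result is the very object the graph
already holds, which is why no bNode label is needed in the local case.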

It also means that query structures involving bNodes could be created.
This can't be done in the syntax of the language, but if the abstract
syntax tree is constructed programmatically then local objects might be
included.  That is a toolkit implementation decision and nothing to do with
DAWG.  Making TriplePatterns more general than the syntax allows (by
including bNodes) is just a way of recognizing this direct use of query on
a local RDF graph.


I suspect we have different underlying views on how RDF applications are 
going to be constructed.  My hope is that SPARQL is neutral to that - if
you see some approaches being made impossible, or difficult, then please 
let us know.

	Andy

> 
> 
>>	Andy
> 
> 
> Peter F. Patel-Schneider
> Bell Labs Research
>
Received on Sunday, 7 November 2004 17:17:29 UTC