Re: major technical: blank nodes [OK?]

On Thu, 2006-01-12 at 13:38 -0800, Fred Zemke wrote:
> Blank nodes of the form _:a and [] do not add anything to the language. 
> Everything that can be expressed with such blank nodes can be expressed
> with variables.


Meanwhile, one of our design objectives is...

" 4.1 Human-friendly Syntax
There must be a text-based form of the query language which can be read
and written easily by users of the language."

and so we had considerable discussion of how much we should
appeal to SQL intuitions and idioms, RDQL idioms, and turtle/N3.

We gave this issue the name punctuationSyntax

and on 2005-03-08, we resolved to "adopt the turtle+variables syntax".

We have since re-opened the decision to adjust various details of
the grammar, but the principle remains the same:

Testing RDF data in turtle works as a query
  1. Take any turtle document foo.ttl and paste the content into @HERE@ in a
  SPARQL query:
    SELECT * WHERE {  @HERE@ }

  2. Run it against the Turtle document foo.ttl
  3. You should get 1 match with no bindings.
 -- Subject: punctuationSyntax Date: 7 Apr 2005
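
The test above can be sketched in a few lines of Python. This is only an
illustration under simplifying assumptions, not the formal matching
definition from the specification: triples are plain tuples of strings, and
the names is_placeholder, match and foo_ttl are hypothetical. Blank node
labels in the pattern are treated as slots that match any term but are never
reported in the result.

```python
def is_placeholder(term):
    # Assumption: "?x" is a variable and "_:x" a blank node label; in a
    # pattern, both are treated as slots that may match any data term.
    return term.startswith(("?", "_:"))


def match(pattern, data):
    """Return the distinct solutions of `pattern` against `data`.

    Only ?variables are reported; blank node slots are matched but hidden.
    """
    partial = [{}]
    for triple_pattern in pattern:
        extended = []
        for binding in partial:
            for triple in data:
                candidate = dict(binding)
                # Every position must agree: slots bind consistently,
                # constants must be equal to the data term.
                if all(
                    candidate.setdefault(p, d) == d if is_placeholder(p) else p == d
                    for p, d in zip(triple_pattern, triple)
                ):
                    extended.append(candidate)
        partial = extended
    # Project away blank node slots and drop duplicate solutions.
    solutions = []
    for binding in partial:
        visible = {k: v for k, v in binding.items() if k.startswith("?")}
        if visible not in solutions:
            solutions.append(visible)
    return solutions


# Stand-in for the parsed content of foo.ttl (hypothetical data).
foo_ttl = [("_:b1", ":name", '"Alice"'), ("_:b1", ":knows", ":bob")]

# Using the document itself as the query pattern: 1 match, no bindings.
print(match(foo_ttl, foo_ttl))      # prints [{}]
# Against a document that lacks one of the triples there is no match.
print(match(foo_ttl, foo_ttl[:1]))  # prints []
```

The single empty solution is the "1 match with no bindings" of step 3: the
pattern contains no variables, so there is nothing to report beyond the fact
that it matched.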

>   What is the difference semantically between
> _:a and ?a ? The only difference I can see is that _:a can not be
> placed in the SELECT list (and there does not appear to be any
> motivation for this).  Thus if the user, in the course of writing a
> query, later decided he wants to receive the value of the blank node,
> he must rewrite the query with a variable in place of the blank node.
> The user might as well just write the query without blank nodes from
> the beginning. 
> In addition, the term "blank node" creates a false analogy with RDF. 
> An RDF blank node is a node in a graph with no IRI.  A SPARQL blank node
> is not a node at all, it is actually a variable that cannot be named in
> the SELECT list.

An RDF blank node is a node in an abstract syntax graph; likewise,
a SPARQL blank node is a node in an abstract syntactic structure.

>   Note that the definition of pattern solution in
> section 2.4 says that a SPARQL blank node can be mapped to an RDF
> term that is not an RDF blank node, and conversely a variable may be
> mapped to an RDF blank node.

Yes, that is how unification traditionally works, no?

>   Thus the two notions of blank
> node have nothing to do with one another aside from the notation that
> is employed.

They both play the role of existential variables. The SELECT clause
is like the "if" part of a formula, in which case existential
variables act like universals.
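
To make the comparison concrete, here is a minimal Python sketch of matching
a single triple pattern. It rests on assumptions and is not the
specification's definition: triples are tuples of strings, and the names
solutions and data are hypothetical. Both ?a and _:a act as slots during
matching; the only difference is that the ?a binding is reported.

```python
def solutions(pattern, data):
    """Match one triple pattern; report bindings for ?variables only."""
    results = []
    for triple in data:
        binding = {}
        # "?x" and "_:x" in the pattern are both slots; anything else
        # is a constant that must equal the data term.
        if all(
            binding.setdefault(p, d) == d if p.startswith(("?", "_:")) else p == d
            for p, d in zip(pattern, triple)
        ):
            results.append({k: v for k, v in binding.items() if k.startswith("?")})
    return results


# Hypothetical data: one subject is an IRI, the other an RDF blank node.
data = [(":alice", ":knows", ":bob"), ("_:b1", ":knows", ":carol")]

# The variable ?a is bound to an IRI in one solution and to the
# RDF blank node _:b1 in the other.
print(solutions(("?a", ":knows", "?o"), data))

# The pattern blank node _:a matches the same two subjects, including
# the IRI :alice, but only ?o is reported.
print(solutions(("_:a", ":knows", "?o"), data))
```

Both patterns match the same two triples; the results differ only in whether
the subject appears among the bindings.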

> A possible reply is that the "SELECT *" only selects
> the variables and not the blank nodes, so the distinction has a meaning.
> However, SQL has found that the wildcard asterisk in the SELECT list
> was a bad language idea, and I do not recommend it for SPARQL.
> This is not a criticism of the blank nodes of the form [ :p "v" ],
> which correspond to the linguistically useful "that which" construction.
> Perhaps the reply is that blank nodes of the form _:a or [] exist
> to provide the translation for [ :p "v" ].  However, the rule for
> translating these says that the implementation must create a unique blank
> node, ie, different from any that the user has already placed in the
> query; it could just as well be worded to say that the implementation must
> create a unique variable name, different from any the user has chosen.
> The specification could also be written so that any variables created
> by the implementation would not be visible to SELECT *, in the unfortunate
> event that you keep that notation.
> My preference would be to eliminate SPARQL blank nodes
> from the language as unnecessary and liable to cause confusion with
> users.

I wonder if the discussion of human friendly syntax and consistency
with turtle above is an acceptable justification for declining
this suggestion?

>   If you don't accept that, my next proposal would be to come
> up with some other term for these gadgets (though I think that the use
> of similar notation for RDF blank nodes and SPARQL blank nodes will
> cause confusion even if you change the term).  My last recourse position
> would be that the entire document should
> be scanned to replace every occurrence of "blank node" with either
> "RDF blank node" or "SPARQL blank node". 

As we have had other comments asking for _more_ consistency in
terminology for blank nodes, I am disinclined to accept that suggestion;
why burden readers with two concepts when one is all that is
needed to correctly specify the design?

> As an example of the possible confusion caused by SPARQL blank nodes,
> consider the arbitrary ordering in 10.1.3 "ORDER BY",
> which sorts blank nodes second, after unassigned
> variables and before IRIs.  The term "blank node" is ambiguous,
> meaning either an RDF blank node or a SPARQL blank node.  In this context
> you must mean an RDF blank node, since a SPARQL blank node is a
> piece of syntax and not a value. 

It seems you have answered your own claim about ambiguity in that case.

If I understand you correctly, the only design change you suggest
is to remove the _:a and [] syntax, which you observe, correctly,
is only syntactic sugar. Perhaps I misunderstand you,
but I find it difficult to see that as a "major technical" comment.
Would that change actually make SPARQL meet some use case that
it does not currently meet? Or would it make SPARQL very much
easier to implement?

I hope you find this response satisfactory. Please let us know whether
you do.

> This is not a criticism of blank nodes in CONSTRUCT templates, where
> there actually is connection between the SPARQL blank nodes and
> RDF blank nodes.  Blank nodes in the CONSTRUCT template should be
> retained.
> Fred Zemke
Dan Connolly, W3C
D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E

Received on Thursday, 12 January 2006 22:42:22 UTC