Re: major technical: blank nodes [OK?] from Dan Connolly on 2006-01-12 (public-rdf-dawg-comments@w3.org from January 2006)

From: Dan Connolly <connolly@w3.org>
Date: Thu, 12 Jan 2006 16:42:15 -0600
To: Fred Zemke <fred.zemke@oracle.com>
Cc: public-rdf-dawg-comments@w3.org
Message-Id: <1137105736.19546.399.camel@dirk.w3.org>

On Thu, 2006-01-12 at 13:38 -0800, Fred Zemke wrote:
> Blank nodes of the form _:a and [] do not add anything to the language. 
> Everything that can be expressed with such blank nodes can be expressed
> with variables.

True.

Meanwhile, one of our design objectives is...

" 4.1 Human-friendly Syntax
There must be a text-based form of the query language which can be read
and written easily by users of the language."
 -- http://www.w3.org/TR/rdf-dawg-uc/#d4.1

and so we had considerable discussion of how much we should
appeal to SQL intuitions and idioms, RDQL idioms, and turtle/N3
idioms.

We gave this issue the name punctuationSyntax
 http://www.w3.org/2001/sw/DataAccess/issues#punctuationSyntax

and on 2005-03-08, we resolved to "adopt the turtle+variables syntax".

We have since re-opened the decision to adjust various details of
the grammar, but the principle remains the same:

[[
Testing RDF data in turtle works as a query
  1. Take any turtle document foo.ttl and paste the content into @HERE@ in a
  sparql query:
    SELECT * WHERE {  @HERE@ }

  2. Run it against the Turtle document foo.ttl
 
  3. You should get 1 match with no bindings.
]]
 -- Subject: punctuationSyntax Date: 7 Apr 2005
http://lists.w3.org/Archives/Public/public-rdf-dawg/2005AprJun/0041.html
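
The test quoted above can be sketched with a toy matcher. This is only an
illustration, not a SPARQL engine: terms are plain strings, the helper names
(is_placeholder, match, select_star) and the sample data are made up for this
sketch, and it assumes that blank nodes in a query pattern bind like variables
but are left out of the SELECT * result.

```python
# Toy triple-pattern matcher; an illustrative sketch, not a SPARQL engine.
# Terms are plain strings: "?x" is a variable, "_:x" is a blank node label,
# anything else is treated as a constant (IRI or literal).

def is_placeholder(term):
    """In a query pattern, variables and blank nodes can both bind to any term."""
    return term.startswith("?") or term.startswith("_:")

def match(pattern, data, binding=None):
    """Yield every binding that maps all pattern triples onto triples in data."""
    binding = binding or {}
    if not pattern:
        yield dict(binding)
        return
    first, rest = pattern[0], pattern[1:]
    for triple in data:
        trial = dict(binding)
        ok = True
        for p, d in zip(first, triple):
            if is_placeholder(p):
                # Bind on first use; later uses must agree with that binding.
                if trial.setdefault(p, d) != d:
                    ok = False
                    break
            elif p != d:
                ok = False
                break
        if ok:
            yield from match(rest, data, trial)

def select_star(pattern, data):
    """Keep only ?variables in each row (assumption: blank nodes are not selected)."""
    rows = []
    for b in match(pattern, data):
        row = {k: v for k, v in b.items() if k.startswith("?")}
        if row not in rows:
            rows.append(row)
    return rows

# Hypothetical content of foo.ttl, already parsed into triples.
foo = [
    (":alice", ":knows", "_:b1"),
    ("_:b1", ":name", '"Bob"'),
]

# Step 1 and 2: use the document as the query pattern and run it against itself.
result = select_star(foo, foo)
print(result)
```

Under these assumptions the result holds a single row with no variable
bindings, which is what step 3 of the test expects.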



>   What is the difference semantically between
> _:a and ?a ? The only difference I can see is that _:a can not be
> placed in the SELECT list (and there does not appear to be any
> motivation for this).  Thus if the user, in the course of writing a
> query, later decided he wants to receive the value of the blank node,
> he must rewrite the query with a variable in place of the blank node.
> The user might as well just write the query without blank nodes from
> the beginning. 
> 
> In addition, the term "blank node" creates a false analogy with RDF. 
> An RDF blank node is a node in a graph with no IRI.  A SPARQL blank node
> is not a node at all, it is actually a variable that cannot be named in
> the SELECT list.

An RDF blank node is a node in an abstract syntax graph; likewise,
a SPARQL blank node is a node in an abstract syntactic structure.

>   Note that the definition of pattern solution in
> section 2.4 says that a SPARQL blank node can be mapped to an RDF
> term that is not an RDF blank node, and conversely a variable may be
> mapped to an RDF blank node.

Yes, that is how unification traditionally works, no?
  http://en.wikipedia.org/wiki/Unification
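
A small sketch of that point, under the assumption that a blank node in a
query pattern is matched exactly like a variable. The data, the term encoding
(plain strings) and the solutions helper are hypothetical and exist only for
this illustration.

```python
# Toy illustration only; terms are strings, "?x" is a variable and "_:x" a
# blank node label. The data below is made up.
DATA = [
    (":alice", ":name", '"Alice"'),   # subject is an IRI
    ("_:n1", ":name", '"Bob"'),       # subject is an RDF blank node
]

def solutions(pattern, data):
    """Return one mapping per data triple matched by a single triple pattern."""
    found = []
    for triple in data:
        mapping = {}
        for p, d in zip(pattern, triple):
            if p.startswith(("?", "_:")):
                # Placeholders bind to whatever term is in the data,
                # whether it is an IRI, a literal or a blank node.
                if mapping.setdefault(p, d) != d:
                    break
            elif p != d:
                break
        else:
            found.append(mapping)
    return found

with_bnode = solutions(("_:a", ":name", "?name"), DATA)
with_var = solutions(("?a", ":name", "?name"), DATA)

# The query blank node _:a is mapped to an IRI in one solution, and the
# variable ?a is mapped to an RDF blank node in another.
print([m["_:a"] for m in with_bnode])
print([m["?a"] for m in with_var])
```

In this sketch both patterns match the same two triples; the only difference
is the name under which the subject is recorded.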

>   Thus the two notions of blank
> node have nothing to do with one another aside from the notation that
> is employed.

They both play the role of existential variables. The SELECT clause
is like the "if" part of a formula, in which case existential
variables act like universals.

> A possible reply is that the "SELECT *" only selects
> the variables and not the blank nodes, so the distinction has a meaning.
> However, SQL has found that the wildcard asterisk in the SELECT list
> was a bad language idea, and I do not recommend it for SPARQL.
> 
> This is not a criticism of the blank nodes of the form [ :p "v" ],
> which correspond to the linguistically useful "that which" construction.
> Perhaps the reply is that blank nodes of the form _:a or [] exist
> to provide the translation for [ :p "v" ].  However, the rule for
> translating these says that the implementation must create a unique blank
> node, ie, different from any that the user has already placed in the
> query; it could just as well be worded to say that the implementation must
> create a unique variable name, different from any the user has chosen.
> The specification could also be written so that any variables created
> by the implementation would not be visible to SELECT *, in the unfortunate
> event that you keep that notation.
> 
> My preference would be to eliminate SPARQL blank nodes
> from the language as unnecessary and liable to cause confusion with
> users.

I wonder if the discussion of human friendly syntax and consistency
with turtle above is an acceptable justification for declining
this suggestion?


>   If you don't accept that, my next proposal would be to come
> up with some other term for these gadgets (though I think that the use
> of similar notation for RDF blank nodes and SPARQL blank nodes will
> cause confusion even if you change the term).  My last recourse position
> would be that the entire document should
> be scanned to replace every occurrence of "blank node" with either
> "RDF blank node" or "SPARQL blank node". 

As we have had other comments asking for _more_ consistency in
terminology for blank nodes, I am disinclined to accept that suggestion;
why burden readers with two concepts when one is all that is
needed to correctly specify the design?

> As an example of the possible confusion caused by SPARQL blank nodes,
> consider the arbitrary ordering in 10.1.3 "ORDER BY",
> which sorts blank nodes second, after unassigned
> variables and before IRIs.  The term "blank node" is ambiguous,
> meaning either an RDF blank node or a SPARQL blank node.  In this context
> you must mean an RDF blank node, since a SPARQL blank node is a
> piece of syntax and not a value. 

It seems you have answered your own claim about ambiguity in that case.

If I understand you correctly, the only design change you suggest
is to remove the _:a and [] syntax, which you observe, correctly,
is only syntactic sugar. Perhaps I misunderstand you,
but I find it difficult to see that as a "major technical" comment.
Would that change actually make SPARQL meet some use case that
it does not currently meet? Or would it make SPARQL very much
easier to implement?

I hope you find this response satisfactory. Please let us know whether
you do.


> This is not a criticism of blank nodes in CONSTRUCT templates, where
> there actually is connection between the SPARQL blank nodes and
> RDF blank nodes.  Blank nodes in the CONSTRUCT template should be
> retained.
> 
> Fred Zemke
-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E
Received on Thursday, 12 January 2006 22:42:22 UTC