Blank Nodes Re: Toward easier RDF: a proposal from Tim Berners-Lee on 2018-11-22 (semantic-web@w3.org from November 2018)

From: Tim Berners-Lee <timbl@w3.org>
Date: Thu, 22 Nov 2018 12:02:20 +0000
To: David Booth <david@dbooth.org>
Cc: SW-forum Web <semantic-web@w3.org>, Dan Brickley <danbri@google.com>, "Sean B. Palmer" <sean@miscoranda.com>, Olaf Hartig <olaf.hartig@liu.se>, Axel Polleres <axel@polleres.net>
Message-Id: <8341F6EE-D8FC-4D05-A152-1DA0805A796F@w3.org>

David

I agree with your resolution to make RDF easier to use for real developers, whatever they are.  But I do not despair as much as you do; I am more hopeful.
Let me pick just one of your points (with a new subject as suggested).

> On 2018-11-21, at 22:40, David Booth <david@dbooth.org> wrote:
> 
> 3. Blank nodes.  They are an important convenience for RDF
> authors,

Yes, here I agree.  The default data language for developers at the moment
is JSON, and that is full of blank nodes.  Every {} in JSON is equivalent to a blank node [] in Turtle.

Where in JSON you write

{ "name": "Fred Bloggs",
  "address": {
    "number":  123,
    "street": "Acacia Avenue" }
}

in Turtle you write

[ :name "Fred Bloggs";
  :address [
      :number  123;
      :street  "Acacia Avenue" ]
]

Which is just as simple as the JSON.  Looked at as a language to write
and to generate, Turtle is, I think, nice.
In fact, using Turtle more for documentation and examples, instead of N-Triples etc., will I think make things easier for developers.
This is just a bit of nested structure in the language, which is valuable,
understandable and no cause for alarm.
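
As an illustration of the JSON-to-Turtle correspondence described above (not part of the original message), here is a minimal Python sketch. The function name `to_turtle` is hypothetical, and the sketch assumes that every JSON key is usable as a Turtle local name under a default `:` prefix declared elsewhere; JSON arrays and null are deliberately left out.

```python
import json


def to_turtle(value, indent=0):
    """Render a parsed JSON value as Turtle text (illustrative sketch only)."""
    pad = " " * indent
    if isinstance(value, dict):
        # Each JSON object {} becomes an anonymous blank node [] in Turtle.
        if not value:
            return "[]"
        parts = [
            f"{pad}  :{key} {to_turtle(val, indent + 2)}"
            for key, val in value.items()
        ]
        return "[\n" + " ;\n".join(parts) + "\n" + pad + "]"
    if isinstance(value, bool):
        # Checked before int, because bool is a subclass of int in Python.
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        # JSON string escapes are also valid escapes in Turtle string literals.
        return json.dumps(value, ensure_ascii=False)
    raise TypeError(f"not handled in this sketch: {type(value).__name__}")


doc = json.loads(
    '{"name": "Fred Bloggs",'
    ' "address": {"number": 123, "street": "Acacia Avenue"}}'
)
# A complete Turtle statement ends with a period.
print(to_turtle(doc) + " .")
```

No blank node label is ever generated here: the nesting of the brackets carries the structure, just as the nesting of the braces does in the JSON.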

> but they cause insidious downstream complications.
> They have subtle, confusing semantics.  

I find them very simple, thanks.

> (As Nathan Rixham
> once aptly put it, a blank node is "a name that is not
> a name".)  

No, it is not a name that is not a name; it is a thing which has no URI.
A little less hysteria over blank nodes may be in order.

> Blank nodes are special second-class citizens
> in RDF.  They cannot be used as predicates,

Agreed it messes up the symmetry.  Actually in most of my code you can use a blank node as a predicate.  That said, RDF is unusual in having as much symmetry as it does.
I don’t think your average JSON programmer expects to be able to use an object as a key.  So this won’t confuse them. 

> and they are not
> stable identifiers.  

They are not stable identifiers because the
people who generate the data, like the JSON above, don’t want to have to go to the pain of thinking up or supporting an identifier.

> A blank node label cannot be used in
> a follow-up SPARQL query to refer to the same node, which
> is justifiably viewed as completely broken by RDF newbies.

If the data is serialized as Turtle, typically the blank nodes all
appear as [ ] square brackets, so there is no blank node identifier
which would cause a newbie to think they could query it.

> Blank nodes also cause duplicate triples (non-lean) when the
> same data is loaded more than once, which can easily happen
> when data is merged from different sources.  

Just as if you were using an SQL database or a graph database, in general
when you load data, it is wise to query whether this is something we already know, and if not, don’t add it again.

In most systems, if you load the same data more than once,
you get duplications.  RDF with no blank nodes is fairly unique in that duplicate triples are automatically removed, so long as everyone has used the same URIs for the same things.
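
The difference between the two cases can be sketched in a few lines of Python (an illustration, not part of the original message). A graph is modelled as a plain set of triples; the helper names `load` and `known` and the `ex:` terms are hypothetical, and any string starting with `_:` is treated as a document-local blank node label, which is a simplification.

```python
from itertools import count

_fresh = count()  # source of store-wide unique blank node labels


def load(graph, triples):
    """Add one document's triples to graph (a set of 3-tuples).

    Blank node labels are local to a document, so each call maps them
    to fresh blank nodes before adding the triples.
    """
    renamed = {}

    def term(t):
        if isinstance(t, str) and t.startswith("_:"):
            if t not in renamed:
                renamed[t] = f"_:b{next(_fresh)}"
            return renamed[t]
        return t

    for s, p, o in triples:
        graph.add((term(s), term(p), term(o)))


def known(graph, p, o):
    """Ask first: does any node already have this property and value?"""
    return any(tp == p and to == o for _, tp, to in graph)


with_uris = [("ex:fred", "ex:name", "Fred Bloggs")]
with_bnode = [("_:x", "ex:name", "Fred Bloggs")]

g = set()
load(g, with_uris)
load(g, with_uris)
print(len(g))  # 1: the set removes the identical triple

h = set()
load(h, with_bnode)
load(h, with_bnode)
print(len(h))  # 2: each load introduced a new blank node

k = set()
for document in (with_bnode, with_bnode):
    # Query before loading, as one would with any other database.
    if not known(k, "ex:name", "Fred Bloggs"):
        load(k, document)
print(len(k))  # 1
```

The check in `known` is deliberately crude; what counts as "something we already know" depends on the application.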

> And they cause difficulties with canonicalization, described next.

Canonicalization works for me with real data, thanks.
But that is another topic, not this one.

But the take-away from your note about blank nodes: use more Turtle, and think about it as the Turtle language more than the underlying triples.

timbl

Received on Thursday, 22 November 2018 12:02:27 UTC