Re: Blank Nodes Re: Toward easier RDF: a proposal from Hugh Glaser on 2018-11-23 (semantic-web@w3.org from November 2018)

From: Hugh Glaser <hugh@glasers.org>
Date: Fri, 23 Nov 2018 12:59:04 +0000
To: Wouter Beek <wouter@triply.cc>
Cc: tpassin@tompassin.net, SW-forum Web <semantic-web@w3.org>
Message-Id: <FCFD4B5D-66F5-4B25-B3BC-90E71780280A@glasers.org>
I think the mention of Assembly Language has helped me see the issue of Blank Nodes much more clearly.

My difficulty is that I have never, ever, felt the need to specify a Blank Node, despite having created shedloads of RDF from every conceivable type of source.
How could that be?
Well, I never write RDF.
I always use tools, configure tools, or write code to create the RDF.
What I see here is that people want Blank Nodes because they make it easier to write RDF by hand, which has nothing to do with the production or consumption tools.

So to take the Assembly Language analogy further:
We have:
Machine Code - N-Triples;
Assembly Language - Turtle, N3, RDF/XML;
High Level Language - Err.... EasyRDF, Sponger config...?

An important question in all this (in this context) is: what is the Programmer's Model?
They may well think that their Pascal array is in contiguous memory starting at some fixed address.
And that is a good way to program in Pascal.
But we know better - there may be all sorts of stuff in between, and there may even be hash tables etc going on.
And that is fine, and as it should be.
Perhaps most pertinent to the Blank Node discussion is that the Compiler will make the code relocatable.
I sort of see the requirement for Blank Nodes as close to a programmer saying that they want to be able to write *non-relocatable* code, or maybe manage register allocation themselves.
Many assemblers and all compilers will deal with all that sort of stuff, allowing symbols etc. for data.

So my tools are akin to Assemblers or Compilers for primitive languages.
And in writing my tools (the equivalent of assembling or compiling), I find it is actually easier to generate URIs than to try to manage Blank Nodes.
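
As a rough illustration of that point, here is a minimal Python sketch of a tool minting a URI where hand-written Turtle would probably use a blank node. The namespaces under http://example.org/, the vocabulary terms, and the helper names mint_uri, escape_literal and address_triples are all assumptions made up for this example; they do not come from any of the tools mentioned in this thread.

```python
import hashlib

# Hypothetical namespaces, chosen only for this sketch.
BASE = "http://example.org/id/"
VOCAB = "http://example.org/vocab/"


def mint_uri(kind, *key_parts):
    """Derive a stable URI from a record type and its identifying values."""
    # Join with a separator unlikely to occur in the data, then hash.
    key = "\x1f".join(key_parts).encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()[:16]
    return f"{BASE}{kind}/{digest}"


def escape_literal(text):
    """Escape the characters that would break a quoted N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def address_triples(person_uri, street, city):
    """Emit N-Triples lines for an address, naming the address node by URI."""
    addr = mint_uri("address", person_uri, street, city)
    return [
        f"<{person_uri}> <{VOCAB}address> <{addr}> .",
        f'<{addr}> <{VOCAB}street> "{escape_literal(street)}" .',
        f'<{addr}> <{VOCAB}city> "{escape_literal(city)}" .',
    ]


if __name__ == "__main__":
    person = BASE + "person/1"  # hypothetical subject URI
    for line in address_triples(person, "1 High Street", "Southampton"):
        print(line)
```

Because the URI is derived from the identifying values, running the tool again over the same input yields the same triples, so there are no blank node labels for the generating code to keep track of.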

So, if we achieve an easier RDF stack, then Blank Nodes become a complete non-issue, just like registers are in High-Level languages.

By the way, I often start by telling newbies that a great thing about RDF, and one that distinguishes it from most DBs, is that you get an ID for *everything*.
So I have to completely ignore Blank Nodes to maintain the fiction.


> On 23 Nov 2018, at 08:22, Wouter Beek <wouter@triply.cc> wrote:
> 
> Hi,
> 
> Blank nodes add significant complexity to the Semantic Web ecosystem.
> Some concrete examples:
> 
>  - Whenever two or more RDF sources are combined, blank nodes must be
> standardized apart.  Since Linked Data is all about combining data
> from different sources, this means that most Linked Data operations
> become more complex when blank nodes are taken into account.  It does
> not help that standardizing apart is computationally expensive.
> 
>  - Blank nodes in SPARQL result sets have document scope, and triple
> stores enforce result set limits (e.g., 10K rows), which means that
> longer result sets cannot be guaranteed to be correct.  E.g., it is
> unclear whether or not the subject term in row 10,000 is identical to
> the subject term in row 10,001:
> 
>    10,000 _:x a foaf:Person.
>    --------------------
>    10,001 _:x foaf:name "John"
> 
>  - Blank nodes make it more difficult to determine whether two RDF
> documents are the same, which complicates versioning and caching.
> 
>  - Blank nodes make Semantic Web standards -- and implementations
> thereof -- more complex.  Parts of standards that are about blank
> nodes are often the hardest parts to understand.  E.g., in RDF 1.1
> Semantics the distinction between graph merge and graph union, or the
> operation of graph leaning, would not exist if there were no blank
> nodes.  In SPARQL 1.1 the RDF instance mapping would not be needed.
> Etc.
> 
> Would it not be possible to keep the benefits of abbreviated N3
> notation while at the same time doing away with blank nodes?  E.g., by
> automatically introducing well-known IRIs instead.
> 
> ---
> Best regards,
> Wouter Beek.
> 
> Email: w.g.j.beek@vu.nl
> WWW: https://wouterbeek.org
> Tel: +31647674624
> 
> On Fri, Nov 23, 2018 at 4:22 AM Thomas Passin <tpassin@tompassin.net> wrote:
>> 
>> On 11/22/2018 6:49 PM, David Booth wrote:
>>> Uh . . . I don't think that is quite correct.  As I understand, a blank
>>> node does *not* represent *a* thing.  Rather, it asserts that there
>>> *exists* a thing, as explained in the RDF Semantics:
>>> https://www.w3.org/TR/rdf11-mt/#blank-nodes
>>> In contrast, an IRI represents *a* thing.  I'm sorry to be pedantic
>>> here, but I mention it because it underscores my point: the semantics of
>>> blank nodes really *are* subtle -- at least to *average* developers.
>> 
>> Again, blank nodes are exactly analogous to a table with no primary key.
>>  You can identify the thing by the union of its properties ... until
>> there is another thing with the same set of properties.  Then you would
>> need to have another property to distinguish the two, which property you
>> might or might not know.  You can't have a foreign key, but you can
>> still have a WHERE statement that specifies all the properties that
>> could distinguish the data object.
>> 
>> And just as with relational databases, the no-primary-key model can only
>> get you so far.  But it can be an easy way to get a data set going...
>> 
>> 
> 

-- 
Hugh
023 8061 5652
Received on Friday, 23 November 2018 12:59:50 UTC