RDF PATCH and Downstream consequences of blank nodes [was Re: SPARQL Profile for PATCH] from David Booth on 2014-09-23 (semantic-web@w3.org from September 2014)

From: David Booth <david@dbooth.org>
Date: Tue, 23 Sep 2014 17:34:28 -0400
CC: semantic-web <semantic-web@w3.org>
Message-ID: <5421E764.70302@dbooth.org>
BTW, I want to draw attention to the fact that the need for defining an 
RDF-specific PATCH operation is *entirely* a consequence of RDF's 
allowance of unrestricted blank nodes.  I do not think that blank nodes 
should be eliminated from RDF, but I am convinced that RDF's current 
treatment of blank nodes is a significant design flaw that has *many* 
downstream effects that are ultimately detrimental to RDF's adoption. 
The need for RDF PATCH is another example.

Unix/linux diff and patch utilities have been used successfully for 
*decades*, with many other information representations.  Imagine how 
simple and easy it would be if we could just generate canonical 
N-Triples and use standard diff and patch against that!  But we can't, 
because blank nodes are unstable across RDF serializations and no 
canonical way to generate them has been standardized.  This, in turn, is 
because blank nodes make generating a canonical form of unrestricted RDF 
a hard problem (at least as hard as graph isomorphism, for which no 
polynomial-time algorithm is known).  The problem is *much* 
easier if the use of blank nodes is limited to *implicit* blank nodes -- 
those that are generated implicitly by the use of square brackets "[]" 
or parentheses "()" for lists in Turtle -- and indeed this is the vast 
majority of blank node use.  (See "Everything You Always Wanted to Know 
About Blank Nodes", by Hogan, Arenas, Mallea and Polleres:
http://www.websemanticsjournal.org/index.php/ps/article/viewFile/365/387 )
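A minimal Python sketch of what goes wrong: it treats N-Triples purely as lines of text, the way a generic diff tool would, and does no real RDF parsing. The function name and the example.org IRIs are hypothetical. Relabeling a single blank node produces a spurious two-line difference even though the graph is unchanged, while a real edit to a triple with no blank nodes shows up cleanly.

```python
# Sketch: "diff" N-Triples documents as plain text lines (set semantics).
# Assumes one triple per line, as N-Triples requires; blank lines are ignored.

def line_diff(old_text: str, new_text: str) -> tuple[list[str], list[str]]:
    """Return (removed, added) lines, each sorted."""
    old = {ln.strip() for ln in old_text.splitlines() if ln.strip()}
    new = {ln.strip() for ln in new_text.splitlines() if ln.strip()}
    return sorted(old - new), sorted(new - old)


# Two serializations of the *same* graph; only the blank node label differs.
DOC_A = """
<http://example.org/alice> <http://example.org/address> _:b0 .
_:b0 <http://example.org/city> "Boston" .
"""
DOC_B = """
<http://example.org/alice> <http://example.org/address> _:genid42 .
_:genid42 <http://example.org/city> "Boston" .
"""

# A genuine one-triple change, with no blank nodes involved.
DOC_C = """
<http://example.org/alice> <http://example.org/name> "Alice" .
"""
DOC_D = """
<http://example.org/alice> <http://example.org/name> "Alicia" .
"""

if __name__ == "__main__":
    print(line_diff(DOC_A, DOC_B))  # 2 removed, 2 added: a spurious diff
    print(line_diff(DOC_C, DOC_D))  # 1 removed, 1 added: the real change
```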

For this reason, "Well Behaved RDF" was proposed; it limits the use of 
blank nodes to implicit blank nodes:
http://dbooth.org/2013/well-behaved-rdf/Booth-well-behaved-rdf.pdf
I don't know if Well Behaved RDF is the best solution to this problem. 
Maybe someone will come along with a better idea.  But I am convinced 
that the current treatment of blank nodes in RDF is a serious problem 
that we should fix in order to make RDF simpler to use, understand and 
adopt.
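A rough sketch of why restricting blank nodes helps: if every blank node is the object of at most one triple and no cycle passes through blank nodes (roughly the tree shape that Turtle's "[]" and "()" produce), each blank node can be given a deterministic label by hashing its incoming edge together with a recursive hash of its outgoing triples. This is a hypothetical Python illustration, not the procedure from the Well Behaved RDF paper; the function names, the "_:c" label prefix and the example.org IRIs are assumptions. It also assumes no two sibling blank nodes have identical descriptions, since those would receive the same label and be merged.

```python
import hashlib

Triple = tuple[str, str, str]  # (subject, predicate, object) in N-Triples syntax


def is_bnode(term: str) -> bool:
    return term.startswith("_:")


def canonicalize(triples: set[Triple]) -> set[Triple]:
    """Relabel tree-shaped blank nodes deterministically (hypothetical sketch).

    Assumes each blank node is the object of at most one triple and that no
    cycle passes through blank nodes. Sibling blank nodes with identical
    descriptions would get the same label and be merged.
    """
    outgoing: dict[str, list[tuple[str, str]]] = {}
    incoming: dict[str, tuple[str, str]] = {}
    for s, p, o in triples:
        if is_bnode(s):
            outgoing.setdefault(s, []).append((p, o))
        if is_bnode(o):
            if o in incoming:
                raise ValueError(f"{o} has more than one incoming triple")
            incoming[o] = (s, p)
            outgoing.setdefault(o, [])

    content: dict[str, str] = {}
    in_progress: set[str] = set()

    def content_sig(b: str) -> str:
        # Hash of b's outgoing triples, with nested blank nodes replaced by
        # their own content hashes (computed bottom-up).
        if b in content:
            return content[b]
        if b in in_progress:
            raise ValueError("cycle through blank nodes")
        in_progress.add(b)
        parts = sorted(
            f"{p} {content_sig(o) if is_bnode(o) else o}" for p, o in outgoing[b]
        )
        in_progress.discard(b)
        content[b] = hashlib.sha256("\n".join(parts).encode()).hexdigest()
        return content[b]

    labels: dict[str, str] = {}

    def label(b: str) -> str:
        # Combine the content hash with the (canonical) parent and predicate,
        # so identical subtrees under different parents stay distinct.
        if b not in labels:
            sig = content_sig(b)  # computed first so cycles are detected
            if b in incoming:
                s, p = incoming[b]
                parent = label(s) if is_bnode(s) else s
            else:
                parent, p = "", ""  # root blank node with no incoming triple
            digest = hashlib.sha256(f"{parent}|{p}|{sig}".encode()).hexdigest()
            labels[b] = "_:c" + digest[:16]
        return labels[b]

    def relabel(t: str) -> str:
        return label(t) if is_bnode(t) else t

    return {(relabel(s), p, relabel(o)) for s, p, o in triples}


def to_ntriples(triples: set[Triple]) -> list[str]:
    """Sorted N-Triples lines, ready for ordinary line-based diff."""
    return sorted(f"{s} {p} {o} ." for s, p, o in triples)


EX = "http://example.org/"  # hypothetical namespace
A = {(f"<{EX}alice>", f"<{EX}address>", "_:b0"),
     ("_:b0", f"<{EX}city>", '"Boston"')}
B = {(f"<{EX}alice>", f"<{EX}address>", "_:genid42"),
     ("_:genid42", f"<{EX}city>", '"Boston"')}

if __name__ == "__main__":
    print(to_ntriples(canonicalize(A)) == to_ntriples(canonicalize(B)))  # True
```

Once labels are canonical, two serializations of the same tree-shaped graph yield identical sorted N-Triples, so plain diff and patch apply again. Unrestricted blank nodes (shared or cyclic) break the single-parent assumption this shortcut relies on.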

I really don't like having to make excuses for RDF when it cannot be 
used in the same way as nearly every other information representation 
-- such as being able to easily compare two RDF documents for "equality" 
(which in RDF becomes a complex graph isomorphism problem) or generate a 
simple diff and patch -- all because of RDF's unrestricted treatment of 
blank nodes.
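Comparing two RDF graphs for "equality" means finding a bijection between their blank nodes that maps one set of triples exactly onto the other, with IRIs and literals left unchanged. The brute-force Python sketch below (a hypothetical helper, not a library API; the example.org IRIs are made up) tries every such bijection, which can take n! attempts for n blank nodes. Practical implementations prune the search heavily, but the underlying problem remains graph isomorphism.

```python
from itertools import permutations

Triple = tuple[str, str, str]  # (subject, predicate, object) in N-Triples syntax


def blank_nodes(graph: set[Triple]) -> list[str]:
    return sorted({t for triple in graph for t in triple if t.startswith("_:")})


def isomorphic(g1: set[Triple], g2: set[Triple]) -> bool:
    """Brute-force check: does some bijection of blank nodes map g1 onto g2?

    IRIs and literals must match exactly; only blank nodes may be renamed.
    Tries up to n! mappings for n blank nodes, so it is only usable for
    tiny graphs.
    """
    b1, b2 = blank_nodes(g1), blank_nodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for image in permutations(b2):
        mapping = dict(zip(b1, image))
        renamed = {tuple(mapping.get(t, t) for t in triple) for triple in g1}
        if renamed == g2:
            return True
    return False


KNOWS = "<http://example.org/knows>"  # hypothetical IRIs
ALICE = "<http://example.org/alice>"

G1 = {(ALICE, KNOWS, "_:b1"), ("_:b1", KNOWS, "_:b2")}  # alice -> b1 -> b2
G2 = {(ALICE, KNOWS, "_:x"), ("_:x", KNOWS, "_:y")}     # same shape, new labels
G3 = {(ALICE, KNOWS, "_:x"), ("_:y", KNOWS, "_:x")}     # alice -> x <- y

if __name__ == "__main__":
    print(isomorphic(G1, G2))  # True
    print(isomorphic(G1, G3))  # False
```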

Clearly this is not something that the Linked Data Platform working 
group can fix.  But I think it is important to bring it to people's 
attention, in the hope that we will someday soon have the creativity and 
gumption to fix it.

I should also acknowledge that there are some who do not feel that RDF's 
treatment of blank nodes is a problem.  Fine.  It may not be a problem 
to an elite few who are well steeped in the subtleties of description 
logic, model theory and RDF Semantics, and who don't mind having to use 
RDF-specific tools instead of generic tools.  But having tried for over 
10 years to explain RDF to a wider audience of regular software 
developers, I am convinced that subtleties like RDF's treatment of blank 
nodes *are* a problem to a much wider audience of *potential* RDF users 
who would be more inclined to adopt RDF if it didn't have complexities 
like this.  As it is, they are more likely to stick with JSON or XML, 
whose complexities they already know, rather than venturing into the 
obscure and esoteric world of RDF.

RDF tools are not as mature as those for XML or even JSON, which is much 
younger than RDF.  I believe blank nodes are one specific reason they're 
not.  The fact that we still don't even have a simple, standard way to 
compare RDF documents and compute diffs and patches is a perfect example.

David
Received on Tuesday, 23 September 2014 21:34:56 UTC