- From: Alan Ruttenberg <alanruttenberg@gmail.com>
- Date: Wed, 8 Oct 2014 10:23:25 +0200
- To: Sandro Hawke <sandro@w3.org>
- Cc: Tim Berners-Lee <timbl@w3.org>, David Booth <david@dbooth.org>, SW-forum Web <semantic-web@w3.org>
- Message-ID: <CAFKQJ8=AdXo_sq0kscMbh834WC6ntesX5_6W-dTfvNKKj3Xhng@mail.gmail.com>
Having a unique path isn't enough. s p []. s p []. both paths are (s p). Non-lean, but still or s p [p [p a]; p [p b]] one (s,p) and two (s,p,p) Shows you need to pay attention to the p,o when the s is [], too. I think this is the idea of witness in the Hogan, Arenas, Mallea and Polleres paper. On Tue, Oct 7, 2014 at 7:38 PM, Sandro Hawke <sandro@w3.org> wrote: > On 09/26/2014 06:20 PM, Tim Berners-Lee wrote: > > > On 2014-09 -24, at 00:37, Sandro Hawke <sandro@w3.org> wrote: > > On 09/23/2014 05:34 PM, David Booth wrote: > > BTW, I want to draw attention to the fact that the need for defining an > RDF-specific PATCH operation is *entirely* a consequence of RDF's allowance > of unrestricted blank nodes. I do not think that blank nodes should be > eliminated from RDF, but I am convinced that RDF's current treatment of > blank nodes is a significant design flaw that has *many* downstream effects > that are ultimately detrimental to RDF's adoption. The need for RDF PATCH > is another example. > > Unix/linux diff and patch utilities have been used successfully for > *decades*, with many other information representations. Imagine how simple > and easy it would be if we could just generate canonical N-Triples and use > standard diff and patch against that! But we can't, because blank nodes > are unstable across RDF serializations and no canonical way to generate > them has been standardized. This, in turn is because generating a > canonical form of unrestricted RDF is a hard problem (NP-complete), because > of blank nodes. The problem is *much* easier if the use of blank nodes is > limited to *implicit* blank nodes -- those that are generated implicitly by > the use of square brackets "[]" or parentheses "()" for lists in Turtle -- > and indeed this is the vast majority of blank node use. (See "Everything > You Always Wanted to Know About Blank Nodes", by Hogan, Arenas, Mallea and > Polleres: > http://www.websemanticsjournal.org/index.php/ps/article/viewFile/365/387 > ) > > For this reason the use of "Well Behaved RDF" was proposed, which limits > the use of blank nodes to implicit blank nodes: > http://dbooth.org/2013/well-behaved-rdf/Booth-well-behaved-rdf.pdf > I don't know if Well Behaved RDF is the best solution to this problem. > Maybe someone will come along with a better idea. But I am convinced that > the current treatment of blank nodes in RDF is a serious problem that we > should fix in order to make RDF simpler to use, understand and adopt. > > I really don't like having to make excuses for RDF when it cannot be used > in a similar way as nearly every other information representation -- such > as being able to easily compare two RDF documents for "equality" (which in > RDF becomes a complex graph isomorphism problem) or generate a simple diff > and patch -- all because of RDF's unrestricted treatment of blank nodes. > > Clearly this is not something that the Linked Data Platform working group > can fix. But I think it is important to bring it to people's attention, in > the hope that we will someday soon have the creativity and gumption to fix > it. > > I should also acknowledge that there are some who do not feel that RDF's > treatment of blank nodes is a problem. Fine. It may not be a problem to > an elite few who are well steeped in the subtleties of description logic, > model theory and RDF Semantics, and who don't mind having to use > RDF-specific tools instead of generic tools. But having tried for over 10 > years to explain RDF to a wider audience of regular software developers, I > am convinced that subtleties like RDF's treatment of blank nodes *are* a > problem to a much wider audience of *potential* RDF users who would be more > inclined to adopt RDF if it didn't have complexities like this. As it is > they are more likely to stick with JSON or XML, whose complexities they > already know, rather than venturing into the obscure and esoteric world of > RDF. > > RDF tools are not as mature as those for XML or even JSON, which is much > younger than RDF. I believe blank nodes are one specific reason they're > not. The fact that we still don't even have a simple, standard way to > compare RDF documents and compute diffs and patches, is a perfect example. > > David > > > > I agree that it makes sense to have good terminology for graphs that can > be serialized in Turtle without blank node labels, and perhaps to focus > diff on these nice graphs. (When I clicked on your Well-Behaved RDF > link, my PDF viewer remembered I was on page six. :-) ) > > How would you name the blank nodes when serializing this in n-triples or > n-quads? I guess given triple S P O, where O is a blank node, you'd name > O based on the hash of the names of S and P? Something like > > o.asNTriplesTerm = "_:a"+sha1hex(s.asNTriplesTerm+" "+p.asNTriplesTerm)? > > > Or maybe leave off the NTriples baggage, the "<"...">" and "_:". Maybe: > > o.id = sha1hex(s.id+" "+p.id) > > You'd have to evaluate these in the right order, but that would be the > order you parsed them from Turtle, so it should be fast and > deterministic. Hm. Can we avoid the repeated hashing? I suppose so, > with some kind of path expression: > > Given: <s1> <p1> [ <p2> ( 1 2 3 [ <p3> ( 11 12 [ <p4> 444 ] ) ] ) ] > > The subject of <p4> 444 would have its id determined by hashing (s1, p1, > p2, 4, p3, 3). > > (I'm treating integer list positions as if they were properties.) With > a little cleverness, caching those intermediate hashes, I think that could > run almost as fast as computing a hash of the turtle file (ie very fast). > > Wow, this might work. Very good idea, David. Beyond the fact that it > only works on this kind of graph, does anyone see any problem with it? > > > Hmmm , suppose a blank node can be identified by more than one path, > then the hashed path would not be unique and so you couldn't use them to > compare blank nodes > > > My quick analysis is that every blank node will be identified by exactly > one forward path, given David Booth's proposed constraint that graphs can > be represented in Turtle without blank node labels. > > Thinking this through: > > - Blank nodes only get introduced by [ ] and ( ). > > - I think it's clear ( ) is syntactic sugar for [ ] with > rdf:first/rest/nil. ( 1 2 ) == [ rdf:first 1; rdf:rest [ rdf:first 2; > rdf:rest rdf:nil] ] etc. > > - With [ ] only allowed to appear in the object position, it's either: > <s> <p> [ ... ] > > in which case the path is (s,p) > > - Or it's inside a [ ], like: > ... [ ...; <p2> [ ... ]; ... ] ... > > in which case the path is the path to the outer [ ], with p2 appended. > > So, that makes me think there's exactly one path to each blank node in > this constrained situation. > > -- Sandro > > > -- Sandro > > > >
Received on Wednesday, 8 October 2014 08:24:24 UTC