- From: David Booth <david@dbooth.org>
- Date: Thu, 14 May 2015 15:53:12 -0400
- To: public-json-ld@w3.org
- CC: Gregg Kellogg <gregg@greggkellogg.net>, Aidan Hogan <ahogan@dcc.uchile.cl>
FYI. I did not realize that work was active on JSON-LD canonicalization, but I see the draft document was updated April 8 2015. I'm glad to see it is! It would be nice if the JSON-LD canonicalization algorithm were aligned with an N-Triples / N-Quads canonicalization algorithm, and if both could ensure blank node stability in the face of small graph changes. would this be feasible? David Booth -------- Forwarded Message -------- Subject: Re: RDF canonicalization [was Re: deterministic naming of blank nodes] Date: Thu, 14 May 2015 15:44:04 -0400 From: David Booth <david@dbooth.org> To: Gregg Kellogg <gregg@greggkellogg.net> CC: Arto Bendiken <arto@bendiken.net>, W3C Semantic Web IG <semantic-web@w3.org> On 05/14/2015 02:50 PM, Gregg Kellogg wrote: >> On May 14, 2015, at 10:27 AM, David Booth <david@dbooth.org> >> wrote: >> >> On 05/14/2015 01:03 PM, Arto Bendiken wrote: >>> On Thu, May 14, 2015 at 6:20 AM, David Booth <david@dbooth.org> >>> wrote: >>>> On 05/13/2015 02:14 AM, ahogan@dcc.uchile.cl wrote: >>>>> >>>>> Aidan Hogan. "Skolemising Blank Nodes while Preserving >>>>> Isomorphism". In WWW, Florence, Italy, May 18–22, 2015 (to >>>>> appear). Available from: >>>>> http://aidanhogan.com/docs/skolems_blank_nodes_www.pdf >>>>> >>>>> . . . the paper presents an algorithm for deterministically >>>>> labelling blank nodes of any RDF graph such that the labels >>>>> of two isomorphic RDF graphs will "correspond". >>>> >>>> >>>> Excellent! It would be really good if we could define a W3C >>>> standard N-Triples and N-Quads canonicalization, for a variety >>>> of purposes. >>>> >>>> But I have a question about your algorithm. For greatest >>>> utility, a canonicalization algorithm should be robust against >>>> small changes in a graph. In other words, if graphs G and H >>>> are very similar then their canonicalizations should also be >>>> very similar. (By "very similar" I mean that G and H have >>>> large subgraphs G' and H' that are isomorphic.) >>>> >>>> How robust is your algorithm to small changes in a graph? Do >>>> you have any data on how much the canonicalization changes when >>>> a graph is changed? >>> >>> Still reading the paper, but given the use of cryptographic >>> hashing (p. 9), any minor change will result in a cascade effect >>> that will produce a wholly different output. >>> >>> Further, this is precisely what one wants if the >>> canonicalization algorithm is to be used for cryptographic >>> applications such as graph signing, as per e.g. the Credentials >>> CG. Similarity would be a different problem that entails >>> conflicting requirements. >> >> For the graph as a whole, yes, one would certainly want the hash of >> the graph to change if any part of the graph changes. But I was >> talking about the algorithm's re-labeling of the blank nodes within >> the graph. For graph comparison purposes, it would be very >> beneficial if the algorithm would tend to generate the same blank >> node labels even when a graph changes slightly. Are you saying >> that there is also value, for cryptographic hashing purposes, in >> the algorithm *changing* blank node labels if any part of the graph >> changes? If so, why? Can you explain? > > If the canonicalized BNode IDs are based on a hash of the dependent > statements, then the same ID should be generated for a graph having > more statements that do not relate to those BNodes. I would say that > the BNode IDs generated from the union of two disjoint graphs should > be the same as the BNode IDs generated from the union of those > graphs; this would make it useful for doing an RDF Diff. > > Note that the proposed RDF Normalization spec [1] does not generate > BNode IDs this way, and so would not have this desired purpose, > although I believe the algorithm could be changed to accommodate > this. > > The RDF Isomorphism algorithm used in the Ruby rdf-isomorphic gem, > based on Jeremy Carrol’s [2] generates BNode identifiers by creating > a hash of dependent nodes, and should have this property, for > disjoint sub-graphs, anyway. I think blank node stability is a critical property for a canonicalization algorithm to have (to the extent feasible), to enable graph diffs to be easily computed using standard diff tools. It would be a huge lost opportunity if we standardized a canonicalization algorithm without this property. I'll forward this to the public-json-ld@w3.org list. Thanks! David Booth > > Gregg > > [1] http://json-ld.org/spec/latest/rdf-dataset-normalization/ [2] > http://www.hpl.hp.com/techreports/2001/HPL-2001-293.pdf > >> David Booth >> > > > > >
Received on Thursday, 14 May 2015 19:53:40 UTC