- From: Gregg Kellogg <gregg@greggkellogg.com>
- Date: Thu, 14 May 2015 13:31:10 -0700
- To: David Booth <david@dbooth.org>
- Cc: "public-json-ld@w3.org" <public-json-ld@w3.org>, Gregg Kellogg <gregg@greggkellogg.net>, Aidan Hogan <ahogan@dcc.uchile.cl>
On May 14, 2015, at 12:53 PM, David Booth <david@dbooth.org> wrote: > > FYI. I did not realize that work was active on JSON-LD > canonicalization, but I see the draft document was updated April 8 2015. > I'm glad to see it is! Yes, it's required for RDF Signatures used for payments and credentials. > It would be nice if the JSON-LD canonicalization algorithm were aligned > with an N-Triples / N-Quads canonicalization algorithm, and if both > could ensure blank node stability in the face of small graph changes. > would this be feasible? Even though it's in the JSON-LD repo, it is an RDF Dataset algorithm, that doesn't depend on JSON-LD, except that the test inputs are serialized this way. The results are normalized N-Triples/Quads. Gregg > David Booth > > -------- Forwarded Message -------- > Subject: Re: RDF canonicalization [was Re: deterministic naming of blank > nodes] > Date: Thu, 14 May 2015 15:44:04 -0400 > From: David Booth <david@dbooth.org> > To: Gregg Kellogg <gregg@greggkellogg.net> > CC: Arto Bendiken <arto@bendiken.net>, W3C Semantic Web IG > <semantic-web@w3.org> > > On 05/14/2015 02:50 PM, Gregg Kellogg wrote: >>> On May 14, 2015, at 10:27 AM, David Booth <david@dbooth.org> >>> wrote: >>> >>>> On 05/14/2015 01:03 PM, Arto Bendiken wrote: >>>> On Thu, May 14, 2015 at 6:20 AM, David Booth <david@dbooth.org> >>>> wrote: >>>>> On 05/13/2015 02:14 AM, ahogan@dcc.uchile.cl wrote: >>>>>> >>>>>> Aidan Hogan. "Skolemising Blank Nodes while Preserving >>>>>> Isomorphism". In WWW, Florence, Italy, May 18–22, 2015 (to >>>>>> appear). Available from: >>>>>> http://aidanhogan.com/docs/skolems_blank_nodes_www.pdf >>>>>> >>>>>> . . . the paper presents an algorithm for deterministically >>>>>> labelling blank nodes of any RDF graph such that the labels >>>>>> of two isomorphic RDF graphs will "correspond". >>>>> >>>>> >>>>> Excellent! It would be really good if we could define a W3C >>>>> standard N-Triples and N-Quads canonicalization, for a variety >>>>> of purposes. >>>>> >>>>> But I have a question about your algorithm. For greatest >>>>> utility, a canonicalization algorithm should be robust against >>>>> small changes in a graph. In other words, if graphs G and H >>>>> are very similar then their canonicalizations should also be >>>>> very similar. (By "very similar" I mean that G and H have >>>>> large subgraphs G' and H' that are isomorphic.) >>>>> >>>>> How robust is your algorithm to small changes in a graph? Do >>>>> you have any data on how much the canonicalization changes when >>>>> a graph is changed? >>>> >>>> Still reading the paper, but given the use of cryptographic >>>> hashing (p. 9), any minor change will result in a cascade effect >>>> that will produce a wholly different output. >>>> >>>> Further, this is precisely what one wants if the >>>> canonicalization algorithm is to be used for cryptographic >>>> applications such as graph signing, as per e.g. the Credentials >>>> CG. Similarity would be a different problem that entails >>>> conflicting requirements. >>> >>> For the graph as a whole, yes, one would certainly want the hash of >>> the graph to change if any part of the graph changes. But I was >>> talking about the algorithm's re-labeling of the blank nodes within >>> the graph. For graph comparison purposes, it would be very >>> beneficial if the algorithm would tend to generate the same blank >>> node labels even when a graph changes slightly. Are you saying >>> that there is also value, for cryptographic hashing purposes, in >>> the algorithm *changing* blank node labels if any part of the graph >>> changes? If so, why? Can you explain? >> >> If the canonicalized BNode IDs are based on a hash of the dependent >> statements, then the same ID should be generated for a graph having >> more statements that do not relate to those BNodes. I would say that >> the BNode IDs generated from the union of two disjoint graphs should >> be the same as the BNode IDs generated from the union of those >> graphs; this would make it useful for doing an RDF Diff. >> >> Note that the proposed RDF Normalization spec [1] does not generate >> BNode IDs this way, and so would not have this desired purpose, >> although I believe the algorithm could be changed to accommodate >> this. >> >> The RDF Isomorphism algorithm used in the Ruby rdf-isomorphic gem, >> based on Jeremy Carrol’s [2] generates BNode identifiers by creating >> a hash of dependent nodes, and should have this property, for >> disjoint sub-graphs, anyway. > > I think blank node stability is a critical property for a > canonicalization algorithm to have (to the extent feasible), to enable > graph diffs to be easily computed using standard diff tools. It would > be a huge lost opportunity if we standardized a canonicalization > algorithm without this property. > > I'll forward this to the public-json-ld@w3.org list. > > Thanks! > David Booth > >> >> Gregg >> >> [1] http://json-ld.org/spec/latest/rdf-dataset-normalization/ [2] >> http://www.hpl.hp.com/techreports/2001/HPL-2001-293.pdf >> >>> David Booth > >
Received on Thursday, 14 May 2015 20:31:39 UTC