- From: Dave Longley <dlongley@digitalbazaar.com>
- Date: Tue, 26 May 2015 10:41:53 -0400
- To: David Booth <david@dbooth.org>, public-credentials@w3.org
- CC: Aidan Hogan <ahogan@dcc.uchile.cl>
On 05/25/2015 10:24 PM, David Booth wrote: > I am very glad to hear that the W3C Credentials Community Group has > taken on the work of specifying RDF graph canonicalization! It would > be really great if a standard canonicalization algorithm could > minimize blank node relabeling when a graph is changed slightly, to > the extent feasible, since this would make diffs more precise. Any > thoughts on the points below? How feasible does this sound? It's feasible but would come with some potential surprises in a number of edge cases. With the current algorithm, any change to a statement involving a blank node would change the identifier for that blank node and any other blank nodes that are connected to it through other blank nodes. In addition, any sub-isomorphisms would also be affected. That being said, we've added a note to the spec to consider using the hashes we generate based on information about blank nodes as their identifiers instead of relabeling them with a `c14n` prefix. This would add a considerable amount of stability to them and therefore better support diffing. Another option that could be considered would be to provide a hook for an optional custom callback that would provide "additional data" to be included during the blank node hashing process in order to produce a more unique and stable result. This could be part of a normalization API spec that uses the core algorithm with modifications for such hooks. > > David Booth > > -------- Forwarded Message -------- > Subject: Re: RDF canonicalization [was Re: deterministic naming of > blank nodes] > Date: Wed, 20 May 2015 22:49:36 -0400 > From: David Booth <david@dbooth.org> > To: ahogan@dcc.uchile.cl, semantic-web@w3.org > > Hi Aidan, > > On 05/17/2015 03:54 PM, ahogan@dcc.uchile.cl wrote: >> On 14/05/2015 01:20, David Booth wrote: >>> On 05/13/2015 02:14 AM, ahogan@dcc.uchile.cl wrote: >>>> Aidan Hogan. "Skolemising Blank Nodes while Preserving >>>> Isomorphism". In >>>> WWW, Florence, Italy, May 18–22, 2015 (to appear). >>>> Available from: http://aidanhogan.com/docs/skolems_blank_nodes_www.pdf > [ . . . ] >>> How robust is your algorithm to small changes in a graph? Do you have >>> any data on how much the canonicalization changes when a graph is >>> changed? >> >> [ . . . ] one has to be careful when talking about creating similar >> canonicalisations for similar graphs. Consider for example producing an >> overall hash for two large graphs G and H that differ only in the >> value of >> one IRI in one triple. > > I see your point about creating similar hashes for similar graphs, but > my question was intended to be about the *canonicalizations* of similar > graphs -- not the hashes of those graphs. I don't mind if the hashes of > G and H are very different, but it would be nice if their > canonicalizations could be as similar as possible as often as possible, > so that standard diff and patch tools could be used on them. Could your > algorithm be adapted that way? > > I assume that a canonicalization algorithm will be limited in the degree > to which it can achieve this goal, and it may be desirable for the > algorithm to have knowledge of OWL semantics in order to get the best > results. For example, if the graph implies that _:b1 owl:sameAs _:b2, > then it might replace all occurrences of _:b2 with _:b1 before > canonicalizing. Or, if a graph contains: > > :alice foaf:knows _:b3 . > _:b3 :hasSSN "123-45-6789" . > :bob foaf:knows _:b4 . > _:b4 :hasSSN "123-45-6789" . > > :hasSSN a owl:InverseFunctionalProperty . > > then _:b3 and _:b4 can be merged into a single blank node prior to blank > node canonicalization. > > Furthermore, if a blank node is known to have a key that uniquely > identifies it, then it would be best if the canonical name of that blank > node would depend only on those key properties (recursively). For > example, given: > > :carol :owns _:b5 . > _:b5 :make "Acme" ; > :serialNumber "1234" ; > :color "red" . > > If :make and :serialNumber form a composite key, then the canonical name > of _:b5 should depend only on those properties -- not on :color. That > would enable _:b6 in the following to be canonicalized to the same blank > node label, even though it specifies a :yearPurchased and does not > specify a :color : > > :carol :owns _:b6 . > _:b6 :model "zinger" ; > :serialNumber "1234" ; > :yearPurchased "2013" . > > OWL2 supports keys via owl:hasKey. > http://www.w3.org/TR/2009/WD-owl2-new-features-20090611/#F9:_Keys > > Could your algorithm be extended to take advantage of OWL assertions? > For the best canonicalization results, this might mean making use of the > closure of graph G's imports to aid in canonicalizing graph G. To do > this reliably it might be necessary for G to directly contain a crypto > hash of the closure of its imports, so that any change to the imported > files would be detectable. > > David Booth > > > -- Dave Longley CTO Digital Bazaar, Inc. http://digitalbazaar.com
Received on Tuesday, 26 May 2015 14:42:17 UTC