- From: David Booth <david@dbooth.org>
- Date: Mon, 25 May 2015 22:24:23 -0400
- To: public-credentials@w3.org
- CC: Aidan Hogan <ahogan@dcc.uchile.cl>
I am very glad to hear that the W3C Credentials Community Group has taken on the work of specifying RDF graph canonicalization! It would be really great if a standard canonicalization algorithm could minimize blank node relabeling when a graph is changed slightly, to the extent feasible, since this would make diffs more precise. Any thoughts on the points below? How feasible does this sound? David Booth -------- Forwarded Message -------- Subject: Re: RDF canonicalization [was Re: deterministic naming of blank nodes] Date: Wed, 20 May 2015 22:49:36 -0400 From: David Booth <david@dbooth.org> To: ahogan@dcc.uchile.cl, semantic-web@w3.org Hi Aidan, On 05/17/2015 03:54 PM, ahogan@dcc.uchile.cl wrote: > On 14/05/2015 01:20, David Booth wrote: >> On 05/13/2015 02:14 AM, ahogan@dcc.uchile.cl wrote: >>> Aidan Hogan. "Skolemising Blank Nodes while Preserving Isomorphism". In >>> WWW, Florence, Italy, May 18–22, 2015 (to appear). >>> Available from: http://aidanhogan.com/docs/skolems_blank_nodes_www.pdf [ . . . ] >> How robust is your algorithm to small changes in a graph? Do you have >> any data on how much the canonicalization changes when a graph is changed? > > [ . . . ] one has to be careful when talking about creating similar > canonicalisations for similar graphs. Consider for example producing an > overall hash for two large graphs G and H that differ only in the value of > one IRI in one triple. I see your point about creating similar hashes for similar graphs, but my question was intended to be about the *canonicalizations* of similar graphs -- not the hashes of those graphs. I don't mind if the hashes of G and H are very different, but it would be nice if their canonicalizations could be as similar as possible as often as possible, so that standard diff and patch tools could be used on them. Could your algorithm be adapted that way? I assume that a canonicalization algorithm will be limited in the degree to which it can achieve this goal, and it may be desirable for the algorithm to have knowledge of OWL semantics in order to get the best results. For example, if the graph implies that _:b1 owl:sameAs _:b2, then it might replace all occurrences of _:b2 with _:b1 before canonicalizing. Or, if a graph contains: :alice foaf:knows _:b3 . _:b3 :hasSSN "123-45-6789" . :bob foaf:knows _:b4 . _:b4 :hasSSN "123-45-6789" . :hasSSN a owl:InverseFunctionalProperty . then _:b3 and _:b4 can be merged into a single blank node prior to blank node canonicalization. Furthermore, if a blank node is known to have a key that uniquely identifies it, then it would be best if the canonical name of that blank node would depend only on those key properties (recursively). For example, given: :carol :owns _:b5 . _:b5 :make "Acme" ; :serialNumber "1234" ; :color "red" . If :make and :serialNumber form a composite key, then the canonical name of _:b5 should depend only on those properties -- not on :color. That would enable _:b6 in the following to be canonicalized to the same blank node label, even though it specifies a :yearPurchased and does not specify a :color : :carol :owns _:b6 . _:b6 :model "zinger" ; :serialNumber "1234" ; :yearPurchased "2013" . OWL2 supports keys via owl:hasKey. http://www.w3.org/TR/2009/WD-owl2-new-features-20090611/#F9:_Keys Could your algorithm be extended to take advantage of OWL assertions? For the best canonicalization results, this might mean making use of the closure of graph G's imports to aid in canonicalizing graph G. To do this reliably it might be necessary for G to directly contain a crypto hash of the closure of its imports, so that any change to the imported files would be detectable. David Booth
Received on Tuesday, 26 May 2015 02:24:53 UTC