Re: RDF canonicalization [was Re: deterministic naming of blank nodes]

On Thu, May 14, 2015 at 6:20 AM, David Booth <david@dbooth.org> wrote:
> On 05/13/2015 02:14 AM, ahogan@dcc.uchile.cl wrote:
>>
>> Aidan Hogan. "Skolemising Blank Nodes while Preserving Isomorphism". In
>> WWW, Florence, Italy, May 18–22, 2015 (to appear).
>> Available from: http://aidanhogan.com/docs/skolems_blank_nodes_www.pdf
>>
>> . . .  the paper presents an algorithm for deterministically
>> labelling blank nodes of any RDF graph such that the labels of two
>> isomorphic RDF graphs will "correspond".
>
>
> Excellent!  It would be really good if we could define a W3C standard
> N-Triples and N-Quads canonicalization, for a variety of purposes.
>
> But I have a question about your algorithm.  For greatest utility, a
> canonicalization algorithm should be robust against small changes in a
> graph.  In other words, if graphs G and H are very similar then their
> canonicalizations should also be very similar.  (By "very similar" I mean
> that G and H have large subgraphs G' and H' that are isomorphic.)
>
> How robust is your algorithm to small changes in a graph?  Do you have any
> data on how much the canonicalization changes when a graph is changed?

Still reading the paper, but given the use of cryptographic hashing
(p. 9), any minor change will result in a cascade effect that will
produce a wholly different output.

Further, this is precisely what one wants if the canonicalization
algorithm is to be used for cryptographic applications such as graph
signing, as per e.g. the Credentials CG. Similarity would be a
different problem that entails conflicting requirements.

-- 
Arto Bendiken | @bendiken | http://ar.to

Received on Thursday, 14 May 2015 17:04:18 UTC