Re: C14N use case: version control from Andy Seaborne on 2011-06-28 (public-rdf-wg@w3.org from June 2011)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Tue, 28 Jun 2011 14:55:40 +0100
To: RDF-WG <public-rdf-wg@w3.org>
Message-ID: <4E09DD5C.1030401@epimorphics.com>

On 27/06/11 19:55, Jeremy Carroll wrote:
>
> This is the use case that TopQuadrant has internally that prompted
> discussion between me and Gavin leading to this thread on this mailing
> list.
>
> A significant portion of our product source is in RDF.
> We are migrating our version control system to Git to reduce the cost of
> merging.
> This will not work for RDF in the form that we currently store it,
> because simple changes result in completely different documents.
> We are now working on a version of my earlier paper with additional
> steps to ensure reasonable stability of blank node IDs.
>
> (In the terms of the paper the bnode ids will be based on a hashcode
> generated from the first distinctive triple for that bnode).

I'm curious - why not store skolemized data?  The skolemization URI
could record sufficient information about when the bNode id was
first created.  It's then fully reversible in syntax terms (with a
little parser processing, "deskolemization"), so the bNodes can be
reconstructed.
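
A minimal sketch of that round trip, under stated assumptions: terms
are plain strings in N-Triples style, and the GENID prefix (authority
and path) is a hypothetical choice for illustration, not something
fixed in this thread.  The bNode label is kept verbatim in the URI,
which is what makes the mapping reversible; information about when the
id was created could be added to the path in the same way.

```python
# Hypothetical skolem URI prefix; both the authority and the path are
# assumptions made for this example.
GENID = "http://example.org/.well-known/genid/"


def skolemize_term(term):
    """Replace a blank node label such as '_:b0' with a skolem URI."""
    if term.startswith("_:"):
        return "<" + GENID + term[2:] + ">"
    return term


def deskolemize_term(term):
    """Turn a skolem URI under GENID back into a blank node label."""
    prefix = "<" + GENID
    if term.startswith(prefix) and term.endswith(">"):
        return "_:" + term[len(prefix):-1]
    return term


def skolemize(triples):
    """Skolemize every term of every (subject, predicate, object) triple."""
    return [tuple(skolemize_term(t) for t in triple) for triple in triples]


def deskolemize(triples):
    """Inverse of skolemize for URIs minted under GENID."""
    return [tuple(deskolemize_term(t) for t in triple) for triple in triples]


graph = [
    ("_:b0", "<http://example.org/name>", '"Alice"'),
    ("<http://example.org/doc>", "<http://example.org/author>", "_:b0"),
]

stored = skolemize(graph)        # the form that goes into version control
restored = deskolemize(stored)   # what the parser hands back
assert restored == graph
```

Because the stored form contains only URIs and literals, a one-triple
edit changes one line of the serialization, which is the property a
line-based diff and merge needs.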

For me, this is the point of skolemization - finding a bNode again need
not be just across the web; it can also be across time, between
serializations of the same data.

	Andy

> Then, in the vast majority of cases, small changes to the RDF will
> result in small changes to the canonical form (larger changes will
> occur at discontinuities in the hashing algorithm, when the number
> of buckets needs expanding).
>
> Jeremy
>
>

Received on Tuesday, 28 June 2011 13:56:20 UTC