Re: Comments on RDF graph canonicalization draft

On 03/08/2013 09:59 AM, David Booth wrote:
> Regarding
> I am delighted to see that people are working on this!  This will 
> help fill a huge gap that we currently have.

We're not only working on it: we have a fairly stable, general solution
to the problem that improves on the work that Jeremy Carroll did. We're
putting it into production, in a financial system, in a week or two.

We're also now calling it RDF Dataset Normalization.

Your comments are being tracked here:

> A few small suggestions:
> 1. You really should reference Jeremy Carroll's original work on RDF 
> canonicalization: 

While we didn't agree with the approach that Jeremy's paper took, we did
read it, and do agree that it was foundational work in the area. We
should reference it.

The normalization algorithm is an improvement over Jeremy Carroll's work:

1. It doesn't require the original input graph, just a set of quads.
2. It doesn't inject new blank nodes into the graph.
3. It is a general solution for the RDF data model.
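
To make the first two points concrete, here is a toy sketch in Python. It
is NOT the algorithm from the draft. It assumes quads are plain 4-tuples
of strings, that blank nodes are written "_:label", and that every blank
node can be told apart by hashing only the quads that mention it; the
names Quad, first_degree_hash and normalize are made up for illustration.
It only shows the shape of the idea: the input is a bare set of quads, and
existing blank nodes are relabeled with the "c14n" prefix rather than new
ones being added.

```python
import hashlib
from typing import Iterable, List, NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str  # "" stands for the default graph (an assumption of this sketch)


def is_bnode(term: str) -> bool:
    # Assumption: blank nodes are written "_:label"; literals are quoted strings.
    return term.startswith("_:")


def first_degree_hash(bnode: str, quads: Iterable[Quad]) -> str:
    """Hash the quads mentioning `bnode`, ignoring the input blank node labels."""
    lines = []
    for quad in quads:
        if bnode in quad:
            # The node being hashed becomes "_:a"; any other blank node becomes "_:z".
            terms = ["_:a" if t == bnode else "_:z" if is_bnode(t) else t for t in quad]
            lines.append(" ".join(terms))
    lines.sort()
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def normalize(quads: Iterable[Quad]) -> List[Quad]:
    """Relabel blank nodes deterministically and return the quads in sorted order."""
    quad_set = set(quads)
    bnodes = {t for quad in quad_set for t in quad if is_bnode(t)}
    hashes = {b: first_degree_hash(b, quad_set) for b in bnodes}
    if len(set(hashes.values())) != len(bnodes):
        # A real algorithm needs further work here; this toy sketch gives up.
        raise ValueError("blank nodes not distinguishable by first-degree hash")
    ordered = sorted(bnodes, key=lambda b: hashes[b])
    labels = {b: f"_:c14n{i}" for i, b in enumerate(ordered)}
    return sorted(Quad(*(labels.get(t, t) for t in quad)) for quad in quad_set)
```

Two inputs that differ only in blank node labels and quad order produce
the same output list, and the output has exactly as many blank nodes as
the input. Graphs whose blank nodes look identical at this first level
(for example, symmetric cycles) are rejected by this sketch.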

> 2. I do not find the word "canonical" or "canonicalization" anywhere
>  in the document, although it is clearly implied by the bnode prefix
>  "c14n". It is a stylistic choice whether to call the process 
> "normalization" or "canonicalization".  In my observation, over the 
> years "canonicalization" has been used more specifically for this 
> process (as Jeremy Carroll's 2003 paper did), and hence would be the
>  better choice of primary term.  But regardless of the choice, I 
> think it is important to include both terms up front, to enable 
> searchers to find it more easily.

As with the JSON-LD specification, we are trying to use language that
will be familiar to a larger community, so we picked a single term.
Normalization seemed to be the more accessible term, and easier to say,
than canonicalization.

That said, maybe it would be good to mention both at some point in the
text. Perhaps others have strong feelings about normalization vs.
canonicalization. Canonicalization is the more specific and accurate
term.

All that to say: good point, we should revisit the decision.

> 3. The document contains an editorial note saying that the algorithm
>  is obsolete.  Could you please point to the newer one?  Even if it 
> is not yet finalized, it would be nice to be able to see what it is,
>  whether it is just an email message, a snippet of code or whatever.
>  If you could point to pieces of working code -- even if they are not
>  finished -- that would be great too.

Dave Longley did a great job of pointing you to the current state of
the implementation of this algorithm.

-- manu

Manu Sporny (skype: msporny, twitter: manusporny, G+: +Manu Sporny)
President/CEO - Digital Bazaar, Inc.
blog: Aaron Swartz, PaySwarm, and Academic Journals

Received on Friday, 8 March 2013 17:16:13 UTC