Re: canonical RDF graph representations

On Tue, 2011-03-01 at 14:47 +0100, Melvin Carvalho wrote: 
> On 1 March 2011 14:37, Peter Frederick Patel-Schneider
> <pfps@research.bell-labs.com> wrote:
> > This thrust for a canonical serialization puzzles me.  What problem
> > would a canonical serialization solve?
> 
> Off the top of my head:
> 
> 1. Signing RDF
> 2. Signing Named Graphs
> 3. Signing Triples

Plausible use cases, though as I mentioned to Peter it is possible to
sign a collection of triples (or quads for that matter) without
canonical serialization if you are prepared to use set-hashes.

> 4. Fast Comparisons

Where are the savings?

Graph isomorphism is in NP and not known to be in P (RDF graph
equivalence is GI-complete); yes you can compare two canonical
serializations in linear time but you've just moved the cost into the
serialization step.

The tricks for efficient signing such as Jeremy's pre-canonicalization
[1] could equally well be used directly for comparison purposes.

Similarly, the set-hash technique would allow you to do fast hash
comparisons bypassing the canonical serialization step.
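
To make the set-hash idea concrete, here is a minimal sketch in Python.
It is only an illustration under stated assumptions: triples are ground
(no bnodes), each term is already a string in N-Triples style, and the
per-triple digests are combined by addition modulo 2^256. The names
triple_digest and set_hash are made up for this example, and I am not
claiming this particular combiner is strong enough for signing.

```python
import hashlib

# Assumption: digests are combined by addition modulo 2^256, which is
# commutative and associative, so triple order does not matter.
MODULUS = 2 ** 256


def triple_digest(triple):
    """Hash one ground triple given as (subject, predicate, object) strings."""
    h = hashlib.sha256()
    for term in triple:
        data = term.encode("utf-8")
        # Length-prefix each term so term boundaries cannot be confused.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return int.from_bytes(h.digest(), "big")


def set_hash(triples):
    """Order-independent hash of a set of ground triples."""
    total = 0
    for t in set(triples):  # a graph is a set, so duplicates collapse
        total = (total + triple_digest(t)) % MODULUS
    return total


g1 = [
    ("<http://example.org/a>", "<http://example.org/p>", "<http://example.org/b>"),
    ("<http://example.org/b>", "<http://example.org/q>", '"42"'),
]
g2 = list(reversed(g1))  # same triples, different order
same = set_hash(g1) == set_hash(g2)
```

Two collections of ground triples can then be compared by comparing
set_hash values, with no sorting or serialization step. Graphs with
bnodes still need some labelling scheme first, which is where the hard
part lives.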

> 5. Synchronization

That might be better approached via the various ways of breaking a
graph into units which can be separately synchronized (Minimum
Self-Contained Graph, or maybe RDF Molecules). You still need to compare
the resulting units but at least now you are working on a reduced grain
size.
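
As a rough sketch of the decomposition, assuming the usual reading of
Minimum Self-Contained Graph (a ground triple is a unit on its own, and
triples that share bnodes, directly or transitively, belong to the same
unit), something like the following Python would do. The function names
are hypothetical, and bnodes are assumed to be written N-Triples style
("_:b0") and to appear only in subject or object position.

```python
from collections import defaultdict


def is_bnode(term):
    # Assumption: blank nodes are written N-Triples style, e.g. "_:b0".
    return term.startswith("_:")


def split_into_msgs(triples):
    """Partition triples into units connected through shared bnodes."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    units = []
    with_bnodes = []
    for t in set(triples):
        s, _, o = t
        bnodes = [x for x in (s, o) if is_bnode(x)]
        if not bnodes:
            units.append(frozenset([t]))  # ground triple: its own unit
            continue
        find(bnodes[0])
        if len(bnodes) == 2:
            # Subject and object bnodes end up in the same unit.
            parent[find(bnodes[0])] = find(bnodes[1])
        with_bnodes.append((bnodes[0], t))

    groups = defaultdict(set)
    for b, t in with_bnodes:
        groups[find(b)].add(t)
    units.extend(frozenset(g) for g in groups.values())
    return units


graph = [
    ("<http://example.org/a>", "<http://example.org/p>", "_:b0"),
    ("_:b0", "<http://example.org/q>", "_:b1"),
    ("_:b1", "<http://example.org/r>", '"x"'),
    ("<http://example.org/a>", "<http://example.org/p>", "<http://example.org/c>"),
    ("_:b2", "<http://example.org/q>", "<http://example.org/d>"),
]
units = split_into_msgs(graph)
```

Each unit can then be compared or synchronized separately; only the
units that contain bnodes need an isomorphism-style check, and those are
usually small.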

Dave

[1] http://www.hpl.hp.com/techreports/2003/HPL-2003-142.html

> From the paper:
> 
> Hash digests have been used extensively for file comparison, for
> example in [1], where it is used for avoiding the duplicate storage of
> identical files, and in backup systems.
> 
> 
> 
> >
> > Peter F. Patel-Schneider
> > Bell Labs Research
> >
> >
> > From: Melvin Carvalho <melvincarvalho@gmail.com>
> > Subject: Re: canonical RDF graph representations
> > Date: Tue, 1 Mar 2011 07:13:08 -0600
> >
> >> On 1 March 2011 10:50, Ivan Shmakov <ivan@main.uusia.org> wrote:
> >>>        The “The case for generating URIs by hashing RDF content” paper
> >>>        [1], dating back to 2002, mentions that “there is no current
> >>>        canonical serialization standard for RDF”.  (Then, they suggest
> >>>        their own canonical representation.)
> >>>
> >>>        I wonder, has such a standard been since proposed?
> >>>
> >>> [1] http://www.hpl.hp.com/techreports/2002/HPL-2002-216.pdf
> >>
> >> Yes, it's important to have a standard way to canonicalize RDF, or,
> >> at least, RDF/XML imho.  It's required for xmlsig, I think.
> >>
> >> I think there was an issue with bnodes ... maybe it's something we can solve.
> >>
> >> Maybe we can get this quickly to rec status?
> >>
> >>>
> >>> --
> >>> FSF associate member #7257
> >>>
> >>
> >
> 

Received on Tuesday, 1 March 2011 14:26:16 UTC