Re: RDF Dataset Canonicalization - Formal Proof from Alan Karp on 2021-03-27 (public-credentials@w3.org from March 2021)

From: Alan Karp <alanhkarp@gmail.com>
Date: Sat, 27 Mar 2021 12:01:39 -0700
To: Manu Sporny <msporny@digitalbazaar.com>
Cc: W3C Credentials CG <public-credentials@w3.org>
Message-ID: <CANpA1Z3qHoH_sPodk5n95c-9Hs6x-veC7xN1sJY7KxtpRAnu=w@mail.gmail.com>

Manu Sporny <msporny@digitalbazaar.com> wrote:

>
> Hey Alan, I didn't know that you had worked in the graph digest space
> before... neat! :)
>

Yeah.  I'm still trying to figure out what I'm going to be when I grow up.

At this point, you understand my paper better than I do :)  I expected you
had a good reason for canonicalization, but I just wanted to make sure you
were aware of the possibility of doing without it.  One issue you didn't
mention about our paper is that a set hash is weaker against collision
attacks.  I thought that might be the reason you couldn't use that
approach.  In case you're interested, we wrote a follow-up,
https://www.hpl.hp.com/techreports/2004/HPL-2004-95.pdf.

--------------
Alan Karp


On Sat, Mar 27, 2021 at 11:51 AM Manu Sporny <msporny@digitalbazaar.com>
wrote:

> On 3/27/21 1:35 PM, Alan Karp wrote:
> > It is possible to produce a digest without canonicalization by using a
> set
> > hash, which I see you mention in your paper.  I haven't read the whole
> > thing yet, so I don't know why you need canonicalization. See
> > https://www.hpl.hp.com/techreports/2003/HPL-2003-235R1.pdf
>
> Hey Alan, I didn't know that you had worked in the graph digest space
> before... neat! :)
>
> I just read your paper front to back several times. I'm fairly certain I
> understand it:
>
> Fundamentally, the paper uses a cryptographic combination function over
> every
> statement in a graph that is both associative and commutative to generate a
> final hash value in O(n) (with some additional protections against
> adversarial
> brute forcing).
>
> There are two things the paper also depends on that make the solution
> non-generalized: 1) you presume you know how the sender is going to label
> the
> blank nodes, or 2) you annotate the graph with extra information about the
> blank nodes. The first is an assumption you cannot depend on in a global
> system, the second modifies the graph in ways that require an annotation
> language and a way of encapsulating the annotations in the message to the
> receiver (unacceptable complexity for most of the use cases we're dealing
> with).
>
> In addition, the solution only contemplates RDF Graphs and not RDF Datasets
> (which didn't exist at the time the paper was written).
>
> So, while it is true that you can produce *a* digest w/o doing
> canonicalization using set hashes, the solution presumes things that we
> can't
> presume in real world scenarios. For example, a non-trivial subset of
> developers are not going to understand why they have to "always generate a
> unique cryptographic ID for every one of their JSON objects". Or why the
> data
> coming into their system has a whole bunch of annotations in their JSON
> objects that look like garbage to them, that they then have to turn around
> and
> store in their databases.
>
> We had considered all of these things before writing the RDF Dataset
> Canonicalization paper.
>
> -- manu
>
> --
> Manu Sporny - https://www.linkedin.com/in/manusporny/
> Founder/CEO - Digital Bazaar, Inc.
> blog: Veres One Decentralized Identifier Blockchain Launches
> https://tinyurl.com/veres-one-launches
>
>
>

Received on Saturday, 27 March 2021 19:03:04 UTC