Re: RDF Dataset Canonicalization - Formal Proof

> On 27 Mar 2021, at 16:12, David Chadwick <D.W.Chadwick@kent.ac.uk> wrote:
> 
> Hi Manu
> 
> as you know, implementers can use JWT proofs for VCs, and then canonicalisation is not needed. This is a major benefit of using JWS/JWT, as canonicalisation has been fraught with difficulties (as anybody who has worked with XML signatures will know, and discussions in the IETF PKIX group have highlighted).

The usefulness of canonicalization extends far beyond the VC space. It could be used
to sign large datasets in all kinds of areas without needing to keep the original
binary copy around, and it allows the same signature to remain valid irrespective of
encoding: binary RDF, Turtle, JSON-LD, or whatever other format gets invented.
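
To make that concrete, here is a minimal sketch of the idea (an assumption on my
part: Python with the pyld library installed, and made-up example.org identifiers).
Two JSON-LD documents that differ only in key order canonicalize to the same
N-Quads under URDNA2015, so the hash, and any signature computed over that hash,
is the same for both:

import hashlib
from pyld import jsonld

# Same data, expressed twice with different key ordering.
doc_a = {
    "@context": {"name": "http://schema.org/name",
                 "knows": "http://schema.org/knows"},
    "@id": "http://example.org/alice",
    "name": "Alice",
    "knows": {"@id": "http://example.org/bob"},
}
doc_b = {
    "@context": {"knows": "http://schema.org/knows",
                 "name": "http://schema.org/name"},
    "knows": {"@id": "http://example.org/bob"},
    "name": "Alice",
    "@id": "http://example.org/alice",
}

opts = {"algorithm": "URDNA2015", "format": "application/n-quads"}
canon_a = jsonld.normalize(doc_a, opts)   # canonical N-Quads string
canon_b = jsonld.normalize(doc_b, opts)

assert canon_a == canon_b                 # identical canonical form
print(hashlib.sha256(canon_a.encode("utf-8")).hexdigest())

The same canonical N-Quads would be produced if the data arrived as Turtle or any
other RDF serialization of the same dataset, which is what makes the signature
encoding-independent.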

I’ll pass on this paper to mathematicians, though with no guarantee at all that anyone will
pick this up. But one can hope.

Henry

> 
> Kind regards
> 
> David
> 
> 
> 
> On 27/03/2021 14:57, Manu Sporny wrote:
>> Hi all,
>> 
>> As many of you know, the Verifiable Credentials that we use today are
>> digitally signed. In order to digitally sign and verify those VCs, there is a
>> process called canonicalization[1] that we perform on the data. The
>> specification for this algorithm has been available since ~2012[2].
>> 
>> For those of you that are not familiar with the concept, here's an imperfect
>> analogy that might help:
>> 
>> Imagine someone gives you a box of marbles in a marble factory and tells you
>> to sort them in a line. There are many people doing this in the factory and
>> everyone has to sort the combinations of marbles in the same order.
>> Canonicalization is the set of instructions that everyone uses to sort the marbles
>> -- green ones first, then red ones, then purple ones (but never put a purple
>> one after a red one), and so on.
>> 
>> For the work we do today, Verifiable Credentials are the box and all the data
>> in the Verifiable Credential are the marbles... and in order to create a
>> digital signature and verify it, we all have to sort those marbles in the same
>> way.
>> 
>> Even though it might sound simple, it's a centuries-old, fantastically hard
>> problem -- closely related to the graph isomorphism and graph canonization
>> problems[3] -- and a generalized solution to this problem, with an associated
>> formal mathematical proof that has been peer reviewed, didn't exist until recently.
>> 
>> The next set of items for the official W3C standards track have to do with
>> canonicalization and digital signatures, and in order to propose a charter, we
>> have to demonstrate that 1) there is a concrete proposed solution to the
>> canonicalization problem, 2) it has had adequate peer review, and 3) there is
>> enough deployment experience to feel confident that the suggested solution is
>> workable.
>> 
>> This email addresses #1 above, with a formal proof for the RDF Dataset
>> Canonicalization problem. People that have reviewed the paper will follow it
>> up shortly with their findings to address #2 above. As for #3, whether folks
>> are aware of this or not, this algorithm has been used for 6 years in proof of
>> concept, pilot, and production deployments (so, we're very much behind the
>> curve on getting this standardized).
>> 
>> Please find the formal proof for the URDNA2015 (RDF Dataset Canonicalization)
>> algorithm attached. I expect the people that peer reviewed the formal proof to
>> chime in over the next couple of weeks.
>> 
>> -- manu
>> 
>> [1]
>> https://en.wikipedia.org/wiki/Canonicalization
>> 
>> [2]
>> https://github.com/json-ld/rdf-dataset-canonicalization
>> 
>> [3]
>> https://en.wikipedia.org/wiki/Graph_canonization
>> 
>> 
>> 

Received on Saturday, 27 March 2021 16:23:22 UTC