RDF Dataset Canonicalization - Formal Proof

Hi all,

As many of you know, the Verifiable Credentials that we use today are
digitally signed. In order to digitally sign and verify those VCs, there is a
process called canonicalization[1] that we perform on the data. The
specification for this algorithm has been available since ~2012[2].

For those of you that are not familiar with the concept, here's an imperfect
analogy that might help:

Imagine someone gives you a box of marbles in a marble factory and tells you
to sort them in a line. There are many people doing this in the factory and
everyone has to sort the combinations of marbles in the same order.
Canonicalization is the set of instructions that everyone uses to sort the
marbles -- green ones first, then red ones, then purple ones (but never put a
purple one after a red one), and so on.

For the work we do today, Verifiable Credentials are the box and all the data
in the Verifiable Credential are the marbles... and in order to create a
digital signature and verify it, we all have to sort those marbles in the same
way.
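The idea above can be sketched in a few lines of code. This is only an
illustration, not the URDNA2015 algorithm itself (the genuinely hard part of
RDF canonicalization is deterministically labeling blank nodes, which this
sketch ignores); the statements and names here are made up for the example:

```python
# Illustrative sketch only -- NOT URDNA2015. It shows the core idea:
# two parties holding the same statements in different orders must
# serialize them identically before hashing or signing.
import hashlib

def canonical_form(statements):
    """Serialize a set of statements deterministically: sort, then join."""
    return "\n".join(sorted(statements))

# Two holders received the same data in different orders.
alice = ['<s> <knows> <bob> .', '<s> <name> "Alice" .']
bob   = ['<s> <name> "Alice" .', '<s> <knows> <bob> .']

# After canonicalization, both produce byte-identical output ...
assert canonical_form(alice) == canonical_form(bob)

# ... so their hashes (and therefore their digital signatures) agree.
digest = hashlib.sha256(canonical_form(alice).encode("utf-8")).hexdigest()
```

If the two serializations differed by even one byte, the hashes would differ
and signature verification would fail -- which is why everyone has to "sort
the marbles" the same way.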

Even though it might sound simple, this is a long-studied, fantastically hard
problem -- graph canonization[3], closely related to the graph isomorphism
problem -- and a generalized solution, with an associated peer-reviewed formal
mathematical proof, didn't exist until recently.

The next set of items for the official W3C standards track have to do with
canonicalization and digital signatures, and in order to propose a charter, we
have to demonstrate that 1) there is a concrete proposed solution to the
canonicalization problem, 2) it has had adequate peer review, and 3) there is
enough deployment experience to feel confident that the suggested solution is
workable.

This email addresses #1 above, with a formal proof for the RDF Dataset
Canonicalization problem. People who have reviewed the paper will follow it
up shortly with their findings to address #2 above. As for #3, whether folks
are aware of it or not, this algorithm has been used for 6 years in
proof-of-concept, pilot, and production deployments (so, we're very much
behind the curve on getting this standardized).

Please find the formal proof for the URDNA2015 (RDF Dataset Canonicalization)
algorithm attached. I expect the people who peer reviewed the formal proof to
chime in over the next couple of weeks.

-- manu

[1]https://en.wikipedia.org/wiki/Canonicalization
[2]https://github.com/json-ld/rdf-dataset-canonicalization
[3]https://en.wikipedia.org/wiki/Graph_canonization

-- 
Manu Sporny - https://www.linkedin.com/in/manusporny/
Founder/CEO - Digital Bazaar, Inc.
blog: Veres One Decentralized Identifier Blockchain Launches
https://tinyurl.com/veres-one-launches

Received on Saturday, 27 March 2021 14:57:23 UTC