
Re: RDF Dataset Canonicalization - Formal Proof

From: Orie Steele <orie@transmute.industries>
Date: Mon, 29 Mar 2021 13:25:21 -0500
Message-ID: <CAN8C-_K=8r+ZA7wPgbcWc1Ostvq-VUcmCM44yBnGoR+B1mo3TA@mail.gmail.com>
To: Dave Longley <dlongley@digitalbazaar.com>
Cc: Steve Capell <steve.capell@gmail.com>, Tobias Looker <tobias.looker@mattr.global>, Christopher Allen <ChristopherA@lifewithalacrity.com>, Adrian Gropper <agropper@healthurl.com>, Alan Karp <alanhkarp@gmail.com>, Drummond Reed <drummond.reed@evernym.com>, Manu Sporny <msporny@digitalbazaar.com>, "W3C Credentials CG (Public List)" <public-credentials@w3.org>
I love how a simple concept like canonicalization quickly leads to Merkle
trees, HMACs, and ZKPs....

Since JOSE has already been raised, I want to point out that it does rely
on a form of canonicalization... it's called base64url of JSON, and it
inflates your data payload by 33%...

33% of a small number is fine... but of a large number, like a very large
credential... it becomes very unreasonable...

so in practice you would store the base64url as binary in a database...
which would then become impossible to search... so you might store the
decoded JWT as JSON instead, and it will almost look like an LD Proof...
assuming you never reorder the JSON entries, which would break the
signature.
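To make both points concrete, here is a small stdlib-only Python sketch (the payload is made up for illustration): base64url grows the bytes by roughly a third, and reordering the keys of the same JSON object changes the bytes a signature would be computed over.

```python
import base64
import hashlib
import json

# A made-up JWT-style payload, keys deliberately not in sorted order.
payload = {"vc": {"credentialSubject": {"id": "did:example:holder"}},
           "iss": "did:example:issuer"}

# base64url encoding of the JSON bytes, as JOSE does for JWS payloads.
raw = json.dumps(payload, separators=(",", ":")).encode()
encoded = base64.urlsafe_b64encode(raw).rstrip(b"=")
print(len(raw), len(encoded))  # encoded form is ~4/3 the size of the raw bytes

# Reordering the keys yields different bytes, so a signature computed over
# the original bytes no longer verifies against the reordered document.
reordered = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
print(hashlib.sha256(raw).hexdigest() == hashlib.sha256(reordered).hexdigest())
```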

A great example of a simple solution that leads to performance problems,
which ultimately gate the use of the technology for large-scale data.

We should be similarly concerned about RDF Canonicalization.

There is a cost to it, but less so in storage and more so in computation.

Alternatives such as JCS have been raised before, but they are not really
doing the same thing; JCS is, in a sense, just JOSE without the 33%
bloat of base64url encoding.
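For anyone unfamiliar with JCS: the core of it can be approximated in a few lines of stdlib Python. This is only an approximation of RFC 8785, which additionally pins down number and string serialization in ways `json.dumps` does not fully match.

```python
import json

def jcs_like(obj):
    # Approximates JCS (RFC 8785): lexicographically sorted keys, no
    # insignificant whitespace, UTF-8 output. Real JCS also specifies
    # exact number and string serialization rules.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)

a = {"b": 1, "a": {"y": True, "x": None}}
b = {"a": {"x": None, "y": True}, "b": 1}
print(jcs_like(a) == jcs_like(b))  # True: same data, same canonical bytes
```

Note that this canonicalizes the JSON syntax only; unlike RDF canonicalization it says nothing about the semantics of the data.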

Ultimately this is a question of the most efficient computation and storage
mechanism for semantically unambiguous cryptographically verifiable
information.

I think URDNA2015 does a great job and is worth formalizing; however, I
want to peer into the future and ask: what is our ideal solution to this
problem?
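The computationally hard part URDNA2015 formalizes is assigning deterministic labels to blank nodes. A toy, stdlib-only sketch of the first-degree-hash idea follows; this is NOT URDNA2015 itself (it omits the N-degree step needed to break ties between automorphic blank nodes), just an illustration of why isomorphic graphs can canonicalize to the same string.

```python
import hashlib

def first_degree_hash(bnode, triples):
    # Hash the triples mentioning this blank node, with the node itself
    # replaced by "_:a" and every other blank node by "_:z" (URDNA2015
    # uses the same placeholder trick over canonical N-Quads).
    lines = []
    for s, p, o in triples:
        if bnode in (s, o):
            s2 = "_:a" if s == bnode else ("_:z" if s.startswith("_:") else s)
            o2 = "_:a" if o == bnode else ("_:z" if o.startswith("_:") else o)
            lines.append(f"{s2} {p} {o2} .")
    return hashlib.sha256("\n".join(sorted(lines)).encode()).hexdigest()

def toy_canonicalize(triples):
    bnodes = {t for tr in triples for t in (tr[0], tr[2])
              if t.startswith("_:")}
    ranked = sorted(bnodes, key=lambda b: first_degree_hash(b, triples))
    labels = {b: f"_:c14n{i}" for i, b in enumerate(ranked)}
    sub = lambda t: labels.get(t, t)
    return "\n".join(sorted(f"{sub(s)} {p} {sub(o)} ."
                            for s, p, o in triples))

# Two isomorphic graphs with different blank node identifiers
# canonicalize to the same string.
g1 = [("_:b0", "<ex:p>", "<ex:o>"), ("_:b0", "<ex:q>", "_:b1"),
      ("_:b1", "<ex:r>", '"v"')]
g2 = [("_:x9", "<ex:r>", '"v"'), ("_:x7", "<ex:q>", "_:x9"),
      ("_:x7", "<ex:p>", "<ex:o>")]
print(toy_canonicalize(g1) == toy_canonicalize(g2))  # True
```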

I dream of a binary replacement for URDNA2015 where semantics and encodings
can be retained, while graph representations can be minimized unambiguously.

Perhaps some form of N-Quads + multicodec + CBOR-LD.

Regards,

OS





On Mon, Mar 29, 2021 at 12:16 PM Dave Longley <dlongley@digitalbazaar.com>
wrote:

>
> Tobias,
>
> One idea with the LD signature redaction suite that never took off was
> to have the issuer generate an HMAC key and use that to generate the
> salts -- and then give the holder the HMAC key so they can do the same
> when sharing.
>
> On 3/28/21 4:33 PM, Steve Capell wrote:
> > Hi Tobias
> >
> > Good questions - which I’ve forwarded to the Singapore team for an
> > authoritative answer
> >
> > Here’s my non-authoritative attempt
> > - salts are an array of UUIDs, I think -
> > see https://edi3.org/specs/edi3-notary/develop/#611-salting-the-data
> > - signature correlation - not sure but I’d mention that all use cases
> > for this approach so far are for cross border trade documents where the
> > subject is a public identifier such as a business number.  The design
> > intent is that the identity is correlatable.
> > - we haven’t noticed performance issues of any significance but we are
> > talking volumes of only a few million per year
> >
> > Steven Capell
> > Mob: 0410 437854
> >
> >> On 28 Mar 2021, at 2:53 pm, Tobias Looker <tobias.looker@mattr.global>
> >> wrote:
> >>
> >> 
> >> > I’m a big fan of this approach, a form of redaction distinct from zk
> >> forms of selective disclosure.
> >>
> >> > There was an attempt to spec one here in the CCG three-four years
> >> ago, but it died on the vine.
> >>
> >> I'm also interested in learning more about this approach; the
> >> questions I had last time were:
> >>
> >> 1. How the salt for each redactable statement would be managed in a
> >> way that would not leak the abstraction that "Linked Data Proofs" sets
> >> up. For example would the attached proof block have to have a long
> >> array of salts?
> >> 2. Proof sizes, having to include a salt per signed statement as part
> >> of the proof would significantly increase the size of the proof's
> >> representation.
> >> 3. Signature correlation, perhaps not important in this scheme, but I
> >> think the approach would require revealing a fixed signature
> >> regardless of which parts are redacted from the original proof?
> >> 4. Performance? Also perhaps a non-issue but if anyone has
> >> info/benchmarks around how the scheme might scale with the size of the
> >> data graph signed, that would be great to look at?
> >>
> >> Thanks,
> >> *Tobias Looker*
> >> Mattr
> >> +64 (0) 27 378 0461
> >> tobias.looker@mattr.global <mailto:tobias.looker@mattr.global>
> >> Mattr website <https://mattr.global> Mattr on LinkedIn
> >> <https://www.linkedin.com/company/mattrglobal>       Mattr on Twitter
> >> <https://twitter.com/mattrglobal>    Mattr on Github
> >> <https://github.com/mattrglobal>
> >>
> >>
> >> This communication, including any attachments, is confidential. If you
> >> are not the intended recipient, you should not read it - please
> >> contact me immediately, destroy it, and do not copy or use any part of
> >> this communication or disclose anything about it. Thank you. Please
> >> note that this communication does not designate an information system
> >> for the purposes of the Electronic Transactions Act 2002.
> >>
> >>
> >> On Sun, Mar 28, 2021 at 3:49 PM Christopher Allen
> >> <ChristopherA@lifewithalacrity.com
> >> <mailto:ChristopherA@lifewithalacrity.com>> wrote:
> >>
> >>     On Sat, Mar 27, 2021 at 7:22 PM Steve Capell
> >>     <steve.capell@gmail.com <mailto:steve.capell@gmail.com>> wrote:
> >>
> >>         The Singapore government https://www.openattestation.com/ does
> >>         this already . Version 3 is W3C VC data model compliant
> >>
> >>         Each element is hashed (with a salt, I think) and then the hash
> >>         of the hashes is the document hash that is notarised
> >>
> >>         The main rationale is selective redaction (because the root
> >>         hash is unchanged when some clear text is hidden). But I
> >>         suppose it simplifies canonicalisation too...
> >>
> >>
> >>     I’m a big fan of this approach, a form of redaction distinct from
> >>     zk forms of selective disclosure.
> >>
> >>     There was an attempt to spec one here in the CCG three-four years
> >>     ago, but it died on the vine.
> >>
> >>     I’d be interested in seeing this spec & implementation. Any links?
> >>
> >>     — Christopher Allen [via iPhone]
> >>
> >>
>
>
> --
> Dave Longley
> CTO
> Digital Bazaar, Inc.
>
>

-- 
*ORIE STEELE*
Chief Technical Officer
www.transmute.industries

<https://www.transmute.industries>
Received on Monday, 29 March 2021 18:25:46 UTC
