Re: RDF Dataset Canonicalization - Formal Proof

For what it is worth, my principal source for mathematics, graphs, and the
semantic web comes from this group:

https://www.meetup.com/Category-Theory/

I expect to be over there for quite some time.

On Monday, March 29, 2021, Orie Steele <orie@transmute.industries> wrote:

> I love how a simple concept like canonicalization quickly leads to Merkle
> trees, HMACs and ZKPs....
>
> Since JOSE has already been raised, I want to point out that it does rely
> on a form of canonicalization... it's called base64url of JSON, and it
> inflates your data payload by 33%...
>
> 33% of a small number is fine... but of a large number, like a very large
> credential... it becomes very unreasonable...
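
To put a number on the base64url overhead described above, here is a minimal Python sketch. The function name and the payload sizes are made up for illustration; the only real API used is the standard library's base64 module. base64url maps every 3 input bytes to 4 output characters, which is where the roughly 33% comes from.

```python
import base64


def base64url_overhead(payload: bytes) -> float:
    """Return the fractional size increase from base64url-encoding payload."""
    # JOSE strips the trailing "=" padding from the URL-safe alphabet output.
    encoded = base64.urlsafe_b64encode(payload).rstrip(b"=")
    return len(encoded) / len(payload) - 1.0


small = b"x" * 300          # stand-in for a small credential
large = b"x" * 3_000_000    # stand-in for a very large credential

for name, payload in (("small", small), ("large", large)):
    ratio = base64url_overhead(payload)
    extra = round(len(payload) * ratio)
    print(f"{name}: +{ratio:.1%} ({extra} extra bytes)")
```

The ratio is the same at both sizes; what changes is the absolute number of extra bytes, which is the point being made about large credentials.
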
>
> so in practice you would store the base64url as binary in a database...
> which would then become impossible to search... so you might store
> the decoded JWT as JSON, and it will almost look like an LD Proof...
> assuming you don't ever reorder the JSON entries, which will break the
> signature.
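
The reordering problem above can be sketched in a few lines of Python. Assumptions: an HMAC stands in for a real signature, the key and the two documents are invented, and sorting keys with compact separators is only a rough stand-in for a proper canonical form (it is not an implementation of JCS or of URDNA2015).

```python
import hashlib
import hmac
import json

KEY = b"demo-key"  # hypothetical key, for illustration only


def sign(data: bytes) -> str:
    """HMAC-SHA256 used here as a stand-in for a real signature scheme."""
    return hmac.new(KEY, data, hashlib.sha256).hexdigest()


def canonical(obj) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


# Same information, different member order.
doc_a = '{"name":"Alice","age":30}'
doc_b = '{"age":30,"name":"Alice"}'

# Signing the raw bytes: the two documents no longer verify against each other.
raw_match = sign(doc_a.encode()) == sign(doc_b.encode())

# Signing a canonical serialization: member order stops mattering.
canon_match = sign(canonical(json.loads(doc_a))) == sign(canonical(json.loads(doc_b)))

print("raw bytes match:", raw_match)
print("canonical bytes match:", canon_match)
```

This only normalizes JSON syntax. It says nothing about whether two differently shaped JSON-LD documents describe the same graph, which is the part RDF canonicalization is meant to handle.
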
>
> A great example of a simple solution which leads to performance problems
> which ultimately gate the use of the technology for large scale data.
>
> We should be similarly concerned about RDF Canonicalization.
>
> There is a cost to it, but it's less so on storage, and more so on
> computation.
>
> Alternatives have been raised before such as JCS, but they are not really
> doing the same thing; JCS is in a sense just like JOSE without the 33%
> bloat for base64url encoding.
>
> Ultimately this is a question of the most efficient computation and
> storage mechanism for semantically unambiguous cryptographically verifiable
> information.
>
> I think URDNA2015 does a great job, and is worth formalizing, however I
> want to peer into the future and ask what is our ideal solution to this
> problem?
>
> I dream of a binary replacement for URDNA2015 where semantics and
> encodings can be retained, while graph representations can be minimized
> unambiguously.
>
> Perhaps some form of N-Quads + multicodec + CBOR-LD.
>
> Regards,
>
> OS
>
>
>
>
>
> On Mon, Mar 29, 2021 at 12:16 PM Dave Longley <dlongley@digitalbazaar.com>
> wrote:
>
>>
>> Tobias,
>>
>> One idea with the LD signature redaction suite that never took off was
>> to have the issuer generate an HMAC key and use that to generate the
>> salts -- and then give the holder the HMAC key so they can do the same
>> when sharing.
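
A rough Python sketch of the HMAC-key idea above. Everything specific here is an assumption on my part: the message does not say what is fed into the HMAC, so this version uses the statement's position, and the function names, statements and digest layout are invented for illustration.

```python
import hashlib
import hmac
import secrets


def derive_salt(hmac_key: bytes, index: int) -> bytes:
    """Derive the salt for statement number `index` from a single HMAC key.

    Assumption: the HMAC input is the statement's position in the document.
    """
    return hmac.new(hmac_key, index.to_bytes(4, "big"), hashlib.sha256).digest()


def salted_digest(hmac_key: bytes, index: int, statement: str) -> str:
    """Hash a statement together with its derived salt."""
    salt = derive_salt(hmac_key, index)
    return hashlib.sha256(salt + statement.encode("utf-8")).hexdigest()


# Hypothetical statements, for illustration only.
statements = [
    '<urn:ex:doc> <urn:ex:exporter> "ACME" .',
    '<urn:ex:doc> <urn:ex:amount> "1000" .',
]

# Issuer: one random key instead of one stored salt per statement.
issuer_key = secrets.token_bytes(32)
issuer_digests = [salted_digest(issuer_key, i, s) for i, s in enumerate(statements)]

# Holder: given the same key, re-derives every salt when sharing.
holder_digests = [salted_digest(issuer_key, i, s) for i, s in enumerate(statements)]
print(issuer_digests == holder_digests)
```

The attraction is that only one key travels with the credential rather than a long array of salts, which bears on the proof-size question raised further down the thread.
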
>>
>> On 3/28/21 4:33 PM, Steve Capell wrote:
>> > Hi Tobias
>> >
>> > Good questions - which I’ve forwarded to the Singapore team for an
>> > authoritative answer
>> >
>> > Here’s my non-authoritative attempt
>> > - salts are an array of uuids I think -
>> > see https://edi3.org/specs/edi3-notary/develop/#611-salting-the-data
>> > - signature correlation - not sure but I’d mention that all use cases
>> > for this approach so far are for cross border trade documents where the
>> > subject is a public identifier such as a business number.  The design
>> > intent is that the identity is correlatable.
>> > - we haven’t noticed performance issues of any significance but we are
>> > talking volumes of only a few million per year
>> >
>> > Steven Capell
>> > Mob: 0410 437854
>> >
>> >> On 28 Mar 2021, at 2:53 pm, Tobias Looker <tobias.looker@mattr.global>
>> >> wrote:
>> >>
>> >>
>> >> > I’m a big fan of this approach, a form of redaction distinct from zk
>> >> forms of selective disclosure.
>> >>
>> >> > There was an attempt to spec one here in the CCG three-four years
>> >> ago, but it died on the vine.
>> >>
>> >> I'm also interested in learning more about this approach; the
>> >> questions I had last time were
>> >>
>> >> 1. How the salt for each redactable statement would be managed in a
>> >> way that would not leak the abstraction that "Linked Data Proofs" sets
>> >> up. For example would the attached proof block have to have a long
>> >> array of salts?
>> >> 2. Proof sizes: having a salt per statement signed as part of the
>> >> proof would significantly increase the size of the proof's
>> >> representation.
>> >> 3. Signature correlation, perhaps not important in this scheme, but I
>> >> think the approach would require revealing a fixed signature
>> >> regardless of which parts are redacted from the original proof?
>> >> 4. Performance? Also perhaps a non-issue but if anyone has
>> >> info/benchmarks around how the scheme might scale with the size of the
>> >> data graph signed, that would be great to look at?
>> >>
>> >> Thanks,
>> >> *Tobias Looker*
>> >> Mattr
>> >> +64 (0) 27 378 0461
>> >> tobias.looker@mattr.global
>> >>
>> >>
>> >> This communication, including any attachments, is confidential. If you
>> >> are not the intended recipient, you should not read it - please
>> >> contact me immediately, destroy it, and do not copy or use any part of
>> >> this communication or disclose anything about it. Thank you. Please
>> >> note that this communication does not designate an information system
>> >> for the purposes of the Electronic Transactions Act 2002.
>> >>
>> >>
>> >> On Sun, Mar 28, 2021 at 3:49 PM Christopher Allen
>> >> <ChristopherA@lifewithalacrity.com
>> >> <mailto:ChristopherA@lifewithalacrity.com>> wrote:
>> >>
>> >>     On Sat, Mar 27, 2021 at 7:22 PM Steve Capell
>> >>     <steve.capell@gmail.com <mailto:steve.capell@gmail.com>> wrote:
>> >>
>> >>         The Singapore government https://www.openattestation.com/ does
>> >>         this already. Version 3 is W3C VC data model compliant.
>> >>
>> >>         Each element is hashed (with salt I think) and then the hash
>> >>         of the hashes is the document hash that is notarised
>> >>
>> >>         The main rationale is selective redaction (because the root
>> >>         hash is unchanged when some clear text is hidden). But I
>> >>         suppose it simplifies canonicalisation too...
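
The salted hash-of-hashes scheme described above can be sketched as follows. This is not OpenAttestation's actual format: the field names, the salt encoding and the way leaf hashes are combined (sorted, then concatenated) are all assumptions made for illustration.

```python
import hashlib
import secrets


def leaf_hash(salt: str, key: str, value: str) -> str:
    """Hash one salted element of the document."""
    return hashlib.sha256(f"{salt}:{key}:{value}".encode("utf-8")).hexdigest()


def document_hash(leaf_hashes) -> str:
    """Hash of the hashes; sorting makes the result independent of field order."""
    joined = "".join(sorted(leaf_hashes))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical trade document fields.
fields = {"exporter": "ACME", "amount": "1000", "port": "SGSIN"}
salts = {k: secrets.token_hex(16) for k in fields}
leaves = {k: leaf_hash(salts[k], k, v) for k, v in fields.items()}
root = document_hash(leaves.values())

# Redact "amount": disclose the other fields with their salts,
# and only the hash of the hidden one.
disclosed = {k: (salts[k], v) for k, v in fields.items() if k != "amount"}
hidden_hashes = [leaves["amount"]]

recomputed = document_hash(
    [leaf_hash(s, k, v) for k, (s, v) in disclosed.items()] + hidden_hashes
)
print(recomputed == root)
```

The salt is what stops a verifier from guessing a redacted low-entropy value by hashing candidates. Sorting the leaf hashes before combining them is one way the scheme could sidestep ordering questions, which may be what is meant by simplifying canonicalisation.
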
>> >>
>> >>
>> >>     I’m a big fan of this approach, a form of redaction distinct from
>> >>     zk forms of selective disclosure.
>> >>
>> >>     There was an attempt to spec one here in the CCG three-four years
>> >>     ago, but it died on the vine.
>> >>
>> >>     I’d be interested in seeing this spec & implementation. Any links?
>> >>
>> >>     — Christopher Allen [via iPhone]
>> >>
>> >>
>>
>>
>> --
>> Dave Longley
>> CTO
>> Digital Bazaar, Inc.
>>
>>
>
> --
> *ORIE STEELE*
> Chief Technical Officer
> www.transmute.industries
>
> <https://www.transmute.industries>
>


-- 
-Brent Shambaugh

GitHub: https://github.com/bshambaugh
Website: http://bshambaugh.org/
LinkedIn: https://www.linkedin.com/in/brent-shambaugh-9b91259
Skype: brent.shambaugh
Twitter: https://twitter.com/Brent_Shambaugh
WebID: http://bshambaugh.org/foaf.rdf#me

Received on Monday, 29 March 2021 18:57:06 UTC