Re: Selective Disclosure for W3C Data Integrity from Steve Capell on 2023-05-30 (public-credentials@w3.org from May 2023)

From: Steve Capell <steve.capell@gmail.com>
Date: Wed, 31 May 2023 02:03:27 +1000
To: Dave Longley <dlongley@digitalbazaar.com>
Cc: Manu Sporny <msporny@digitalbazaar.com>, "John, Anil" <anil.john@hq.dhs.gov>, W3C Credentials CG <public-credentials@w3.org>, Richard Spellman <richard.spellman@gosource.com.au>, Sin LOH <LOH_Sin_Yong@imda.gov.sg>, "Ren KAY (IMDA)" <KAY_Ren_Yuh@imda.gov.sg>
Message-Id: <2B178AAE-3FAB-4938-B1D5-3BADCF87E533@gmail.com>
Thanks Dave & Manu

Looks like valuable work.  

I note that you chose not to proceed with the salted hash approach, so I just wanted to point out that it is in production use over here in the Asia region and is based on the Singapore government’s OpenAttestation protocol.  I’d call it “selective redaction” rather than selective disclosure, and I’d like to suggest that it can and should live alongside your selective disclosure specification because it serves different use cases:
- removing a small number of commercially sensitive fields from otherwise quite large trade documents.  It’s much more efficient for this.
- allowing any holder, who is usually not the subject, to redact.  This is important because it’s usually a party further downstream in the supply chain that wants to redact a bearer-token style credential received from upstream.
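
To make the redaction idea concrete, here is a minimal Python sketch of a salted hash scheme. It is an illustration only: the function names, the document layout and the choice of SHA-256 are assumptions made for this example and do not reproduce OpenAttestation's actual data format. The issuer's signature over the root is left out; the point is that a downstream holder can redact without any key material.

```python
import hashlib
import os


def field_hash(salt_hex, key, value):
    """Hash one salted field; length prefixes keep the parts unambiguous."""
    h = hashlib.sha256()
    for part in (bytes.fromhex(salt_hex), key.encode(), value.encode()):
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.hexdigest()


def all_hashes(doc):
    """Hashes of the visible fields plus the hashes left behind by redaction."""
    visible = [field_hash(e["salt"], k, e["value"])
               for k, e in doc["fields"].items()]
    return visible + doc["redacted"]


def root_of(hashes):
    """Order-independent digest over all field hashes (what the issuer signs)."""
    return hashlib.sha256("".join(sorted(hashes)).encode()).hexdigest()


def issue(fields):
    """Issuer: give every field a fresh random salt and compute the root."""
    doc = {
        "fields": {k: {"salt": os.urandom(16).hex(), "value": v}
                   for k, v in fields.items()},
        "redacted": [],
    }
    return doc, root_of(all_hashes(doc))


def redact(doc, key):
    """Any holder: drop a field and keep only its hash. No key is needed."""
    entry = doc["fields"][key]
    kept = {k: e for k, e in doc["fields"].items() if k != key}
    hidden = doc["redacted"] + [field_hash(entry["salt"], key, entry["value"])]
    return {"fields": kept, "redacted": hidden}


def verify(doc, signed_root):
    """Verifier: recompute the root from visible fields and redacted hashes."""
    return root_of(all_hashes(doc)) == signed_root


# Hypothetical trade document; a downstream party hides the unit price.
doc, root = issue({"consignee": "ACME Pty Ltd", "unitPrice": "12.50",
                   "hsCode": "0901.21"})
downstream = redact(doc, "unitPrice")
print(verify(downstream, root))
```

Each redaction swaps a field for a single 32-byte hash, so the cost stays small when only a few fields are removed from a large document.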

I believe that the Singapore govt is organising a call with W3C on this.

Steven Capell
Mob: 0410 437854

> On 31 May 2023, at 1:50 am, Dave Longley <dlongley@digitalbazaar.com> wrote:
> 
> On Tue, May 30, 2023 at 8:31 AM Manu Sporny <msporny@digitalbazaar.com> wrote:
>> 
>>> On Tue, May 30, 2023 at 8:09 AM John, Anil <anil.john@hq.dhs.gov> wrote:
>>> It is good to see this gap filled and I appreciate that there will now exist implementation approaches both for the near term and with a road map for the future, whichever credential proof format one chooses to protect one's credentials.
>> 
>> Yes, what we're going for here is to ensure that whatever formats one
>> chooses (Data Integrity or JOSE), that there exist selective
>> disclosure solutions for both NIST-compliant cryptography and for
>> future-facing unlinkable cryptography (BBS), even if it will be a
>> while before NIST officially recognizes BBS.
>> 
>>> Looking forward to learning more of the details.
>> 
>> Dave Longley is planning to share some of the thinking that went into
>> the Data Integrity Selective Disclosure design and implementation at
>> some point this week, so I expect that will help folks understand the
>> benefits and drawbacks of the approach.
> 
> Yes, the primary reason we developed ECDSA-SD was because we saw an
> important gap to fill in the Data Integrity space: A need for a
> selective disclosure mechanism that uses today's NIST-approved crypto.
> Therefore, our design focused on that near term problem to create
> something of value for right now. Whatever we or others can bring from
> our approach into the future is of additional benefit.
> 
> As for the future, we did minimally have a goal of making sure our
> design would fit in as a Data Integrity proof, building on top of the
> foundation created by W3C Working Groups (RCH and VCWG). This approach
> can help reduce integration costs and allow for later replacement by
> any improved selective disclosure Data Integrity cryptosuites. There
> is no expectation that ECDSA-SD will last forever or that it can't be
> replaced by something more future-facing when it arrives. There are a
> lot of things we use today that we will have to replace in the future
> – and that does little to reduce their utility now.
> 
> Something else that we wanted to do – and successfully started – was
> develop a Data Integrity selective disclosure primitives library that
> could be reused across Data Integrity selective disclosure
> cryptosuites. Part of this was informed by our efforts to support
> DI-BBS. For those interested in the primitives work, you can take a
> look at the open source implementation we built here:
> 
> https://github.com/digitalbazaar/di-sd-primitives
> 
> So, as mentioned, the primary selling point of ECDSA-SD is that it's a
> Data Integrity cryptosuite that uses NIST-approved cryptography. This
> is the main gap it was created to fill. Since trade-offs in design are
> inevitable, I wanted to share a bit of our thinking below on the
> choices we made with ECDSA-SD:
> 
> Back in 2016, we proposed a salted hash / HMAC solution as a
> "redaction" selective disclosure mechanism to be used with Linked Data
> / Data Integrity Proofs. This didn't get uptake back then so we didn't
> continue development with it. But given that more of the VC ecosystem
> has found its roots now, we considered picking that work back up and
> carrying it forward. We also looked into using cryptographic
> accumulators and some options such as Merkle Trees -- but saw others
> researching those areas, so we deferred to them. We also wanted to
> avoid inadvertently introducing complexity that went beyond a simple
> combination of existing NIST-approved crypto primitives, which would
> have undermined our main goal.
> 
> In the end, what drove our chosen design was noticing that selective
> disclosure seems to be most desirable and / or useful when less
> information is disclosed from a particular VC. The more information
> you disclose, the more likely you are to experience other
> correlational risk -- and, the more efficient it might be to just
> disclose an entire VC. Of course, this is somewhat intuitive:
> disclosing an entire VC defeats the point of selective disclosure and
> it can be done much more efficiently by using a single, simple ECDSA
> proof.
> 
> Given the above, it became clear that a basic combination of
> NIST-approved primitives could be applied with modern and acceptable
> cryptography to achieve the goals. By simply signing individual
> messages with a NIST-approved primitive, we could elide the signatures
> associated with any statements not disclosed. We chose to do this
> rather than take the salted hash approach. Some work needed to be done
> to ensure statements couldn't be recombined and so that signatures
> could be computed locally and efficiently -- so we made each VC use
> its own local cryptographic key for this. This is similar to the
> approach used in key agreement cryptography where a new ephemeral key
> is used for each encryption.
> 
> An additional benefit to the signature-per-message approach was that
> side channel information could also be elided; there's no need to send
> the total number of statements nor their order to the verifier with
> this approach. It helps reduce data leakage in what is disclosed. In
> fact, for the elided information, not even a one-way hash of it needs
> to be exposed, even though a future attack against such a hash is
> thought to be infeasible today. These were other advantages we
> noticed over using the salted hash approach.
> 
> So, what were some of the possible drawbacks?
> 
> Computing multiple signatures per VC instead of just one requires
> additional computation (or time), for one. In the past, using a
> separate signature for each statement could have been seen as
> prohibitive in time or computing power. But we found that there's been
> a lot of progress on that front over the past decade or so. To that
> point, we believe that the right question is "how long is too long to
> sign or verify a VC?" Our expectation is that ECDSA-SD won't exceed
> that threshold for most, if not all, use cases.
> 
> Next, transmitting multiple signatures takes more space than just
> sending one. Here, our focus was on minimizing the information that
> must be included in a disclosure or derived proof, not a base proof.
> The base proof is the proof that the issuer creates and gives to the
> holder. The holder does not share it; instead, they derive a disclosure
> proof to give to the verifier.
> 
> We considered the size of the base proof to be less important since it
> will likely be transmitted just once at issuance and stored in a
> user's wallet on the cheap. Shrinking the disclosure proof reduces
> presentation transmission size to every verifier. This seemed more
> important to us; it could even help enable some transmission mediums
> that were previously harder to use with selective disclosure.
> 
> So, what of the multiple signature size overhead? During design, what
> we noticed was that even if each individual signature is bigger than
> a hash by some factor, the disclosure proofs will still be smaller
> unless you reveal a lot of statements.
> 
> For example, if the overhead factor is 2, whenever you reveal fewer
> than half of the statements, your disclosure proof is smaller. Our
> expectation is that there could be a lot of use cases for that. In
> fact, the use cases commonly mentioned fit just that profile.
> 
> We think that a factor of 2 (plus or minus a little) is in the right
> ballpark when comparing a mechanism that would share hashes (~32
> bytes) for every unrevealed statement vs. ~64 bytes for every revealed
> signature. If we're right about common use, then this factor is more
> than low enough to see benefit. As an example of how we thought about
> this: if holders commonly provided three statements out of 35, then
> the factor could be as high as 10 and still yield smaller disclosure
> proofs. If most of the selective disclosure use cases involve sharing
> less than half of the data in a VC, there is likely to be space
> savings in ECDSA-SD disclosure proofs. If you're sharing most of the
> data in a VC, perhaps consider another mechanism -- or a simpler
> non-selective disclosure proof will do anyway.
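
The size comparison above can be checked with a few lines of Python. This is a rough sketch under stated assumptions: the function names are made up for this example, the 32-byte and 64-byte figures are the approximate sizes quoted above, and the salt_bytes parameter (any per-revealed-statement data a hash-based scheme might also send) is an assumption, since the message does not spell out the exact accounting. Fixed per-proof overhead is ignored on both sides.

```python
def proof_overhead(total, revealed, hash_bytes=32, sig_bytes=64, salt_bytes=0):
    """Return (hash_approach_bytes, signature_approach_bytes) for one disclosure.

    Hash approach: one hash per unrevealed statement, plus an optional
    salt per revealed statement. Signature approach: one signature per
    revealed statement and nothing for unrevealed ones.
    """
    if not 0 <= revealed <= total:
        raise ValueError("revealed must be between 0 and total")
    hidden = total - revealed
    hash_cost = hidden * hash_bytes + revealed * salt_bytes
    sig_cost = revealed * sig_bytes
    return hash_cost, sig_cost


def break_even_factor(total, revealed):
    """Largest signature-to-hash size ratio at which the signature approach
    is still no larger, assuming salt_bytes = 0."""
    if not 0 < revealed <= total:
        raise ValueError("revealed must be between 1 and total")
    return (total - revealed) / revealed


print(proof_overhead(35, 3))                 # (1024, 192)
print(break_even_factor(35, 3))              # about 10.67
print(proof_overhead(10, 4))                 # (192, 256)
print(proof_overhead(10, 4, salt_bytes=32))  # (320, 256)
```

For three statements out of 35 the signature approach comes out well ahead, and the break-even factor is a little above 10. The last two lines show that where the crossover falls for a factor of 2 depends on the accounting: if the hash approach only ships hashes for unrevealed statements, signatures win below roughly a third of the statements, whereas if it also ships about 32 bytes per revealed statement, signatures win below roughly half.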
> 
> In summary, we built ECDSA-SD because we think it will be useful now.
> We built it primarily to fill a gap where there was no Data Integrity
> cryptosuite based on NIST-approved crypto -- and we made some design
> choices based on taking a step back and looking at how we expect
> people to use selective disclosure with VCs.
> 
> -- 
> 
> Dave Longley
> CTO
> Digital Bazaar, Inc.
>
Received on Tuesday, 30 May 2023 16:03:45 UTC