Re: Selective Disclosure for W3C Data Integrity from Dave Longley on 2023-05-30 (public-credentials@w3.org from May 2023)

From: Dave Longley <dlongley@digitalbazaar.com>
Date: Tue, 30 May 2023 11:46:46 -0400
To: Manu Sporny <msporny@digitalbazaar.com>
Cc: "John, Anil" <anil.john@hq.dhs.gov>, W3C Credentials CG <public-credentials@w3.org>
Message-ID: <CAMJ8eMO400mRCKA5trqbKAE+dbgCVaY5K12mTizG1tdWemE6gw@mail.gmail.com>
On Tue, May 30, 2023 at 8:31 AM Manu Sporny <msporny@digitalbazaar.com> wrote:
>
> On Tue, May 30, 2023 at 8:09 AM John, Anil <anil.john@hq.dhs.gov> wrote:
> > It is good to see this gap filled and I appreciate that there will now exist implementation approaches both for the near term and with a road map for the future, whichever credential proof formats one chooses to use to protect your credential.
>
> Yes, what we're going for here is to ensure that whatever formats one
> chooses (Data Integrity or JOSE), that there exist selective
> disclosure solutions for both NIST-compliant cryptography and for
> future-facing unlinkable cryptography (BBS), even if it will be a
> while before NIST officially recognizes BBS.
>
> > Looking forward to learning more of the details.
>
> Dave Longley is planning to share some of the thinking that went into
> the Data Integrity Selective Disclosure design and implementation at
> some point this week, so I expect that will help folks understand the
> benefits and drawbacks of the approach.

Yes, the primary reason we developed ECDSA-SD was that we saw an
important gap to fill in the Data Integrity space: the need for a
selective disclosure mechanism that uses today's NIST-approved crypto.
Our design therefore focused on that near-term problem, to create
something of value right now. Whatever we or others can carry forward
from our approach is an additional benefit.

As for the future, our minimum goal was to make sure the design would
fit in as a Data Integrity proof, building on the foundation created
by the W3C Working Groups (RCH and VCWG). This approach can help
reduce integration costs and allow ECDSA-SD to be replaced later by
any improved selective disclosure Data Integrity cryptosuites. There
is no expectation that ECDSA-SD will last forever, or that it can't be
replaced by something more future-facing when that arrives. There are
a lot of things we use today that we will have to replace in the
future, and that does little to reduce their utility now.

Something else we wanted to do (and have successfully started) was to
develop a Data Integrity selective disclosure primitives library that
could be reused across Data Integrity selective disclosure
cryptosuites. Part of this was informed by our efforts to support
DI-BBS. For those interested in the primitives work, you can look at
the open source implementation we built here:

https://github.com/digitalbazaar/di-sd-primitives

So, as mentioned, the primary selling point of ECDSA-SD is that it's a
Data Integrity cryptosuite that uses NIST-approved cryptography; this
is the main gap it was created to fill. Since trade-offs in design are
inevitable, I wanted to share a bit of our thinking below on the
choices we made with ECDSA-SD:

Back in 2016, we proposed a salted hash / HMAC solution as a
"redaction" selective disclosure mechanism to be used with Linked Data
/ Data Integrity Proofs. It didn't get uptake back then, so we didn't
continue developing it. But given that more of the VC ecosystem has
taken root now, we considered picking that work back up and carrying
it forward. We also looked into cryptographic accumulators and other
options such as Merkle Trees, but saw others researching those areas,
so we deferred to them. We also wanted to avoid inadvertently
introducing complexity that would keep the design from being a simple
combination of existing NIST-approved crypto primitives, since that
would have undermined our main goal.

In the end, what drove our chosen design was noticing that selective
disclosure seems to be most desirable and/or useful when less
information is disclosed from a particular VC. The more information
you disclose, the more you are exposed to other correlation risks,
and the more efficient it might be to just disclose the entire VC. Of
course, this is somewhat intuitive: disclosing an entire VC defeats
the point of selective disclosure, and it can be done much more
efficiently with a single, simple ECDSA proof.

Given the above, it became clear that a basic combination of modern,
NIST-approved primitives could achieve our goals. By simply signing
each statement individually with a NIST-approved signature algorithm,
we could elide the signatures associated with any statements not
disclosed. We chose to do this rather than take the salted hash
approach. Some work was needed to ensure that statements couldn't be
recombined across VCs and that signatures could be computed locally
and efficiently, so we made each VC use its own local cryptographic
key for the per-statement signatures. This is similar to key
agreement cryptography, where a new ephemeral key is used for each
encryption.
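A minimal sketch of the signature-per-statement idea, under stated
assumptions. It is illustrative only and is not the ECDSA-SD algorithm
as specified: it uses a toy, non-constant-time, pure-Python ECDSA over
P-256 (so it needs only the standard library), treats statements as
plain strings rather than canonicalized N-Quads, and assumes, as our
own simplification, that the issuer's long-term key signs the per-VC
ephemeral public key. The names issue, derive, verify_disclosure and
Proof are hypothetical.

```python
import hashlib
import secrets
from dataclasses import dataclass

# NIST P-256 (secp256r1) domain parameters. Toy arithmetic below is NOT
# constant time and does no public-key validation: illustration only.
P = 0xFFFFFFFF00000001000000000000000000000000FFFFFFFFFFFFFFFFFFFFFFFF
A = P - 3
B = 0x5AC635D8AA3A93E7B3EBBD55769886BC651D06B0CC53B0F63BCE3C3E27D2604B
N = 0xFFFFFFFF00000000FFFFFFFFFFFFFFFFBCE6FAADA7179E84F3B9CAC2FC632551
G = (0x6B17D1F2E12C4247F8BCE6E563A440F277037D812DEB33A0F4A13945D898C296,
     0x4FE342E2FE1A7F9B8EE7EB4A7C0F9E162BCE33576B315ECECBB6406837BF51F5)

def point_add(p1, p2):
    """Affine point addition; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P

def scalar_mult(k, point):
    """Double-and-add scalar multiplication."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result

def keygen():
    d = secrets.randbelow(N - 1) + 1
    return d, scalar_mult(d, G)

def sign(d, msg: bytes):
    e = int.from_bytes(hashlib.sha256(msg).digest(), "big")
    while True:
        k = secrets.randbelow(N - 1) + 1  # fresh random nonce per signature
        r = scalar_mult(k, G)[0] % N
        s = pow(k, -1, N) * (e + r * d) % N
        if r and s:
            return r, s

def verify(Q, msg: bytes, sig) -> bool:
    r, s = sig
    if not (0 < r < N and 0 < s < N):
        return False
    e = int.from_bytes(hashlib.sha256(msg).digest(), "big")
    w = pow(s, -1, N)
    X = point_add(scalar_mult(e * w % N, G), scalar_mult(r * w % N, Q))
    return X is not None and X[0] % N == r

def encode_point(Q) -> bytes:
    return Q[0].to_bytes(32, "big") + Q[1].to_bytes(32, "big")

@dataclass
class Proof:
    issuer_sig: tuple     # issuer's long-term key over the ephemeral public key
    ephemeral_pub: tuple  # public half of a key pair generated for this VC only
    statement_sigs: list  # one signature per statement (base) or per revealed one

def issue(issuer_d, statements):
    """Issuer: sign every statement with a fresh per-VC ephemeral key."""
    eph_d, eph_pub = keygen()
    sigs = [sign(eph_d, s.encode()) for s in statements]
    # eph_d is discarded here, so nobody can add statements under eph_pub later.
    return Proof(sign(issuer_d, encode_point(eph_pub)), eph_pub, sigs)

def derive(base, statements, reveal):
    """Holder: keep revealed statements and their signatures, elide the rest."""
    idx = sorted(reveal)
    sigs = [base.statement_sigs[i] for i in idx]
    return [statements[i] for i in idx], Proof(base.issuer_sig, base.ephemeral_pub, sigs)

def verify_disclosure(issuer_pub, revealed, proof) -> bool:
    """Verifier: check the issuer vouched for the key, then each statement."""
    if not verify(issuer_pub, encode_point(proof.ephemeral_pub), proof.issuer_sig):
        return False
    return len(revealed) == len(proof.statement_sigs) and all(
        verify(proof.ephemeral_pub, s.encode(), sig)
        for s, sig in zip(revealed, proof.statement_sigs))
```

Because each VC gets its own ephemeral key, a statement signed for
one VC will not verify under another VC's ephemeral key, which is the
recombination property described above. The issuer performs one
signature per statement plus one over the ephemeral key; the verifier
checks one signature per revealed statement plus that one.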

An additional benefit of the signature-per-statement approach is that
side-channel information can also be elided: there's no need to send
the verifier the total number of statements or their order. This
reduces data leakage in what is disclosed. In fact, for the elided
information, not even a one-way hash of it needs to be exposed, so
there is nothing for some future attack to target, even if such
attacks are thought to be infeasible today. These were other
advantages we noticed over the salted hash approach.
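The side-channel point above can be seen by comparing what reaches the
verifier under each approach. The sketch below is hypothetical on both
sides: a salted-hash redaction format in which the issuer signs the
ordered list of digests (so every withheld statement still appears as
a digest in its slot), and a per-statement-signature format in which
only revealed statements and their signatures are sent. Random bytes
stand in for real signatures; no actual signing happens here.

```python
import hashlib
import secrets


def salted_hash_disclosure(statements, reveal):
    """Hypothetical redaction format: one entry per original statement, in
    order. Withheld statements are replaced by their salted digest."""
    salts = [secrets.token_bytes(16) for _ in statements]  # fixed at issuance in practice
    disclosure = []
    for i, (salt, stmt) in enumerate(zip(salts, statements)):
        if i in reveal:
            disclosure.append({"salt": salt.hex(), "statement": stmt})
        else:
            digest = hashlib.sha256(salt + stmt.encode()).hexdigest()
            disclosure.append({"digest": digest})
    return disclosure


def per_statement_sig_disclosure(statements, reveal):
    """Hypothetical per-statement-signature format: only revealed statements
    are sent. token_bytes(64) is a placeholder for a 64-byte ECDSA signature."""
    return [{"statement": statements[i], "signature": secrets.token_bytes(64).hex()}
            for i in sorted(reveal)]


statements = [
    '<did:ex:alice> <ex:name> "Alice" .',
    '<did:ex:alice> <ex:birthDate> "1990-01-01" .',
    '<did:ex:alice> <ex:over18> "true" .',
    '<did:ex:alice> <ex:address> "123 Main St" .',
]
reveal = {2}

# 4 entries: the verifier learns the total count and the revealed position.
print(len(salted_hash_disclosure(statements, reveal)))
# 1 entry: nothing about the withheld statements is sent.
print(len(per_statement_sig_disclosure(statements, reveal)))
```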

So, what were some of the possible drawbacks?

For one, computing a signature per statement instead of a single
signature per VC requires additional computation (and time). In the
past, using one signature per statement could have been seen as
prohibitive in time or computing power, but there has been a lot of
progress on that front over the past decade or so. To that point, we
believe the right question is "how long is too long to sign or verify
a VC?" Our expectation is that ECDSA-SD will stay within that limit
for most, if not all, use cases.

Next, transmitting multiple signatures takes more space than sending
just one. Here, our focus was on minimizing the information that must
be included in a disclosure (or derived) proof, not in a base proof.
The base proof is the proof that the issuer creates and gives to the
holder. The holder does not share it; instead, they derive a
disclosure proof to give to the verifier.

We considered the size of the base proof to be less important, since
it will likely be transmitted just once, at issuance, and stored
cheaply in the user's wallet. Shrinking the disclosure proof, on the
other hand, reduces the size of every presentation sent to a
verifier. This seemed more important to us; it could even help enable
some transmission channels that were previously harder to use with
selective disclosure.

So, what about the size overhead of multiple signatures? During
design, we noticed that even if each individual signature is larger by
some factor than the per-statement data a hash-based mechanism would
send, the disclosure proofs will still be smaller unless you reveal a
lot of statements.

For example, if that overhead factor is 2, your disclosure proof is
smaller whenever you reveal less than half of the statements. We
expect there could be a lot of use cases like that; in fact, the
commonly mentioned use cases fit just that profile.
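The break-even point in that example can be written out as a rough
size model. This is a sketch under simplifying assumptions, not a
formula from the ECDSA-SD spec: n statements, r of them revealed, s
bytes per ECDSA-SD signature, h bytes per statement for a hash-based
mechanism (a digest per withheld statement and a salt of roughly the
same size per revealed one), with fixed costs such as the issuer's
signature ignored.

```latex
\underbrace{r\,s}_{\text{ECDSA-SD}} < \underbrace{n\,h}_{\text{hash-based}}
\iff \frac{r}{n} < \frac{h}{s} = \frac{1}{f}, \qquad f = \frac{s}{h}

% Counting only digests for withheld statements (no salts):
r\,s < (n - r)\,h \iff \frac{r}{n} < \frac{1}{f + 1}
```

With f = 2 the first form gives r < n/2; with r = 3 and n = 35 it
allows f up to 35/3, about 11.7. The second form is the stricter bound
if revealed statements cost the hash-based mechanism nothing.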

We think a factor of 2 (plus or minus a little) is in the right
ballpark when comparing a mechanism that would share hashes (~32
bytes) for every unrevealed statement, plus a salt for every revealed
one, vs. ~64 bytes for every revealed signature. If we're right about
common use, then this factor is more than low enough to see a benefit.
As an example of how we thought about this: if holders commonly
provided three statements out of 35, the factor could be as high as
10 and still yield smaller disclosure proofs. If most selective
disclosure use cases involve sharing less than half of the data in a
VC, ECDSA-SD disclosure proofs are likely to save space. If you're
sharing most of the data in a VC, consider another mechanism; a
simpler, non-selective disclosure proof may do anyway.
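To make the arithmetic concrete, here is a small byte-count
calculator. It is a sketch using the rough sizes from the text
(64-byte signatures, 32-byte hashes) plus one assumption of ours: a
32-byte salt per revealed statement in the hash-based mechanism. It
ignores fixed overhead such as the issuer signature and the per-VC
public key, and the function names are hypothetical.

```python
SIG_LEN = 64   # approximate size of one ECDSA P-256 signature (r || s)
HASH_LEN = 32  # SHA-256 digest
SALT_LEN = 32  # assumption: salt size in a hypothetical salted-hash scheme


def ecdsa_sd_disclosure_bytes(revealed: int, sig_len: int = SIG_LEN) -> int:
    """Per-statement signatures: only revealed statements carry a signature."""
    return revealed * sig_len


def salted_hash_disclosure_bytes(total: int, revealed: int,
                                 hash_len: int = HASH_LEN,
                                 salt_len: int = SALT_LEN) -> int:
    """Hypothetical salted-hash redaction: a digest for each withheld
    statement plus a salt for each revealed one."""
    return (total - revealed) * hash_len + revealed * salt_len


def max_overhead_factor(total: int, revealed: int) -> float:
    """Signature-to-hash size ratio at which both disclosures are equal in
    size; below this ratio, the per-statement-signature disclosure is smaller."""
    return salted_hash_disclosure_bytes(total, revealed) / (revealed * HASH_LEN)


if __name__ == "__main__":
    total = 35
    for revealed in (3, 17, 18, 35):
        print(revealed, ecdsa_sd_disclosure_bytes(revealed),
              salted_hash_disclosure_bytes(total, revealed))
    print("break-even factor for 3 of 35:", round(max_overhead_factor(total, 3), 2))
```

Under these assumptions, revealing 3 of 35 statements costs 192 bytes
of signatures vs. 1,120 bytes of hashes and salts, and the break-even
factor is about 11.7, consistent with the "as high as 10" figure.
Revealing all 35 would cost 2,240 bytes of signatures, far more than a
single 64-byte ECDSA proof over the whole VC.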

In summary, we built ECDSA-SD because we think it will be useful now.
We built it primarily to fill a gap where there was no Data Integrity
cryptosuite based on NIST-approved crypto -- and we made some design
choices based on taking a step back and looking at how we expect
people to use selective disclosure with VCs.

-- 

Dave Longley
CTO
Digital Bazaar, Inc.
Received on Tuesday, 30 May 2023 15:47:07 UTC