Re: Demonstration of Support for NIST-Compliant Selective Disclosure for Data Integrity Cryptosuites in VCWG

> On Sun 13. Aug 2023 at 06:36, Oliver Terbu <oliver.terbu@spruceid.com> wrote:

Hey Oliver, apologies for the delayed response, answers to your
questions below...

>> Re ephemeral keys on ecdsa-sd. this means an issuer cannot use a boring old HSM to protect the signatures, correct? It wouldn’t be viable imo. Too many keys (at least a new keypair for each VC) and also the ephemeral nature doesn’t seem to fit if one uses an HSM.
>>
> I guess if one can throw away the esk then HSM could be viable.

Yes, exactly. The ephemeral key is generated when the VC is issued,
signs each selectively-disclosed statement, and is then thrown away.
The public part of the ephemeral key is then signed over using an HSM
key. This means that ecdsa-sd does support HSM-backed keys.
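
To make that concrete, here is a minimal Python sketch of the pattern
(illustrative only, using the "cryptography" package; the real
ecdsa-sd-2023 base proof signs over more than just the ephemeral
public key, and the issuer key would live inside the HSM rather than
in memory):

from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import ec

# Long-lived issuer key; in production the private half stays in the
# HSM and only a sign() operation is exposed.
issuer_key = ec.generate_private_key(ec.SECP256R1())

# Ephemeral key pair, created fresh for this one VC.
ephemeral_key = ec.generate_private_key(ec.SECP256R1())

# Sign each selectively-disclosable statement with the ephemeral key.
statements = [b"statement 1", b"statement 2", b"statement 3"]
statement_sigs = [
    ephemeral_key.sign(s, ec.ECDSA(hashes.SHA256())) for s in statements]

# Sign over the ephemeral *public* key with the issuer (HSM) key.
ephemeral_pub = ephemeral_key.public_key().public_bytes(
    serialization.Encoding.X962,
    serialization.PublicFormat.CompressedPoint)
issuer_sig = issuer_key.sign(ephemeral_pub, ec.ECDSA(hashes.SHA256()))

# The ephemeral private key is now thrown away; verification only
# needs the public key, the statement signatures, and issuer_sig.
del ephemeral_key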

> 1) If I understood correctly, the ephemeral public key has to be revealed in every transaction (derived proof), correct? If this is the case, wouldn’t it be better (required) to generate a new key pair per statement instead, to avoid correlation based on that key.

Yes, the ephemeral public key is revealed in each derived proof via
the mandatory disclosure fields. Generating a new key pair per
statement could be done, but avoiding correlation that way isn't a
goal, since the NIST-compliant primitives in use don't allow one to
avoid correlation in the first place. The issuer's key is
correlatable, the ECDSA signature is correlatable, and so on... one of
the drawbacks of NIST-compliant schemes is that they're correlatable.

We did explore multi-issuance of the mandatory and selectively
disclosable fields, so you could issue 10s to 100s of mandatory and
selectively disclosable fields using different keys/nonces/signatures
... and it's true that this avoids correlation if a wallet carefully
tracks the usage of each field. That said, stateful cryptographic
schemes provide a case study in how this turns out in practice, and
they do not have a good track record: misimplementations have resulted
in compromises in the past. Now, the stakes aren't as high in many
cases, but I don't think we'd be able to get to anything even close to
unlinkable signatures.

In short, the added complexity felt like it wouldn't actually solve
the problem. ecdsa-sd is correlatable in the same way that SD-JWT and
mDL are correlatable. If one wants unlinkable signatures, BBS+ seems
to be the best bet at this point in time and ecdsa-sd has now created
the primitives that would be needed for BBS to be provided as a
layered signature for use cases that need it:

https://www.w3.org/TR/vc-data-integrity/#agility-and-layering

> 2) If I understood correctly, the algorithm uses a master issuer key to sign an ephemeral public key where each statement is signed by the corresponding ephemeral secret key. Where do you see the advantage of this approach in general compared to issuing individual credentials for each of these statements? To me it sounds similar.

Yes, you're right, there are similarities there. Given that the data
model for VCs is a graph-based data model, and you can merge all of
these graphs together, you could architect a solution like you say
above without any changes to the VC Data Model. The downside, of
course, is that you have to manage all of these credentials in the
digital wallet -- it places a burden, some would argue an unreasonable
burden, on the digital wallet ecosystem. The counter-point to that is
that the digital wallet ecosystem might be absorbing that burden
anyway, given the increasing number of digital credential formats...
and then the counter-counter-point is: don't make the problem worse!
:P

The advantage to doing things the way ecdsa-sd does them is that you
can bundle all of this up in a single credential. In fact, if you use
Data Integrity to secure the credential, you can layer a standard
ECDSA proof, an ecdsa-sd proof, a BBS proof, and a (potential future)
post-quantum proof all onto a single VC, which would result in a
single object for a digital wallet to manage instead of tens to
hundreds of objects to manage.
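
To illustrate, a heavily abbreviated, hypothetical VC with layered
proofs might look like the following (the cryptosuite names are the
Data Integrity suites that exist today; a future post-quantum suite
would simply be a fourth entry in the "proof" array):

{
  "@context": ["https://www.w3.org/ns/credentials/v2"],
  "type": ["VerifiableCredential"],
  "credentialSubject": {...},
  "proof": [
    {"type": "DataIntegrityProof", "cryptosuite": "ecdsa-rdfc-2019", ...},
    {"type": "DataIntegrityProof", "cryptosuite": "ecdsa-sd-2023", ...},
    {"type": "DataIntegrityProof", "cryptosuite": "bbs-2023", ...}
  ]
}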

> 3) Can you elaborate why, where and how the hmac comes into play? I understood it’s more or less to have pseudo random node identifiers but I’m not sure why it is needed to have deterministic random values for that in the first place instead of for example fully random values that don’t require an hmac.

Yes, and for those not familiar with the algorithm, Oliver is talking
about step 2 here:

https://w3c.github.io/vc-di-ecdsa/#base-proof-transformation-ecdsa-sd-2023

and this function:

https://w3c.github.io/vc-di-ecdsa/#createhmacidlabelmapfunction

Since VCs use a graph-based data model, each node in the graph either
has an explicit identifier (a URL) or an implicit identifier (a "blank
node" identifier). URLs can leak information both internal and
external to the document, because they're universal identifiers (so,
if you use them in your selectively-disclosable VC, there's nothing we
can do to help you with respect to information leakage). Blank node
identifiers can only leak information that is internal to the document
(since the identifiers are always local to the document).
Specifically, blank node identifiers can leak how many nodes are in
the graph that is being selectively disclosed. For example, if you
have a list of 100 items in your VC, and you want to make each one of
those items selectively disclosable, the blank node identifiers for
those 100 items could leak how many items are in the list.

In order to mitigate that leakage of information, an HMAC is used to
deterministically generate pseudo-random identifiers for blank nodes
so the verifier can't tell how many nodes are in the graph that's
being selectively disclosed.
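
Here's a rough Python sketch of that idea (illustrative only; the
exact label encoding is defined by the createHmacIdLabelMapFunction
algorithm linked above, and the "relabel" helper below is
hypothetical):

import base64, hashlib, hmac, secrets

hmac_key = secrets.token_bytes(32)  # one 32-byte key per base proof

def relabel(bnode_label: str) -> str:
    digest = hmac.new(hmac_key, bnode_label.encode(),
                      hashlib.sha256).digest()
    # base64url-encode the HMAC output so it can serve as a label
    return "u" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode()

# Canonical labels (c14n0, c14n1, ...) would otherwise reveal how
# many blank nodes exist; their HMAC'd replacements do not.
for label in ["c14n0", "c14n1", "c14n2"]:
    print(label, "=>", relabel(label))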

So, why use an HMAC instead of completely random identifiers? If we
used completely random identifiers, we'd have to store a mapping of
all of those identifiers in the base proof. (Note: it's the holder
that uses the HMAC key when performing a selective disclosure; the
HMAC key is never disclosed to the verifier.)

Presuming we have 3 blank nodes in the VC graph:

A, B, C

We'd have to sign over the mapping of blank node IDs to random IDs:

[A => Ar, B => Br, C => Cr]

Presuming that we wanted to use cryptographically safe identifiers --
say, 32 bytes in size -- the storage required would be the number of
blank nodes multiplied by 32 bytes: B * 32 ... which can add up for
larger graphs.

Compare that with just storing a single 32-byte HMAC key. If you had a
list of 100 items that you wanted to selectively disclose, the
difference would be 3,200 additional bytes added to the signature (no
HMAC) vs. 32 bytes (with HMAC). We didn't want the signature to grow
like that if it could be mitigated, so we chose to use an HMAC to cap
the mapping function's contribution at 32 bytes.
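
As a quick sanity check on that arithmetic:

num_blank_nodes = 100
explicit_map_bytes = num_blank_nodes * 32  # 3,200 bytes; grows with graph
hmac_key_bytes = 32                        # constant, whatever the size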

Did that answer your questions, Oliver?

-- manu

-- 
Manu Sporny - https://www.linkedin.com/in/manusporny/
Founder/CEO - Digital Bazaar, Inc.
https://www.digitalbazaar.com/

Received on Sunday, 27 August 2023 21:28:15 UTC