Re: Demonstration of Support for NIST-Compliant Selective Disclosure for Data Integrity Cryptosuites in VCWG from Oliver Terbu on 2023-08-28 (public-vc-wg@w3.org from August 2023)

From: Oliver Terbu <oliver.terbu@spruceid.com>
Date: Mon, 28 Aug 2023 13:58:59 +0200
To: Manu Sporny <msporny@digitalbazaar.com>
Cc: W3C Verifiable Credentials Working Group <public-vc-wg@w3.org>
Message-ID: <CAP7TzjBsipK0bt-Aovxi8Zj7EFjteFeOkmTYE0GKXY53d_abUQ@mail.gmail.com>
Manu, I appreciate your response; it has answered all of my questions.
So, just to confirm my understanding: the HMAC serves solely as a CSPRNG,
and there's no requirement to independently verify the actual HMAC.
Additionally, I completely agree that introducing extra ephemeral key pairs
would indeed be complex in this context.

Here's an interesting aspect to consider. In certain legal jurisdictions,
reaching the highest assurance level also requires that the key used to
sign statements be protected in an HSM. For instance, to achieve eIDAS
high, as outlined in
https://ec.europa.eu/digital-building-blocks/wikis/display/DIGITAL/eIDAS+Levels+of+Assurance,
protecting the esk (ephemeral secret key) in hardware is mandatory. This
adds another layer to the discussion about the ephemeral key pair in the
ecdsa-sd scenario: although the ephemeral public key is signed by an
HSM-protected key, the ephemeral key pair itself is not protected by an
HSM.

I agree with your point that unlinkable signatures are optimal for
privacy, but I'd add that it's crucial to consider the lack of robust
hardware support for BBS+ credentials. As long as this gap exists, those
credentials remain susceptible to cloning, which, in my opinion,
undermines their ability to reach high security levels. Combining BBS+
with ECDSA would compromise the unlinkability of BBS+, which defeats its
intended purpose. A potential solution might involve the use of linked
secrets or signature IDs. However, these alternatives might also lack
hardware-based anti-cloning protection. For credentials to have a high
assurance level, effective measures against impersonation must be in
place; neglecting this aspect jeopardizes privacy as well as security.



On Sun, Aug 27, 2023 at 11:28 PM Manu Sporny <msporny@digitalbazaar.com> wrote:

> > On Sun 13. Aug 2023 at 06:36, Oliver Terbu <oliver.terbu@spruceid.com> wrote:
>
> Hey Oliver, apologies for the delayed response; answers to your
> questions are below...
>
> >> Re ephemeral keys on ecdsa-sd: this means an issuer cannot use a boring
> >> old HSM to protect the signatures, correct? It wouldn’t be viable imo.
> >> Too many keys (at least a new keypair for each VC), and the ephemeral
> >> nature doesn’t seem to fit if one uses an HSM.
> >>
> > I guess if one can throw away the esk, then an HSM could be viable.
>
> Yes, exactly. The ephemeral key pair is generated when the VC is
> issued, its secret key signs each selectively disclosable statement,
> and it is then thrown away. The ephemeral public key is signed using
> the issuer's HSM-backed key. This means that ecdsa-sd does support
> HSM-backed keys.
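The issuance flow described above can be sketched as follows. This is a minimal, non-authoritative sketch under several stated assumptions: it uses a from-scratch, NOT constant-time P-256 ECDSA written with the Python standard library only (a real issuer would use a vetted library or an HSM); the statements are placeholder N-Quads; and the issuer key signs only the encoded ephemeral public key, whereas the actual ecdsa-sd-2023 base signature also covers other data. All helper names (keygen, create_base_proof, etc.) are hypothetical.

```python
import hashlib
import secrets

# Toy P-256 (secp256r1) parameters; illustration only, not constant-time.
P = 0xFFFFFFFF00000001000000000000000000000000FFFFFFFFFFFFFFFFFFFFFFFF
A = P - 3
N = 0xFFFFFFFF00000000FFFFFFFFFFFFFFFFBCE6FAADA7179E84F3B9CAC2FC632551
G = (0x6B17D1F2E12C4247F8BCE6E563A440F277037D812DEB33A0F4A13945D898C296,
     0x4FE342E2FE1A7F9B8EE7EB4A7C0F9E162BCE33576B315ECECBB6406837BF51F5)


def point_add(p1, p2):
    """Affine point addition; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:  # point doubling
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P


def scalar_mult(k, point):
    """Double-and-add scalar multiplication."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def keygen():
    d = secrets.randbelow(N - 1) + 1
    return d, scalar_mult(d, G)


def encode_public_key(pk):
    # Uncompressed SEC1 encoding: 0x04 || X || Y (65 bytes)
    return b"\x04" + pk[0].to_bytes(32, "big") + pk[1].to_bytes(32, "big")


def sign(d, message):
    """ECDSA over P-256 with SHA-256; returns (r, s)."""
    z = int.from_bytes(hashlib.sha256(message).digest(), "big")
    while True:
        k = secrets.randbelow(N - 1) + 1
        r = scalar_mult(k, G)[0] % N
        s = pow(k, -1, N) * (z + r * d) % N
        if r and s:
            return r, s


def verify(pk, message, sig):
    r, s = sig
    if not (0 < r < N and 0 < s < N):
        return False
    z = int.from_bytes(hashlib.sha256(message).digest(), "big")
    w = pow(s, -1, N)
    x = point_add(scalar_mult(z * w % N, G), scalar_mult(r * w % N, pk))
    return x is not None and x[0] % N == r


def create_base_proof(issuer_sk, statements):
    # 1. Fresh ephemeral key pair for this one credential.
    eph_sk, eph_pk = keygen()
    # 2. The ephemeral secret key signs each selectively disclosable statement.
    statement_sigs = [sign(eph_sk, s) for s in statements]
    # 3. The long-lived issuer key (HSM-held in practice) signs the ephemeral public key.
    base_sig = sign(issuer_sk, encode_public_key(eph_pk))
    # 4. eph_sk is never returned or stored; it is dropped when this function
    #    returns (Python itself cannot guarantee the memory is zeroized).
    return {"ephemeral_pk": eph_pk, "base_signature": base_sig,
            "statement_signatures": statement_sigs}


issuer_sk, issuer_pk = keygen()
statements = [b'_:b0 <https://schema.org/name> "Alice" .',
              b'_:b0 <https://schema.org/birthDate> "1990-01-01" .']
proof = create_base_proof(issuer_sk, statements)
```

A verifier checks the base signature with issuer_pk and then each statement signature with the revealed ephemeral public key. Only issuer_sk needs long-term protection; eph_sk exists only for the duration of issuance.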
>
> > 1) If I understood correctly, the ephemeral public key has to be
> > revealed in every transaction (derived proof), correct? If this is the
> > case, wouldn’t it be better (required) to generate a new key pair per
> > statement instead, to avoid correlation based on that key?
>
> Yes, the ephemeral public key is revealed in each derived proof via
> the mandatory disclosure fields. Generating a new key pair per
> statement could be done, but avoiding correlation that way isn't a
> goal, since the NIST-compliant keys used do not allow one to avoid
> correlation in the first place. The issuer's key is correlatable, the
> ECDSA signature is correlatable, and so on; one of the drawbacks of
> NIST-compliant schemes is that they're correlatable.
>
> We did explore multi-issuance of the mandatory and selectively
> disclosable fields, so you could issue tens to hundreds of copies of
> those fields using different keys/nonces/signatures, and it's true that
> this avoids correlation if a wallet carefully tracks usage of each
> field. That said, stateful cryptographic schemes (which provide a case
> study in how this turns out in practice) do not have a good track
> record: misimplementations have resulted in compromises in the past.
> Now, the stakes aren't as high in many cases, but I don't think we'd be
> able to get anywhere close to unlinkable signatures.
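To make the statefulness concern concrete, here is a hypothetical sketch (ecdsa-sd did not adopt multi-issuance; the class and field names are invented for illustration) of a wallet handing out single-use copies of one field, and of how an ordinary backup restore silently breaks the "never reuse a copy" rule.

```python
import copy


class SingleUseFieldPool:
    """Hypothetical wallet-side tracker for pre-issued copies of one field,
    each signed under a different key/nonce. Presenting the same copy twice
    lets verifiers link the two presentations."""

    def __init__(self, copies):
        self._unused = list(copies)

    def take(self):
        # Correctness depends on this state persisting across crashes,
        # restores, and multiple devices.
        if not self._unused:
            raise RuntimeError("no unused copies left; re-issuance needed")
        return self._unused.pop()

    def remaining(self):
        return len(self._unused)


pool = SingleUseFieldPool([f"birthdate-copy-{i}" for i in range(3)])
backup = copy.deepcopy(pool)  # e.g. a routine wallet backup
first = pool.take()           # presented to verifier 1
restored = backup             # device lost; wallet restored from the backup
second = restored.take()      # presented to verifier 2
print(first == second)        # the same copy was used twice: linkable
```

Nothing in the cryptography fails here; the linkability comes purely from wallet state management, which is the kind of misimplementation risk the paragraph above describes.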
>
> In short, the added complexity didn't seem like it would actually solve
> the problem. ecdsa-sd is correlatable in the same way that SD-JWT and
> mDL are correlatable. If one wants unlinkable signatures, BBS+ seems to
> be the best bet at this point in time, and ecdsa-sd has now created the
> primitives needed to provide BBS as a layered signature for use cases
> that need it:
>
> https://www.w3.org/TR/vc-data-integrity/#agility-and-layering
>
> > 2) If I understood correctly, the algorithm uses a master issuer key to
> > sign an ephemeral public key, and each statement is signed by the
> > corresponding ephemeral secret key. Where do you see the advantage of
> > this approach in general compared to issuing individual credentials for
> > each of these statements? To me it sounds similar.
>
> Yes, you're right, there are similarities there. Given that the VC data
> model is graph-based, and you can merge all of these graphs together,
> you could architect a solution like the one you describe without any
> changes to the VC Data Model. The downside, of course, is that you have
> to manage all of these credentials in the digital wallet; that places a
> burden, some would argue an unreasonable burden, on the digital wallet
> ecosystem. The counter-point is that the digital wallet ecosystem might
> be absorbing that burden anyway, given the increasing number of digital
> credential formats... and then the counter-counter-point is: don't make
> the problem worse! :P
>
> The advantage of doing things the way ecdsa-sd does them is that you
> can bundle all of this into a single credential. In fact, if you use
> Data Integrity to secure the credential, you can layer a standard
> ECDSA proof, an ecdsa-sd proof, a BBS proof, and a (potential future)
> post-quantum proof onto a single VC, resulting in a single object for a
> digital wallet to manage instead of tens to hundreds of objects.
>
> > 3) Can you elaborate on why, where and how the HMAC comes into play? I
> > understood it’s more or less to have pseudo-random node identifiers, but
> > I’m not sure why deterministic random values are needed for that in the
> > first place instead of, for example, fully random values that don’t
> > require an HMAC.
>
> Yes, and for those not familiar with the algorithm, Oliver is talking
> about step 2 here:
>
> https://w3c.github.io/vc-di-ecdsa/#base-proof-transformation-ecdsa-sd-2023
>
> and this function:
>
> https://w3c.github.io/vc-di-ecdsa/#createhmacidlabelmapfunction
>
> Since VCs use a graph-based data model, each node in the graph either
> has an explicit identifier (a URL) or an implicit identifier (a "blank
> node" identifier). URLs can leak information both internal and
> external to the document, because they're universal identifiers (so,
> if you use them in your selectively disclosable VC, there's nothing we
> can do to help you with respect to information leakage). Blank node
> identifiers can only leak information that is internal to the document,
> since they are always local to the document. Specifically, blank node
> identifiers can leak how many nodes are in the graph that is being
> selectively disclosed. For example, if you have a list of 100 items in
> your VC and you make each of those items selectively disclosable, the
> blank node identifiers for those items could leak how many items are in
> the list.
>
> To mitigate that leakage, an HMAC is used to deterministically generate
> pseudo-random identifiers for blank nodes, so the verifier can't tell
> how many nodes are in the graph that's being selectively disclosed.
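A minimal sketch of this relabeling, using only the Python standard library. The helper hmac_label_map is hypothetical and only loosely modeled on the spec's createHmacIdLabelMapFunction linked above; in particular, the "u" + base64url-no-pad encoding of each digest is an assumption. It relies on the fact that RDF canonicalization issues blank node labels in sequence (c14n0, c14n1, ...). The holder keeps hmac_key; the verifier only sees the derived labels.

```python
import base64
import hashlib
import hmac
import secrets


def hmac_label_map(hmac_key, canonical_labels):
    """Hypothetical helper: map canonical blank node labels to HMAC-derived labels."""
    label_map = {}
    for label in canonical_labels:
        digest = hmac.new(hmac_key, label.encode("utf-8"), hashlib.sha256).digest()
        # Assumed encoding: "u" + base64url without padding (43 chars for 32 bytes).
        encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
        label_map[label] = "u" + encoded
    return label_map


# A list of 100 selectively disclosable items yields canonical labels
# c14n0 .. c14n99, issued in sequence.
canonical = [f"c14n{i}" for i in range(100)]
hmac_key = secrets.token_bytes(32)
label_map = hmac_label_map(hmac_key, canonical)

# Disclosing only item 87: the canonical label "c14n87" would reveal that at
# least 88 blank nodes exist, while its HMAC-derived label carries no position.
print(canonical[87], "->", label_map[canonical[87]])
```

Because the HMAC is keyed, a verifier without hmac_key cannot map a derived label back to its position, while the holder can regenerate identical labels at any time; in that sense the HMAC acts as a keyed pseudo-random generator rather than something the verifier checks.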
>
> So, why use an HMAC instead of just using completely random
> identifiers? If we used completely random identifiers, we'd have to
> store a mapping of all those identifiers in the base proof, because the
> holder needs to reproduce exactly the identifiers the issuer signed
> over. Note: the entity that uses the HMAC key is the holder, when doing
> a selective disclosure; the HMAC key is not disclosed to the verifier.
>
> Suppose we have 3 blank nodes in the VC graph:
>
> A, B, C
>
> We'd have to sign over the mapping of blank node IDs to random IDs:
>
> [A => Ar, B => Br, C => Cr]
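To illustrate the difference with the three blank nodes A, B, C, here is a hypothetical comparison in standard-library Python (function names are illustrative, not from the spec): fully random identifiers cannot be regenerated, so the map itself would have to travel in the base proof, whereas HMAC-derived identifiers can be recomputed by the holder from the 32-byte key alone.

```python
import hashlib
import hmac
import secrets

BLANK_NODES = ["A", "B", "C"]


def random_id_map(labels):
    # Fully random 32-byte IDs: nothing ties them to the labels, so the issuer
    # would have to ship (and sign over) the whole map in the base proof.
    return {label: secrets.token_bytes(32) for label in labels}


def hmac_id_map(key, labels):
    # HMAC-derived 32-byte IDs: anyone holding `key` regenerates the same map.
    return {label: hmac.new(key, label.encode("utf-8"), hashlib.sha256).digest()
            for label in labels}


key = secrets.token_bytes(32)
issuer_map = hmac_id_map(key, BLANK_NODES)  # computed at issuance
holder_map = hmac_id_map(key, BLANK_NODES)  # recomputed later from the key alone
print(issuer_map == holder_map)
print(random_id_map(BLANK_NODES) == random_id_map(BLANK_NODES))
```

The first comparison always holds; two independently generated random maps essentially never agree, which is why the random approach needs the mapping stored.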
>
> Presuming that we wanted to use cryptographically safe identifiers,
> say 32 bytes in size, the storage cost would be the number of blank
> nodes multiplied by 32 bytes (B * 32), which can add up for larger
> graphs.
>
> Compare that with just storing a single 32-byte HMAC key. If you had a
> list of 100 items that you wanted to selectively disclose, the
> difference would be 3,200 additional bytes added to the signature (no
> HMAC) vs. 32 bytes (with HMAC). We didn't want the signature to grow
> like that if it could be mitigated, so we chose to use an HMAC to cap
> the mapping function at 32 bytes.
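The size comparison is simple arithmetic. The hypothetical helper below counts only the 32-byte identifiers, as in the B * 32 formula, and ignores any other encoding overhead.

```python
ID_BYTES = 32        # size of each cryptographically safe random identifier
HMAC_KEY_BYTES = 32  # size of the single HMAC key


def mapping_overhead(blank_nodes, use_hmac):
    """Bytes the base proof carries for the blank node mapping (IDs only)."""
    return HMAC_KEY_BYTES if use_hmac else blank_nodes * ID_BYTES


for b in (3, 100, 1000):
    print(f"{b:>5} blank nodes: random IDs {mapping_overhead(b, False):>6} bytes,"
          f" HMAC {mapping_overhead(b, True)} bytes")
```

For 100 blank nodes this reports 3200 bytes for random identifiers versus 32 bytes with an HMAC key; the HMAC cost stays constant as the graph grows.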
>
> Did that answer your questions, Oliver?
>
> -- manu
>
> --
> Manu Sporny - https://www.linkedin.com/in/manusporny/
> Founder/CEO - Digital Bazaar, Inc.
> https://www.digitalbazaar.com/
>
Received on Monday, 28 August 2023 11:59:17 UTC