Re: Demonstration of Support for NIST-Compliant Selective Disclosure for Data Integrity Cryptosuites in VCWG from Personal Sam Smith on 2023-08-14 (public-vc-wg@w3.org from August 2023)

From: Personal Sam Smith <sam@samuelsmith.org>
Date: Mon, 14 Aug 2023 11:12:49 -0600
To: Manu Sporny <msporny@digitalbazaar.com>
Cc: Christopher Allen <christophera@lifewithalacrity.com>, W3C Verifiable Credentials Working Group <public-vc-wg@w3.org>
Message-Id: <B04C5B81-5C13-4D3A-BA53-4CCDA253FF57@samuelsmith.org>
> On Aug 12, 2023, at 10:14, Manu Sporny <msporny@digitalbazaar.com> wrote:
> 
> On Sat, Jul 29, 2023 at 4:42 PM Samuel Smith <sam@prosapien.com> wrote:
>> The way it works in ACDC is that the list of selectively disclosed attributes are part of an aggregated Hash, that is the hash of a list of blinded hashes. The only thing that is signed is the hash of the list of blinded hashes. Each blinded hash is of a field map using the SAID protocol to generate the self referential hash. But the structure of the field map itself is not leaked. So the length of the list and the structure of individual elements of the list is not disclosed or signed only the blinded aggregate.  So no information is leaked at this point.
> 
> Yes, that sounds correct to me (that nothing is leaked at this point).
> 
>> The spec also allows an alternative form in which the aggregate is the merkle tree root of the merkle tree of the blinded hashes. Once again the structure of the data inside each blinded hash is not disclosed nor is the size of the merkle tree exposed at this stage. The signature is on the aggregate hash. This is not the same as the w3c mechanism but it would be unfair to say that this approach is leaking information.
> 
> Right, and I didn't mean to suggest that information was leaked -- I
> don't know ACDC at enough depth to understand where the information
> leakage boundaries are. I know that SD-JWT (at least, in one of it's
> iterations, things might have changed since then) leaks information on
> list sizes based on it's design... that was the format I was alluding
> to when I mentioned that some selective disclosure formats leak
> information in ways that ecdsa-sd does not.
> 
>> ACDCs also have a different selective disclosure mechanism which are labeled nested blinded hashes of field maps. The aggregate(s) at any level may have a label. The label itself may leak information about what has been hashed but the structure of what has been hashed is not disclosed or leaked. This I believe is closer to Gordian elision,
>> The two mechanisms (unlabeled aggregate of list of blinded hashes or labeled aggregate of nested blinded hashes of field maps can be combined depending on the use case.
> 
> What does ACDC do when selectively disclosing an item in a list? Is
> the size of the list disclosed, or is that kept secret in some way?

In its simplest form, the list is presented as an ordered array of blinded hashes. So at the time of presentation the verifier knows the length of the list and the offset of a given blinded hash in the list. But the semantics of what a given offset represents, other than for the selectively disclosed element, are not leaked to the verifier. ACDC includes contractually protected disclosure, which puts the verifier under contract to keep both the length of the list and the offset of a disclosed item confidential. The verifier is not presented with this information until after contractual protection is in place. That protection may be verifiably committed to by both the presenter and the verifier using merely the aggregate of the list, and the aggregate leaks no information about the length of the list or the offset of a given attribute in the list.

The JSON Schema used for the list is an anyOf composition operator, which means that the elements in the list, per the schema, may be presented in any order. This means that the schema does not leak information about the number of elements or their order prior to final selective disclosure. This feature of the anyOf composition operator in JSON Schema largely addresses the problem of semantic correlatability in a different way than JSON-LD RDF does. The schema is unordered, so one can commit to the semantics of any element in the list without disclosing where in the list it sits or how long the list is. Negotiation over what to selectively disclose can all happen without leaking, and only once the negotiation is complete does the presentation provide the list of blinded hashes.
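
To make the simplest form concrete, here is a rough Python sketch of an aggregate over a list of blinded hashes. It is only an illustration under assumptions that are not from the ACDC spec: SHA-256 and canonical JSON stand in for the SAID protocol, the function names are made up for this example, and the signature over the aggregate is omitted.

```python
import hashlib
import json
import secrets


def blinded_hash(field_map, blind):
    """Digest of a field map salted with a random blind (stand-in for a SAID)."""
    payload = json.dumps(field_map, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blind + payload).digest()


def aggregate(blinded_hashes):
    """Hash of the concatenated blinded hashes; the only value that would be signed."""
    return hashlib.sha256(b"".join(blinded_hashes)).digest()


def verify_disclosure(signed_aggregate, blinded_hashes, field_map, blind):
    """Return the offset of the disclosed element, or None if verification fails."""
    if aggregate(blinded_hashes) != signed_aggregate:
        return None  # the presented list does not match the committed aggregate
    digest = blinded_hash(field_map, blind)
    return blinded_hashes.index(digest) if digest in blinded_hashes else None


# Issuer side: blind each attribute, then commit to the aggregate only.
attributes = [{"name": "Alice"}, {"birthYear": 1990}, {"memberLevel": "gold"}]
blinds = [secrets.token_bytes(16) for _ in attributes]
hashes = [blinded_hash(a, b) for a, b in zip(attributes, blinds)]
signed_aggregate = aggregate(hashes)  # signing this value is omitted in the sketch

# Presentation: the list of blinded hashes plus one disclosed field map and its blind.
offset = verify_disclosure(signed_aggregate, hashes, attributes[1], blinds[1])
```

Before presentation the verifier holds only `signed_aggregate`, which reveals neither the list length nor any offset. At presentation it learns the length of `hashes` and the offset of the disclosed element, but nothing about the other field maps.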

An issuer can make this list more difficult to correlate by adding dummy entries to the list. The set of schema elements in the anyOf composition then does not correlate to the number of entries in the list.
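
A small sketch of the dummy-entry idea, again in Python and again under assumptions of my own for illustration: a dummy is modeled as a random 32-byte value (the same size as a SHA-256 digest), and `pad_with_dummies` is a hypothetical helper name, not something from the spec.

```python
import hashlib
import secrets


def pad_with_dummies(real_hashes, total_entries):
    """Mix random dummy digests into the list so its length hides the real count."""
    if total_entries < len(real_hashes):
        raise ValueError("total_entries must be at least the number of real hashes")
    dummies = [secrets.token_bytes(32) for _ in range(total_entries - len(real_hashes))]
    padded = list(real_hashes) + dummies
    secrets.SystemRandom().shuffle(padded)  # the issuer fixes this order before committing
    return padded


# Three real blinded hashes, padded to eight entries before the aggregate is computed.
real = [hashlib.sha256(secrets.token_bytes(16) + label).digest() for label in (b"a", b"b", b"c")]
padded = pad_with_dummies(real, 8)
aggregate = hashlib.sha256(b"".join(padded)).digest()
```

A verifier who later sees `padded` cannot tell from the length alone how many real attributes sit behind it.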

In the next, more complicated form, the list is encoded in a Merkle tree. This may be a sparse Merkle tree. The aggregate is the Merkle root. Because a sparse Merkle tree is a cryptographic accumulator, disclosure of a leaf node via an inclusion proof does not leak any information about the other entries (leaf nodes) in the tree.
The anyOf JSON Schema composition operator works the same way for a cryptographic accumulator as described above for a cryptographic aggregate (hash of a concatenated list of blinded hashes).
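
For readers who want to see what an inclusion proof looks like, here is a minimal Python sketch. Note the swap: this is a plain binary Merkle tree over SHA-256, not the sparse variant, and it skips leaf/node domain separation. In this plain form the proof length reveals the tree depth, which a fixed-depth sparse tree would be expected to avoid. All function names are illustrative, and the leaf list must be non-empty.

```python
import hashlib


def h(data):
    return hashlib.sha256(data).digest()


def next_level(level):
    """Pair up nodes and hash them; an odd node is paired with a copy of itself."""
    if len(level) % 2:
        level = level + [level[-1]]
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves):
    level = list(leaves)
    while len(level) > 1:
        level = next_level(level)
    return level[0]


def inclusion_proof(leaves, index):
    """Collect (sibling, sibling_is_left) pairs from the leaf up to the root."""
    proof = []
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1  # the other node in the same pair
        proof.append((level[sibling], sibling < index))
        level = next_level(level)
        index //= 2
    return proof


def verify_inclusion(leaf, proof, root):
    node = leaf
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


# Five stand-in blinded hashes; the root plays the role of the signed aggregate.
leaves = [h(bytes([i])) for i in range(5)]
root = merkle_root(leaves)
proof = inclusion_proof(leaves, 4)
ok = verify_inclusion(leaves[4], proof, root)
```

The verifier sees only the disclosed leaf, the sibling digests along its path, and the root; the other leaves are never presented.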


Obviously, repeated presentation to the same verifier, where each presentation discloses a different element in the list, would then allow the verifier to correlate across presentations. This is not a zero-knowledge proof.

However, ACDCs have mechanisms to limit correlatability across presentations with what we call bulk-issued ACDCs. Each ACDC member of the bulk-issued set is itself blinded. The presenter can then isolate presentations across contexts. Therefore, when a presenter does not want a given verifier to correlate multiple presentations separated in time and space, the presenter uses a different ACDC from the bulk-issued set. For bulk issuance the issuer and holder exchange the seed of an HDKeychain. The seed is used to generate the blinds for each member of the bulk set. The overhead of a bulk-issued credential is trivial because the issuer only exchanges an ACDC template and the seed. Every presentation can be auto-generated on the fly.
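
A rough sketch of how a shared seed can yield per-member blinds. This is an assumption-laden illustration, not the spec's derivation: HMAC-SHA256 over a "member/attribute" path stands in for the hierarchical deterministic key derivation, SHA-256 over JSON stands in for the SAID, and `derive_blind` and `member_hashes` are hypothetical names.

```python
import hashlib
import hmac
import json


def derive_blind(seed, member_index, attribute_index):
    """Deterministic blind for one attribute of one member of the bulk set."""
    path = f"{member_index}/{attribute_index}".encode("utf-8")
    return hmac.new(seed, path, hashlib.sha256).digest()


def member_hashes(seed, member_index, template):
    """Blinded hashes for one bulk-set member, regenerated on the fly from the seed."""
    hashes = []
    for i, field_map in enumerate(template):
        payload = json.dumps(field_map, sort_keys=True).encode("utf-8")
        blind = derive_blind(seed, member_index, i)
        hashes.append(hashlib.sha256(blind + payload).digest())
    return hashes


seed = bytes(range(32))  # fixed demo value; a real seed would be random and kept secret
template = [{"name": "Alice"}, {"birthYear": 1990}]
member_0 = member_hashes(seed, 0, template)  # used with one verifier or context
member_1 = member_hashes(seed, 1, template)  # used with another
```

Both members carry the same attributes, yet their blinded hashes share no values, so two verifiers comparing notes cannot link them by digest. Only the template and the seed need to be exchanged.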

More details are in the ACDC specification.

Our goal in ACDC was to provide different levels of correlation control using what we call graduated disclosure mechanisms, of which selective disclosure is but one, all using simple, highly adoptable crypto (nothing more than digital signatures, digests, and, when needed, Merkle trees) as part of the baseline spec. This does not preclude the use of ZKPs as an optional mechanism. But for the vast majority of use cases we believe the graduated disclosure mechanisms in ACDCs provide a Goldilocks trade-off between correlatability and adoptability.




> 
> I ask because I'm curious to hear if collections of information
> associated with a property are treated as ordered sets or unordered
> sets? We know that the information leakage problem is fairly easy to
> solve when you're dealing w/ unordered sets (which is typical in VCs
> using JSON-LD)... but becomes much harder if your data structure
> treats all lists as ordered (which is typical when using JSON arrays).
> 
> Christopher, same question wrt. Gordian Envelopes -- how did you
> approach the information leakage issue when disclosing a single item
> out of an ordered set / list / array?
> 
> -- manu
> 
> -- 
> Manu Sporny - https://www.linkedin.com/in/manusporny/
> Founder/CEO - Digital Bazaar, Inc.
> https://www.digitalbazaar.com/
Received on Monday, 14 August 2023 17:13:07 UTC