Re: Demonstration of Support for NIST-Compliant Selective Disclosure for Data Integrity Cryptosuites in VCWG from Personal Sam Smith on 2023-08-14 (public-vc-wg@w3.org from August 2023)

From: Personal Sam Smith <sam@samuelsmith.org>
Date: Mon, 14 Aug 2023 12:05:23 -0600
To: Manu Sporny <msporny@digitalbazaar.com>
Cc: Christopher Allen <christophera@lifewithalacrity.com>, W3C Verifiable Credentials Working Group <public-vc-wg@w3.org>
Message-Id: <88FBDADA-EE3D-49E0-ABC1-A1620110227A@samuelsmith.org>
Essentially, then, a signature on the SAID (a content-addressable identifier derived from a digest of the ACDC) of each newly appended ACDC is both an integrity proof and an authenticity proof on the subgraph formed by appending that graph fragment. These graph fragments are DAGs and are analogous to branches in version control systems like git. An appended ACDC can then merge subgraphs by including the origin node of each subgraph in the appended ACDC's edges. None of the proofs on the pre-existing subgraphs need to be recomputed to verify the new subgraph formed by the merge.

IMHO this is a very powerful, easily understandable, easily scalable, and easy-to-reason-about approach to building verifiable data sets, i.e., as append-only hash DAGs.
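A minimal Python sketch of the append-and-merge idea, under simplifying assumptions: the "SAID" here is just a SHA-256 hex digest of the node serialized as sorted-key JSON with its own `d` field set to a placeholder (the real SAID protocol uses CESR-encoded digests and insertion-ordered field maps), edges are plain lists of far-node SAIDs, and signing is omitted. `make_node` and `is_intact` are hypothetical helpers, not ACDC tooling.

```python
import hashlib
import json

PLACEHOLDER = "#" * 64  # same length as a SHA-256 hex digest


def compute_said(node: dict) -> str:
    """Digest the node with its own 'd' field replaced by the placeholder."""
    body = dict(node, d=PLACEHOLDER)
    raw = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(raw).hexdigest()


def make_node(attrs: dict, edges: list) -> dict:
    """Build a node whose 'e' field lists the SAIDs of the nodes it points at."""
    node = {"d": PLACEHOLDER, "a": attrs, "e": sorted(edges)}
    node["d"] = compute_said(node)
    return node


def is_intact(node: dict) -> bool:
    return node["d"] == compute_said(node)


# Two independent subgraphs, like two git branches.
root_a = make_node({"branch": "a"}, [])
tip_a = make_node({"step": 1}, [root_a["d"]])
root_b = make_node({"branch": "b"}, [])

# Appending a merge node: it only references existing SAIDs, so nothing
# already issued (or already signed) has to be rebuilt.
merge = make_node({"note": "merge"}, [tip_a["d"], root_b["d"]])

# A signature over merge["d"] commits to both subgraphs transitively:
# altering root_a changes its SAID, which then no longer matches the edge
# recorded in tip_a (whose SAID is in turn fixed inside merge).
forged_root_a = make_node({"branch": "a-forged"}, [])
```

In this model each node is hashed once when it is created; the merge costs one new digest regardless of how large the merged subgraphs are.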


> On Aug 14, 2023, at 11:48, Personal Sam Smith <sam@samuelsmith.org> wrote:
> 
> The graph of ACDCs forms an append-only, distributed, verifiable data structure. It is a type of hash graph, not a Merkle tree, but a verifiable hash graph nonetheless. Because the edges may be blinded, the nodes themselves may be blinded, and the attributes of the nodes besides their edges may be blinded, a verifier can verify the cryptographic integrity of the graph and verify the signatures on each node separately, without needing to unblind the graphed attributes.
> 
> Because the graph is append-only, a given verifier can keep already verified graph fragments without needing to re-verify previously verified portions of the graph (subgraphs) whenever new fragments are appended.
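To make the "no re-verification" point concrete, here is a hedged Python sketch of a verifier that keeps its own copy of every node it has already checked. It uses a simplified digest as a stand-in for the SAID protocol (SHA-256 over sorted-key JSON, an assumption), expects each fragment in topological order (far nodes before the nodes that point at them), and leaves the per-node signature check as a comment. `Verifier`, `accept`, and `hash_count` are illustrative names.

```python
import hashlib
import json

PLACEHOLDER = "#" * 64


def compute_said(node: dict) -> str:
    body = dict(node, d=PLACEHOLDER)
    raw = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(raw).hexdigest()


def make_node(attrs: dict, edges: list) -> dict:
    node = {"d": PLACEHOLDER, "a": attrs, "e": sorted(edges)}
    node["d"] = compute_said(node)
    return node


class Verifier:
    """Keeps its own copy of every node it has already verified."""

    def __init__(self):
        self.nodes = {}      # SAID -> verified node
        self.hash_count = 0  # digests actually recomputed

    def accept(self, fragment: list) -> bool:
        # Fragment is in topological order: far nodes before near nodes.
        for node in fragment:
            if node["d"] in self.nodes:
                continue  # verified earlier; stored copy is used, resent one ignored
            self.hash_count += 1
            if compute_said(node) != node["d"]:
                return False  # integrity failure
            if any(e not in self.nodes for e in node["e"]):
                return False  # edge points at an unverified node
            # A signature check over node["d"] would go here.
            self.nodes[node["d"]] = node
        return True


root = make_node({"n": 0}, [])
mid = make_node({"n": 1}, [root["d"]])
verifier = Verifier()
first_ok = verifier.accept([root, mid])         # hashes 2 nodes
leaf = make_node({"n": 2}, [mid["d"]])
second_ok = verifier.accept([root, mid, leaf])  # hashes only leaf
```

Skipping by SAID is safe here only because the verifier relies on its stored copy; a resent node carrying a known SAID is never trusted on its own.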
> 
> 
> 
> 
> 
> Further elaborating:
> 
> Because ACDCs support chaining (treeing) of other ACDCs, there is little need for selectively disclosable arrays or lists except to support legacy paper-credential data formats that commingle forensic information with the actual actionable attribute. This legacy approach tightly couples attributes that do not need to be tightly coupled, and one fix for that defect is to employ a selective disclosure mechanism.
> 
> For example, a driver's license is essentially an entitlement to drive. The only tightly coupled information needed to prove such an entitlement is an identifier, an authentication factor for that identifier, and a verifiable attestation by the Drivers License Division, bound to that identifier, that the authenticated controller of the identifier is entitled to drive. Any other information (birthday, name, address, etc.) is provided only in the event that the driver violates a traffic law, so that the law enforcement officer can issue a ticket, because current law requires that the ticket identify the name and address of the violator. Thus the forensic information is never needed except when a law enforcement officer requires it.
> 
> 
> In a modern, from-scratch design, attributes that are not tightly coupled, such as forensic attributes, are separated out into their own ACDCs, and then the whole set can be combined in a graph of ACDCs. The edges in the graph can be blinded so that disclosing a given attribute need not leak anything about its edges. This enables what we call provisional authenticity or provisional entitlements, where the ACDCs on the far side of an edge are disclosed only when a given contingency requires it, such as when a law enforcement officer needs forensic information to issue a violation ticket.
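One way to picture a blinded edge, as a sketch only: the entitlement node stores a salted digest of the forensic node's SAID instead of the SAID itself, and the holder reveals the salt and the forensic node only when the contingency arises. The salted-SHA-256 construction, the field names, and the sample data are all hypothetical; ACDC defines its own blinding rules.

```python
import hashlib
import json
import secrets


def digest(obj: dict) -> str:
    raw = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(raw).hexdigest()


def blind(salt: bytes, target_said: str) -> str:
    """Blinded edge: a salted digest of the far node's identifier."""
    return hashlib.sha256(salt + target_said.encode()).hexdigest()


def open_edge(blinded: str, salt: bytes, target_said: str) -> bool:
    """Check a revealed (salt, far node) pair against the blinded edge."""
    return secrets.compare_digest(blinded, blind(salt, target_said))


# Hypothetical forensic ACDC content, kept by the holder.
forensic = {"name": "Jane Doe", "address": "123 Example St", "birthdate": "1990-01-01"}
forensic_said = digest(forensic)

# Issuer side: the entitlement carries only the blinded edge.
salt = secrets.token_bytes(16)
entitlement = {
    "issuee": "hypothetical-identifier",
    "entitlement": "may_drive",
    "forensic_edge": blind(salt, forensic_said),
}

# Traffic stop: the holder reveals salt + forensic ACDC; the officer checks it.
officer_ok = open_edge(entitlement["forensic_edge"], salt, digest(forensic))
```

Without the salt, a verifier who guesses the forensic content still cannot confirm the guess, which is the reason to blind the edge rather than merely hash it.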
> 
> Such a blinded graph representation of information obviates, in most cases, the need for selectively disclosable arrays.
> 
> With ACDCs we can granularly represent attributes in a secure, privacy-protective, verifiable graph. This graph can be extended to include information from multiple issuers so that we can represent delegable entitlements or data supply chains, all using the same tooling for verification. Each source of a node in the graph is separately verifiable, and the graph itself can be transmitted over the wire in fragments, where each fragment is independently verifiable before being assembled with other fragments to perform business logic on the graph.
> 
> This avoids the NP-hard problem of verifying graphs after assembly and of proving an already assembled graph.
> 
> 
>> On Aug 12, 2023, at 10:14, Manu Sporny <msporny@digitalbazaar.com> wrote:
>> 
>> On Sat, Jul 29, 2023 at 4:42 PM Samuel Smith <sam@prosapien.com> wrote:
>>> The way it works in ACDC is that the list of selectively disclosable attributes is part of an aggregated hash, that is, the hash of a list of blinded hashes. The only thing that is signed is the hash of the list of blinded hashes. Each blinded hash is of a field map, using the SAID protocol to generate the self-referential hash. But the structure of the field map itself is not leaked. So neither the length of the list nor the structure of its individual elements is disclosed or signed; only the blinded aggregate is. So no information is leaked at this point.
>> 
>> Yes, that sounds correct to me (that nothing is leaked at this point).
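A minimal sketch of that aggregate, assuming: each element is a field map with a random salt `u` and a simplified SAID `d` (SHA-256 over sorted-key JSON, not the real SAID encoding), and the aggregate is the SHA-256 digest of the concatenated element SAIDs. Only `agg` would be signed by the issuer; signing is omitted to keep this to the standard library. The helper names and attribute values are hypothetical.

```python
import hashlib
import json
import secrets

PLACEHOLDER = "#" * 64


def said(fields: dict) -> str:
    body = dict(fields, d=PLACEHOLDER)
    raw = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(raw).hexdigest()


def blinded_element(label: str, value) -> dict:
    """A field map blinded by a random salt 'u', identified by its SAID 'd'."""
    elem = {"d": PLACEHOLDER, "u": secrets.token_hex(16), label: value}
    elem["d"] = said(elem)
    return elem


def aggregate(blinded_hashes: list) -> str:
    """Digest of the concatenated blinded hashes; this is what gets signed."""
    return hashlib.sha256("".join(blinded_hashes).encode()).hexdigest()


elements = [
    blinded_element("age_over_21", True),
    blinded_element("state", "UT"),
    blinded_element("license_class", "D"),
]
agg = aggregate([e["d"] for e in elements])
```

Whatever the list length, `agg` is a fixed-size digest, so the signed value alone reveals neither the number of elements nor their structure.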
>> 
>>> The spec also allows an alternative form in which the aggregate is the Merkle root of a Merkle tree of the blinded hashes. Once again, the structure of the data inside each blinded hash is not disclosed, nor is the size of the Merkle tree exposed at this stage. The signature is on the aggregate hash. This is not the same as the W3C mechanism, but it would be unfair to say that this approach leaks information.
>> 
>> Right, and I didn't mean to suggest that information was leaked -- I
>> don't know ACDC at enough depth to understand where the information
>> leakage boundaries are. I know that SD-JWT (at least, in one of its
>> iterations, things might have changed since then) leaks information on
>> list sizes based on its design... that was the format I was alluding
>> to when I mentioned that some selective disclosure formats leak
>> information in ways that ecdsa-sd does not.
>> 
>>> ACDCs also have a different selective disclosure mechanism, which uses labeled, nested, blinded hashes of field maps. The aggregate(s) at any level may have a label. The label itself may leak information about what has been hashed, but the structure of what has been hashed is not disclosed or leaked. This, I believe, is closer to Gordian elision.
>>> The two mechanisms (an unlabeled aggregate of a list of blinded hashes, or a labeled aggregate of nested blinded hashes of field maps) can be combined depending on the use case.
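A hedged sketch of the labeled, nested form: a nested field map sits under a label, and the parent's SAID is computed over its compact form, where each nested map is replaced by that map's own SAID. The presenter can then hand over either the compact form (label plus blinded digest) or the expanded form, and both verify against the same top-level SAID. The simplified SHA-256 SAID and the field names are assumptions, not the exact ACDC rules.

```python
import hashlib
import json
import secrets

PLACEHOLDER = "#" * 64


def compact(fields: dict) -> dict:
    """Replace every nested field map with its SAID (recursively)."""
    return {k: said_of(v) if isinstance(v, dict) else v for k, v in fields.items()}


def said_of(fields: dict) -> str:
    """Simplified SAID, always computed over the most compact form."""
    body = dict(compact(fields), d=PLACEHOLDER)
    raw = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(raw).hexdigest()


def saidify(fields: dict) -> dict:
    return dict(fields, d=said_of(fields))


# The nested block is blinded by its own salt 'u'.
details = saidify({"d": "", "u": secrets.token_hex(16), "class": "D", "expires": "2027-01-01"})
expanded = saidify({"d": "", "issuer": "hypothetical-issuer", "license": details})

# Compact presentation: only the label "license" and a digest are visible.
compacted = dict(expanded, license=details["d"])
```

The label `license` is visible in both forms, which is the leakage caveat described in the quoted text; the structure of the nested block is not.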
>> 
>> What does ACDC do when selectively disclosing an item in a list? Is
>> the size of the list disclosed, or is that kept secret in some way?
> 
> In its simplest form, the list is presented as an ordered array of blinded hashes. So at the time of presentation the verifier knows the length of the list and the offset of a given blinded hash in the list. But the semantics of what a given offset represents, other than the selectively disclosed one, is not leaked to the verifier.
>
> ACDC includes contractually protected disclosure, which puts the verifier under contract to keep both the length of the list and the offset of a disclosed item confidential. The verifier is not presented with this information until after the contractual protection is in place. Both the presenter and the verifier may verifiably commit to that protection using merely the aggregate of the list, and the aggregate leaks no information about the length of the list or the offset of a given attribute in it.
>
> The JSON Schema used for the list is an anyOf composition operator, which means that, per the schema, the elements in the list may be presented in any order. So the schema does not leak information about the number of elements or their order prior to final selective disclosure. This feature of the anyOf composition operator in JSON Schema largely addresses the problem of semantic correlatability in a different way than JSON-LD/RDF does. The schema is unordered, so one can commit to the semantics of any element in the list without disclosing where in the list it is or how long the list is. Negotiation over what to selectively disclose can happen entirely without leaking, and only once the negotiation is complete does the presentation provide the list of blinded hashes.
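A sketch of that final presentation step, assuming SAIDs are approximated by SHA-256 over sorted-key JSON and the aggregate is the digest of the concatenated blinded hashes. Before this point the verifier holds only the signed aggregate; here the presenter reveals the ordered list of blinded hashes plus one opened element, and the verifier checks both. `verify_disclosure` and the sample attributes are hypothetical.

```python
import hashlib
import json
import secrets

PLACEHOLDER = "#" * 64


def said(fields: dict) -> str:
    body = dict(fields, d=PLACEHOLDER)
    raw = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(raw).hexdigest()


def aggregate(blinded_hashes: list) -> str:
    return hashlib.sha256("".join(blinded_hashes).encode()).hexdigest()


def verify_disclosure(signed_agg: str, blinded_hashes: list, disclosed: dict, offset: int) -> bool:
    """Check the list against the signed aggregate, then the opened element."""
    if aggregate(blinded_hashes) != signed_agg:
        return False  # list does not match what the issuer signed
    if said(disclosed) != disclosed["d"]:
        return False  # opened element was altered
    return 0 <= offset < len(blinded_hashes) and blinded_hashes[offset] == disclosed["d"]


# Issuer side: build blinded elements and the aggregate to be signed.
elements = []
for label, value in [("state", "UT"), ("age_over_21", True), ("license_class", "D")]:
    elem = {"d": PLACEHOLDER, "u": secrets.token_hex(16), label: value}
    elem["d"] = said(elem)
    elements.append(elem)
hashes = [e["d"] for e in elements]
signed_agg = aggregate(hashes)  # the issuer signs this value

# Presentation after negotiation: all blinded hashes, one opened element.
accepted = verify_disclosure(signed_agg, hashes, elements[1], offset=1)
```

The verifier learns the list length and the offset only at this step, which is the information the contractual protection is meant to cover.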
> 
> An issuer can make this list more difficult to correlate by adding dummy entries to it. The set of schema elements in the anyOf composition then does not correlate with the number of entries in the list.
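Padding with dummy entries could look like this minimal sketch. It assumes the real entries are 64-character hex digests, so random 32-byte values rendered as hex look the same; the shuffle is an added assumption that leans on the unordered anyOf schema. `pad_with_decoys` is a hypothetical helper.

```python
import secrets


def pad_with_decoys(blinded_hashes: list, target_len: int) -> list:
    """Return the real blinded hashes mixed with random decoys, shuffled."""
    if target_len < len(blinded_hashes):
        raise ValueError("target_len is smaller than the real list")
    decoys = [secrets.token_hex(32) for _ in range(target_len - len(blinded_hashes))]
    padded = list(blinded_hashes) + decoys
    secrets.SystemRandom().shuffle(padded)  # CSPRNG-backed shuffle
    return padded


real = [secrets.token_hex(32) for _ in range(3)]  # stand-ins for element SAIDs
padded = pad_with_decoys(real, 8)
```

The issuer would compute and sign the aggregate over `padded`, so the decoys are fixed at issuance; a verifier sees eight entries and, until disclosure, cannot tell which three carry attributes.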
> 
> In the next, more complicated form, the list is encoded in a Merkle tree, possibly a sparse Merkle tree, and the aggregate is the Merkle root. Because a sparse Merkle tree is a cryptographic accumulator, disclosing a leaf node via an inclusion proof does not leak any information about the other leaf nodes (entries) in the tree.
> The anyOf JSON Schema composition operator works the same way for a cryptographic accumulator as described above for a cryptographic aggregate (a hash of a concatenated list of blinded hashes).
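For the Merkle form, here is a hedged sketch of computing a root over blinded hashes and checking an inclusion proof. Note the swap: this is a plain binary Merkle tree with RFC 6962-style domain separation (0x00 prefix for leaves, 0x01 for interior nodes) and odd nodes promoted unchanged, not the sparse Merkle tree mentioned above. The function names and sample leaves are illustrative.

```python
import hashlib


def leaf_hash(leaf: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + leaf).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_levels(leaves: list) -> list:
    """All tree levels, leaves first; an odd last node is promoted unchanged."""
    levels = [[leaf_hash(x) for x in leaves]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        nxt = [node_hash(cur[i], cur[i + 1]) for i in range(0, len(cur) - 1, 2)]
        if len(cur) % 2:
            nxt.append(cur[-1])
        levels.append(nxt)
    return levels


def merkle_root(leaves: list) -> bytes:
    return build_levels(leaves)[-1][0]


def inclusion_proof(leaves: list, index: int) -> list:
    """Sibling digests from leaf to root, each tagged 'sibling is on the right'."""
    proof = []
    for level in build_levels(leaves)[:-1]:
        sib = index ^ 1
        if sib < len(level):  # no sibling means this node was promoted
            proof.append((level[sib], index % 2 == 0))
        index //= 2
    return proof


def verify_inclusion(root: bytes, leaf: bytes, proof: list) -> bool:
    node = leaf_hash(leaf)
    for sibling, sibling_on_right in proof:
        node = node_hash(node, sibling) if sibling_on_right else node_hash(sibling, node)
    return node == root


# Hypothetical blinded hashes standing in for element SAIDs.
blinded = [hashlib.sha256(f"blinded-{i}".encode()).digest() for i in range(5)]
root = merkle_root(blinded)          # the aggregate the issuer would sign
proof = inclusion_proof(blinded, 3)  # disclose only element 3
```

In this plain tree the proof exposes only blinded sibling digests, but its length is roughly log2 of the leaf count; a fixed-depth sparse Merkle tree avoids even that.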
> 
> 
> Obviously, repeated presentations to the same verifier, each disclosing a different element of the list, would allow the verifier to correlate across presentations. This is not a zero-knowledge proof.
> 
> However, ACDCs have mechanisms to limit correlatability across presentations, which we call bulk-issued ACDCs. Each ACDC member of the bulk-issued set is itself blinded. The presenter can then isolate presentations across contexts: when a presenter does not want a given verifier to correlate multiple presentations separated in time and space, the presenter uses a different ACDC from the bulk-issued set. For bulk issuance, the issuer and holder exchange the seed of a hierarchical deterministic (HD) keychain. The seed is used to generate the blinds for each member of the bulk set. The overhead of a bulk-issued credential is trivial because the issuer exchanges only an ACDC template and the seed. Every presentation can be auto-generated on the fly.
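A sketch of how a shared seed can yield a distinct blind for every member of a bulk-issued set. HMAC-SHA256 keyed by the seed is used here as a stand-in for HD keychain derivation, and the template fields are hypothetical. The point is only that issuer and holder derive identical, mutually unlinkable SAIDs from the seed and an index alone.

```python
import hashlib
import hmac
import json
import secrets


def member_blind(seed: bytes, index: int) -> str:
    """Per-member blinding salt derived from the shared seed (stand-in KDF)."""
    return hmac.new(seed, f"blind/{index}".encode(), hashlib.sha256).hexdigest()


def member_said(template: dict, seed: bytes, index: int) -> str:
    """Simplified SAID of one bulk-set member: the template plus its blind."""
    body = dict(template, u=member_blind(seed, index))
    raw = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(raw).hexdigest()


template = {"issuer": "hypothetical-issuer", "entitlement": "may_drive"}
seed = secrets.token_bytes(32)  # exchanged once between issuer and holder

# Each side derives the set independently; nothing else needs to be sent.
issuer_view = [member_said(template, seed, i) for i in range(4)]
holder_view = [member_said(template, seed, i) for i in range(4)]
```

The holder would present member 0 to one verifier and member 1 to another; without the seed, the two SAIDs themselves give the verifiers nothing to link.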
> 
> More details are in the ACDC specification.
> 
> Our goal in ACDC was to provide different levels of correlation control using what we call graduated disclosure mechanisms, of which selective disclosure is but one, all using simple, highly adoptable crypto (nothing more than digital signatures, digests, and, when needed, Merkle trees as part of the baseline spec). This does not preclude the use of ZKPs as an optional mechanism. But for the vast majority of use cases, we believe the graduated disclosure mechanisms in ACDCs provide a Goldilocks trade-off between correlatability and adoptability.
> 
> 
> 
> 
>> 
>> I ask because I'm curious to hear if collections of information
>> associated with a property are treated as ordered sets or unordered
>> sets? We know that the information leakage problem is fairly easy to
>> solve when you're dealing w/ unordered sets (which is typical in VCs
>> using JSON-LD)... but becomes much harder if your data structure
>> treats all lists as ordered (which is typical when using JSON arrays).
>> 
>> Christopher, same question wrt. Gordian Envelopes -- how did you
>> approach the information leakage issue when disclosing a single item
>> out of an ordered set / list / array?
>> 
>> -- manu
>> 
>> -- 
>> Manu Sporny - https://www.linkedin.com/in/manusporny/
>> Founder/CEO - Digital Bazaar, Inc.
>> https://www.digitalbazaar.com/
Received on Monday, 14 August 2023 18:05:41 UTC