Re: Selective Disclosure for W3C Data Integrity

Thanks Steve, let me know if my pseudo-algorithm captures the essence of 
your approach and whether the overheads seem right. Also, the reason for 
salts here seems different from the SD-JWT case (where they serve 
unlinkability).

Steps for a straightforward Merkle tree approach to Selective 
Disclosure based on /Open Attestation/. Parameters: hash algorithm for 
the tree, signature algorithm, salt size and generation approach. A 
short Python sketch follows each list below.

*Signed Document Creation*:

 1. Input is a JSON object
 2. “Flatten” the JSON object into a list of individual properties and
    values (the library they use to do this is reversible)
 3. For each (property, value) tuple from above, add a different /salt/
    value. The purpose of this salt is to prevent inferring the
    (property, value) tuple from its hash when the tuple has a limited
    range, for example a property that can only take on a small set of
    values.
 4. The hash of each (property, value, salt) tuple is taken and put into
    a /sorted/ list.
 5. A hash of the /entire/ sorted list is taken. This is the value that
    is then signed with a signature algorithm.
 6. The list of triples (property, value, salt) is sent along with the
    signature; note that the salts expand the size of the information a
    bit. (This flow is sketched in code below.)
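
Here is a minimal Python sketch of those issuing steps. Everything 
concrete in it is an illustrative assumption, not something Open 
Attestation fixes: SHA-256 as the hash, 32-byte random salts, an 
abstract sign() callback standing in for the signature algorithm, and a 
toy flatten() standing in for the reversible flattening library:

    import hashlib
    import json
    import os

    def flatten(obj, prefix=""):
        # Stand-in for the reversible flattening library: yield a
        # (dotted-path, value) pair for every leaf of a JSON object.
        # (Arrays are left out to keep the sketch short.)
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            if isinstance(value, dict):
                yield from flatten(value, path)
            else:
                yield path, value

    def salted_hash(prop, value, salt):
        # Hash of a single (property, value, salt) triple.
        payload = json.dumps([prop, value, salt.hex()]).encode()
        return hashlib.sha256(payload).hexdigest()

    def create_signed_document(doc, sign):
        # Steps 2-3: flatten, then attach a fresh random salt to each pair.
        triples = [(p, v, os.urandom(32)) for p, v in flatten(doc)]
        # Step 4: hash every triple and sort the hashes.
        hashes = sorted(salted_hash(p, v, s) for p, v, s in triples)
        # Step 5: hash the entire sorted list; that digest is signed.
        digest = hashlib.sha256("".join(hashes).encode()).hexdigest()
        # Step 6: ship all the triples alongside the signature.
        return triples, sign(digest)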

*Selectively Disclosed Document Creation*:

 1. Input is the received list of triples (property, value, salt) and
    the signature.
 2. Create an empty list for “Obfuscated Data”.
 3. For each triple (property, value, salt) that is to be /elided/ (not
    disclosed), compute its hash and add that hash value to the
    “Obfuscated Data” list.
 4. Send only the disclosed triples (property, value, salt), the
    “Obfuscated Data” list, and the signature to the verifier (sketched
    below).
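
A matching sketch of the holder-side redaction, reusing salted_hash() 
from the sketch above; the disclose argument (a set of property paths 
the holder is willing to reveal) is my own naming, not from Open 
Attestation:

    def redact(triples, signature, disclose):
        # Keep the triples the holder is willing to reveal ...
        disclosed = [(p, v, s) for p, v, s in triples if p in disclose]
        # ... and replace each elided triple by its hash (step 3).
        obfuscated = [salted_hash(p, v, s)
                      for p, v, s in triples if p not in disclose]
        # Step 4: only disclosed triples, hashes, and signature go out.
        return disclosed, obfuscated, signature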

*Verifying Selectively Disclosed Document*:

 1. For each disclosed (property, value, salt) triple compute the hash
    and place it in a list.
 2. Add the hashes from the “Obfuscated Data” list to the above list.
 3. Sort the combined list from above. Take the hash of this sorted list.
 4. Verify the above hash against the signature using the verification
    algorithm (sketched below).
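
And the verifier side, under the same assumptions, with an abstract 
verify(digest, signature) callback for whatever signature algorithm was 
chosen:

    def verify_disclosed(disclosed, obfuscated, signature, verify):
        # Steps 1-2: hash each disclosed triple, then pool in the
        # already-hashed values from the "Obfuscated Data" list.
        hashes = [salted_hash(p, v, s) for p, v, s in disclosed]
        hashes.extend(obfuscated)
        # Step 3: sort the combined list and hash it, reconstructing
        # the digest the issuer signed.
        digest = hashlib.sha256("".join(sorted(hashes)).encode()).hexdigest()
        # Step 4: check that digest against the signature.
        return verify(digest, signature)

The key point is that sorting makes the digest independent of which 
triples were elided, so the verifier recreates exactly the value the 
issuer signed.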

Notes:

  * This approach has fairly low additional overhead: a salt added for
    each (property, value) pair by the issuer and included with all
    revealed tuples, plus a hash value for each non-disclosed tuple
    (see the worked example below).
  * Note that the concept of a tree is implicit here and not used to
    further advantage. However, the structure of the approach is very
    straightforward and can be applied in different settings.
  * The salts are needed primarily for confidentiality of the elided
    data. However, if a new set of salts is used each time a signature
    is generated, this also prevents linking when we have /verifier/ to
    /verifier/ collusion.
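
To put rough numbers on the overhead point (illustrative parameters 
only): with 32-byte salts and SHA-256, a 100-property document carries 
100 × 32 = 3,200 bytes of salts when fully disclosed; if 90 properties 
are elided, the presentation carries 10 salts (320 bytes) plus 90 
hashes (2,880 bytes) on top of the disclosed data and the signature.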

On 6/5/2023 1:39 PM, steve capell wrote:

> There’s a bit on the salted hash approach on this page 
> https://www.openattestation.com/docs/docs-section/how-does-it-work/document-integrity. 
>  Written more from a developer user perspective than from a standards 
> specification perspective - although I believe the Singapore team are 
> writing it up as a specification.  Kay?  Is there a link for a draft 
> specification on this?
>
>> On 6 Jun 2023, at 3:39 am, Greg Bernstein 
>> <gregb@grotto-networking.com> wrote:
>>
>> I’ve seen the salted hash approach in SD-JWT to prevent “verifier to 
>> verifier” collusion (tracking) with fairly arbitrary signature 
>> algorithms. If we are just interested in ECDSA then we should be able 
>> to use the “random version of ECDSA” rather than the “Deterministic 
>> ECDSA” to achieve the same functionality without the need for a salt.
>>
>> Was just writing up a PR on “security considerations” for ECDSA 
>> Cryptosuite v2019 <https://github.com/w3c/vc-di-ecdsa> and while 
>> recommending Deterministic ECDSA left the option for random ECDSA.
>>
>> Is there a reference for the “salted hash tree” approach?
>>
>> Cheers
>>
>> Greg B. Grotto Networking <https://www.grotto-networking.com/>
>>
>>
>> On 6/3/2023 6:48 PM, Steve Capell wrote:
>>
>>
>>> Thanks Manu
>>>
>>> Happy to participate in these tests and calculations
>>>
>>> I can see how ecdsa-sd could be sufficiently efficient (pending test results).  How would we address the requirement for any holder along the supply chain to redact? Can you see a way to blend the salted hash tree model with ecdsa-sd?
>>>
>>> I agree with Richard’s observation that when we stop trying to copy the paper then there’s potentially a lot less need for redaction - but I suspect we’re in for a longish transition period, particularly for supply chain documents like invoices, waybills, and conformity certificates
>>>
>>> Steven Capell
>>> Mob: 0410 437854
>>>
>>>> On 4 Jun 2023, at 1:41 am, Manu Sporny <msporny@digitalbazaar.com> wrote:
>>>>
>>>> On Wed, May 31, 2023 at 4:48 AM Steve Capell <steve.capell@gmail.com> wrote:
>>>>> Regarding the size / cost volumetrics I don’t have concrete metrics but I’ll say it’s not uncommon for trade documents like invoices and waybills to have dozens or even hundreds of lines.
>>>> The reason I asked is because it would be nice if we could run some
>>>> tests w/ ecdsa-sd and your supply chain use cases. Here are some
>>>> situations where Data Integrity for Selective Disclosure (ecdsa-2023)
>>>> will work out well:
>>>>
>>>> * You have a large document with many claims (100+) that must be
>>>> mandatorily disclosed (these are all lumped into a single hash in
>>>> ecdsa-sd and so costs little), and only a few (1-30) that you want to
>>>> be selectively disclosed (and only a few of those are disclosed at a
>>>> time -- this costs about 66 bytes per revealed claim).
>>>>
>>>> * You have a small document with a handful of fields (1-30) that you
>>>> want to be selectively disclosed (and only a few of those, 1-5, are
>>>> disclosed at a time -- again, 66 bytes per revealed claim).
>>>>
>>>> For the Data Integrity for Selective Disclosure work, we are working
>>>> on a Google Sheet that allows you to input the total number of
>>>> statements, total number of mandatory disclosure claims, total number
>>>> of selective disclosure claims, total number of objects without
>>>> identifiers, and it will spit out the initial proof size, and then the
>>>> selective disclosure proof size (based on how much you're disclosing).
>>>> Having something like that for your merkle-based mechanism, SD-JWT,
>>>> and BBS would be useful to the community. We'd prefer if each
>>>> community provided the calculations, but if that doesn't happen, we
>>>> might just put something out there and see how well we did at
>>>> analysing the cryptographic variables. We're happy to be told we're
>>>> wrong in order to get to more accurate numbers for the ecosystem to
>>>> compare/contrast.
>>>>
>>>> -- manu
>>>>
>>>> -- 
>>>> Manu Sporny - https://www.linkedin.com/in/manusporny/
>>>> Founder/CEO - Digital Bazaar, Inc.
>>>> https://www.digitalbazaar.com/

Received on Tuesday, 6 June 2023 23:23:29 UTC