Re: RDF Dataset Canonicalization - Formal Proof from Michael Ruminer on 2021-03-29 (public-credentials@w3.org from March 2021)

From: Michael Ruminer <michael.ruminer@gmail.com>
Date: Mon, 29 Mar 2021 11:09:57 -0400
To: "W3C Credentials CG (Public List)" <public-credentials@w3.org>
Message-ID: <CAHCj7LSayE5_U090gw25wrUr9TAuP1BNGB-V6OWERc7hqu+DNQ@mail.gmail.com>
Just tossing this out there for those that may no be familiar. Not
providing any statements of fitness but since it is strongly related it may
be of interest to folks. There are also redactable signature schemes that
exist. (See Poehls and Samelin as well as others ). Using a witness
signature per attribute and an accumulator that verifies the witness
signature belongs. To redact simply remove the signed attribute and it's
witness signature and there are no more indicators it ever existed and all
other attributes validate against the accumulator has they normally would.
The accumulator could be unique per document (side note unfortunately the
accumulator needs created with safe primes).

If anyone wants pointed to some of the papers email me directly and I'll
provide a list of the papers I have.

On Sat, Mar 27, 2021 at 10:22 PM Steve Capell <steve.capell@gmail.com>
wrote:

> The Singapore government https://www.openattestation.com/ does this
> already . Version 3 is W3C VC data model compliant
>
> Each element is hashed (with salt I think) and then the hash of the hashed
> is the document hash that is notarised
>
> The main rationale is selective redaction (because the root hash is
> unchanged when some clear text is hidden). But I suppose it simplifies
> canonicalisation too...
>
> Steven Capell
> Mob: 0410 437854
>
> On 28 Mar 2021, at 12:42 pm, Alan Karp <alanhkarp@gmail.com> wrote:
>
> 
> Adrian Gropper <agropper@healthurl.com> wrote:
>
>> But what happens to parts that are inert and were added for reasons
>> outside the meaning of sameness?
>>
>
> You don't include them in the set hash, as you would do with any other
> hash.
>
> --------------
> Alan Karp
>
>
> On Sat, Mar 27, 2021 at 5:56 PM Adrian Gropper <agropper@healthurl.com>
> wrote:
>
>> But what happens to parts that are inert and were added for reasons
>> outside the meaning of sameness?
>>
>> On Sat, Mar 27, 2021 at 8:42 PM Alan Karp <alanhkarp@gmail.com> wrote:
>>
>>> Adrian Gropper <agropper@healthurl.com> wrote:
>>>
>>>> Thanks Alan!
>>>>
>>>> Would it be fair to say that using set hash you can tell two things are
>>>> the same without having any idea of what the two things are? Sounds like a
>>>> simple-minded cousin to ZKPs.more
>>>>
>>>
>>> It's more like saying you can tell if these two things are made of the
>>> same parts.
>>>
>>> --------------
>>> Alan Karp
>>>
>>>
>>> On Sat, Mar 27, 2021 at 5:31 PM Adrian Gropper <agropper@healthurl.com>
>>> wrote:
>>>
>>>> Thanks Alan!
>>>>
>>>> Would it be fair to say that using set hash you can tell two things are
>>>> the same without having any idea of what the two things are? Sounds like a
>>>> simple-minded cousin to ZKPs.
>>>>
>>>> Adrian
>>>>
>>>> On Sat, Mar 27, 2021 at 8:14 PM Drummond Reed <
>>>> drummond.reed@evernym.com> wrote:
>>>>
>>>>> Alan, I just want to commend you for an exceptionally good plain
>>>>> English explanation of the set hash approach. I too was not familiar with
>>>>> that.
>>>>>
>>>>> =Drummond
>>>>>
>>>>> On Sat, Mar 27, 2021 at 4:57 PM Alan Karp <alanhkarp@gmail.com> wrote:
>>>>>
>>>>>> Adrian Gropper <agropper@healthurl.com> wrote:
>>>>>>
>>>>>>> I think I understand canonicalization but I would appreciate a plain
>>>>>>> language
>>>>>>> <https://www.explainxkcd.com/wiki/index.php?title=1133:_Up_Goer_Five>
>>>>>>> explanation of what Manu and Alan are talking about. Ideally, there would
>>>>>>> be a use-case to illustrate the utility.
>>>>>>>
>>>>>>
>>>>>> Let's say you have the following two sentences.  "Alice and Bob went
>>>>>> to the store." and "Bob and Alice went to the store."  The hashes of those
>>>>>> two sentences are different even though they mean the same thing.
>>>>>> Canonicalization might say that two names separated by an "and" must be
>>>>>> reordered so they are alphabetical.  In that case, you change the second
>>>>>> sentence to match the first before computing the hash.  That works, but as
>>>>>> Manu pointed out, getting the canonicalization rules right is hard.
>>>>>>
>>>>>> What we showed in our paper was a different approach.  You can
>>>>>> combine the hashes of the individual words of the original sentences in
>>>>>> such a way that the hashes are the same.  It's called a "set hash" because
>>>>>> the result of hashing a set doesn't depend on the order in which you pick
>>>>>> the items.  I first learned of the concept from the Zobrist hash, which is
>>>>>> used in computer chess to detect if you've seen a particular position
>>>>>> before during your search.
>>>>>>
>>>>>> --------------
>>>>>> Alan Karp
>>>>>>
>>>>>>
>>>>>> On Sat, Mar 27, 2021 at 4:28 PM Adrian Gropper <
>>>>>> agropper@healthurl.com> wrote:
>>>>>>
>>>>>>> I think I understand canonicalization but I would appreciate a plain
>>>>>>> language
>>>>>>> <https://www.explainxkcd.com/wiki/index.php?title=1133:_Up_Goer_Five>
>>>>>>> explanation of what Manu and Alan are talking about. Ideally, there would
>>>>>>> be a use-case to illustrate the utility.
>>>>>>>
>>>>>>> Adrian
>>>>>>>
>>>>>>> On Sat, Mar 27, 2021 at 6:01 PM Alan Karp <alanhkarp@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Feel free to lift any sections you like.  As far as digital
>>>>>>>> signatures goes, I don't recall.  It might simply be that we assumed people
>>>>>>>> knew you could sign once you had the digest.
>>>>>>>>
>>>>>>>> --------------
>>>>>>>> Alan Karp
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Mar 27, 2021 at 12:43 PM Manu Sporny <
>>>>>>>> msporny@digitalbazaar.com> wrote:
>>>>>>>>
>>>>>>>>> On 3/27/21 3:01 PM, Alan Karp wrote:
>>>>>>>>> > Yeah.  I'm still trying to figure out what I'm going to be when
>>>>>>>>> I grow up.
>>>>>>>>>
>>>>>>>>> *lol*, aren't we all! :P
>>>>>>>>>
>>>>>>>>> > One issue you didn't mention about our paper is that a set hash
>>>>>>>>> is weaker
>>>>>>>>> > against collision attacks. I thought that might be the reason
>>>>>>>>> you couldn't
>>>>>>>>> >  use that approach.
>>>>>>>>>
>>>>>>>>> Well, yes... I wanted to say something about that, but could also
>>>>>>>>> see how you
>>>>>>>>> could *maybe* mitigate that using large enough hashes and/or
>>>>>>>>> Section 6.2.2 --
>>>>>>>>> making the combining function be multiplication mod some
>>>>>>>>> suitably-large prime
>>>>>>>>> number. This was the part of the paper that interested me the
>>>>>>>>> most, Alan... I
>>>>>>>>> could see how that would work IF we didn't have to depend on a
>>>>>>>>> pre-determined
>>>>>>>>> set of node labels.
>>>>>>>>>
>>>>>>>>> There are performance improvements that we know are probably still
>>>>>>>>> locked up
>>>>>>>>> in the algorithm, but we needed to ship something (nine years ago)
>>>>>>>>> and we
>>>>>>>>> really haven't seen a case where performance was an issue.
>>>>>>>>>
>>>>>>>>> > In case you're interested, we wrote a follow-up,
>>>>>>>>> > https://www.hpl.hp.com/techreports/2004/HPL-2004-95.pdf
>>>>>>>>>
>>>>>>>>> Would you mind it if we unceremoniously lift applicable parts of
>>>>>>>>> "Section 5:
>>>>>>>>> Application for Graph Digests" from that document for the use
>>>>>>>>> cases document?
>>>>>>>>> Any reason you didn't include digital signatures in the
>>>>>>>>> applications section?
>>>>>>>>>
>>>>>>>>> -- manu
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Manu Sporny - https://www.linkedin.com/in/manusporny/
>>>>>>>>> Founder/CEO - Digital Bazaar, Inc.
>>>>>>>>> blog: Veres One Decentralized Identifier Blockchain Launches
>>>>>>>>> https://tinyurl.com/veres-one-launches
>>>>>>>>>
>>>>>>>>>
Received on Monday, 29 March 2021 15:11:25 UTC