Re: RDF Dataset Canonicalization - Formal Proof from Steve Capell on 2021-03-28 (public-credentials@w3.org from March 2021)

From: Steve Capell <steve.capell@gmail.com>
Date: Sun, 28 Mar 2021 13:20:23 +1100
To: Alan Karp <alanhkarp@gmail.com>
Cc: Adrian Gropper <agropper@healthurl.com>, Drummond Reed <drummond.reed@evernym.com>, Manu Sporny <msporny@digitalbazaar.com>, "W3C Credentials CG (Public List)" <public-credentials@w3.org>
Message-Id: <A7113BE4-B6AE-4541-AFA2-E15D8C6C55AA@gmail.com>
The Singapore government https://www.openattestation.com/ does this already . Version 3 is W3C VC data model compliant 

Each element is hashed (with salt I think) and then the hash of the hashed is the document hash that is notarised 

The main rationale is selective redaction (because the root hash is unchanged when some clear text is hidden). But I suppose it simplifies canonicalisation too...

Steven Capell
Mob: 0410 437854

> On 28 Mar 2021, at 12:42 pm, Alan Karp <alanhkarp@gmail.com> wrote:
> 
> 
> Adrian Gropper <agropper@healthurl.com> wrote:
>> But what happens to parts that are inert and were added for reasons outside the meaning of sameness?
> 
> You don't include them in the set hash, as you would do with any other hash. 
> 
> --------------
> Alan Karp
> 
> 
>> On Sat, Mar 27, 2021 at 5:56 PM Adrian Gropper <agropper@healthurl.com> wrote:
>> But what happens to parts that are inert and were added for reasons outside the meaning of sameness?
>> 
>>> On Sat, Mar 27, 2021 at 8:42 PM Alan Karp <alanhkarp@gmail.com> wrote:
>>> Adrian Gropper <agropper@healthurl.com> wrote:
>>>> Thanks Alan!
>>>> 
>>>> Would it be fair to say that using set hash you can tell two things are the same without having any idea of what the two things are? Sounds like a simple-minded cousin to ZKPs.more
>>> 
>>> It's more like saying you can tell if these two things are made of the same parts.
>>> 
>>> --------------
>>> Alan Karp
>>> 
>>> 
>>>> On Sat, Mar 27, 2021 at 5:31 PM Adrian Gropper <agropper@healthurl.com> wrote:
>>>> Thanks Alan!
>>>> 
>>>> Would it be fair to say that using set hash you can tell two things are the same without having any idea of what the two things are? Sounds like a simple-minded cousin to ZKPs. 
>>>> 
>>>> Adrian
>>>> 
>>>>> On Sat, Mar 27, 2021 at 8:14 PM Drummond Reed <drummond.reed@evernym.com> wrote:
>>>>> Alan, I just want to commend you for an exceptionally good plain English explanation of the set hash approach. I too was not familiar with that.
>>>>> 
>>>>> =Drummond 
>>>>> 
>>>>>> On Sat, Mar 27, 2021 at 4:57 PM Alan Karp <alanhkarp@gmail.com> wrote:
>>>>>> Adrian Gropper <agropper@healthurl.com> wrote:
>>>>>>> I think I understand canonicalization but I would appreciate a plain language explanation of what Manu and Alan are talking about. Ideally, there would be a use-case to illustrate the utility.
>>>>>> 
>>>>>> Let's say you have the following two sentences.  "Alice and Bob went to the store." and "Bob and Alice went to the store."  The hashes of those two sentences are different even though they mean the same thing.  Canonicalization might say that two names separated by an "and" must be reordered so they are alphabetical.  In that case, you change the second sentence to match the first before computing the hash.  That works, but as Manu pointed out, getting the canonicalization rules right is hard.
>>>>>> 
>>>>>> What we showed in our paper was a different approach.  You can combine the hashes of the individual words of the original sentences in such a way that the hashes are the same.  It's called a "set hash" because the result of hashing a set doesn't depend on the order in which you pick the items.  I first learned of the concept from the Zobrist hash, which is used in computer chess to detect if you've seen a particular position before during your search.
>>>>>> 
>>>>>> --------------
>>>>>> Alan Karp
>>>>>> 
>>>>>> 
>>>>>>> On Sat, Mar 27, 2021 at 4:28 PM Adrian Gropper <agropper@healthurl.com> wrote:
>>>>>>> I think I understand canonicalization but I would appreciate a plain language explanation of what Manu and Alan are talking about. Ideally, there would be a use-case to illustrate the utility.
>>>>>>> 
>>>>>>> Adrian
>>>>>>> 
>>>>>>>> On Sat, Mar 27, 2021 at 6:01 PM Alan Karp <alanhkarp@gmail.com> wrote:
>>>>>>>> Feel free to lift any sections you like.  As far as digital signatures goes, I don't recall.  It might simply be that we assumed people knew you could sign once you had the digest.
>>>>>>>> 
>>>>>>>> --------------
>>>>>>>> Alan Karp
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Sat, Mar 27, 2021 at 12:43 PM Manu Sporny <msporny@digitalbazaar.com> wrote:
>>>>>>>>> On 3/27/21 3:01 PM, Alan Karp wrote:
>>>>>>>>> > Yeah.  I'm still trying to figure out what I'm going to be when I grow up.
>>>>>>>>> 
>>>>>>>>> *lol*, aren't we all! :P
>>>>>>>>> 
>>>>>>>>> > One issue you didn't mention about our paper is that a set hash is weaker 
>>>>>>>>> > against collision attacks. I thought that might be the reason you couldn't
>>>>>>>>> >  use that approach.
>>>>>>>>> 
>>>>>>>>> Well, yes... I wanted to say something about that, but could also see how you
>>>>>>>>> could *maybe* mitigate that using large enough hashes and/or Section 6.2.2 --
>>>>>>>>> making the combining function be multiplication mod some suitably-large prime
>>>>>>>>> number. This was the part of the paper that interested me the most, Alan... I
>>>>>>>>> could see how that would work IF we didn't have to depend on a pre-determined
>>>>>>>>> set of node labels.
>>>>>>>>> 
>>>>>>>>> There are performance improvements that we know are probably still locked up
>>>>>>>>> in the algorithm, but we needed to ship something (nine years ago) and we
>>>>>>>>> really haven't seen a case where performance was an issue.
>>>>>>>>> 
>>>>>>>>> > In case you're interested, we wrote a follow-up, 
>>>>>>>>> > https://www.hpl.hp.com/techreports/2004/HPL-2004-95.pdf
>>>>>>>>> 
>>>>>>>>> Would you mind it if we unceremoniously lift applicable parts of "Section 5:
>>>>>>>>> Application for Graph Digests" from that document for the use cases document?
>>>>>>>>> Any reason you didn't include digital signatures in the applications section?
>>>>>>>>> 
>>>>>>>>> -- manu
>>>>>>>>> 
>>>>>>>>> -- 
>>>>>>>>> Manu Sporny - https://www.linkedin.com/in/manusporny/
>>>>>>>>> Founder/CEO - Digital Bazaar, Inc.
>>>>>>>>> blog: Veres One Decentralized Identifier Blockchain Launches
>>>>>>>>> https://tinyurl.com/veres-one-launches
>>>>>>>>>
Received on Sunday, 28 March 2021 02:20:40 UTC