Re: RDF Dataset Canonicalization - Formal Proof from Adrian Gropper on 2021-03-28 (public-credentials@w3.org from March 2021)

From: Adrian Gropper <agropper@healthurl.com>
Date: Sat, 27 Mar 2021 20:56:38 -0400
To: Alan Karp <alanhkarp@gmail.com>
Cc: Drummond Reed <drummond.reed@evernym.com>, Manu Sporny <msporny@digitalbazaar.com>, "W3C Credentials CG (Public List)" <public-credentials@w3.org>
Message-ID: <CANYRo8h_e9vb_pkcs07L2j-uJJz+b9wm_PibRt_O66gx2=F8yQ@mail.gmail.com>
But what happens to parts that are inert and were added for reasons
outside the meaning of sameness?

On Sat, Mar 27, 2021 at 8:42 PM Alan Karp <alanhkarp@gmail.com> wrote:

> Adrian Gropper <agropper@healthurl.com> wrote:
>
>> Thanks Alan!
>>
>> Would it be fair to say that using set hash you can tell two things are
>> the same without having any idea of what the two things are? Sounds like a
>> simple-minded cousin to ZKPs.more
>>
>
> It's more like saying you can tell if these two things are made of the
> same parts.
>
> --------------
> Alan Karp
>
>
> On Sat, Mar 27, 2021 at 5:31 PM Adrian Gropper <agropper@healthurl.com>
> wrote:
>
>> Thanks Alan!
>>
>> Would it be fair to say that using set hash you can tell two things are
>> the same without having any idea of what the two things are? Sounds like a
>> simple-minded cousin to ZKPs.
>>
>> Adrian
>>
>> On Sat, Mar 27, 2021 at 8:14 PM Drummond Reed <drummond.reed@evernym.com>
>> wrote:
>>
>>> Alan, I just want to commend you for an exceptionally good plain English
>>> explanation of the set hash approach. I too was not familiar with that.
>>>
>>> =Drummond
>>>
>>> On Sat, Mar 27, 2021 at 4:57 PM Alan Karp <alanhkarp@gmail.com> wrote:
>>>
>>>> Adrian Gropper <agropper@healthurl.com> wrote:
>>>>
>>>>> I think I understand canonicalization but I would appreciate a plain
>>>>> language
>>>>> <https://www.explainxkcd.com/wiki/index.php?title=1133:_Up_Goer_Five>
>>>>> explanation of what Manu and Alan are talking about. Ideally, there would
>>>>> be a use-case to illustrate the utility.
>>>>>
>>>>
>>>> Let's say you have the following two sentences.  "Alice and Bob went to
>>>> the store." and "Bob and Alice went to the store."  The hashes of those two
>>>> sentences are different even though they mean the same thing.
>>>> Canonicalization might say that two names separated by an "and" must be
>>>> reordered so they are alphabetical.  In that case, you change the second
>>>> sentence to match the first before computing the hash.  That works, but as
>>>> Manu pointed out, getting the canonicalization rules right is hard.
>>>>
>>>> What we showed in our paper was a different approach.  You can combine
>>>> the hashes of the individual words of the original sentences in such a way
>>>> that the hashes are the same.  It's called a "set hash" because the result
>>>> of hashing a set doesn't depend on the order in which you pick the items.
>>>> I first learned of the concept from the Zobrist hash, which is used in
>>>> computer chess to detect if you've seen a particular position before during
>>>> your search.
>>>>
>>>> --------------
>>>> Alan Karp
>>>>
>>>>
>>>> On Sat, Mar 27, 2021 at 4:28 PM Adrian Gropper <agropper@healthurl.com>
>>>> wrote:
>>>>
>>>>> I think I understand canonicalization but I would appreciate a plain
>>>>> language
>>>>> <https://www.explainxkcd.com/wiki/index.php?title=1133:_Up_Goer_Five>
>>>>> explanation of what Manu and Alan are talking about. Ideally, there would
>>>>> be a use-case to illustrate the utility.
>>>>>
>>>>> Adrian
>>>>>
>>>>> On Sat, Mar 27, 2021 at 6:01 PM Alan Karp <alanhkarp@gmail.com> wrote:
>>>>>
>>>>>> Feel free to lift any sections you like.  As far as digital
>>>>>> signatures goes, I don't recall.  It might simply be that we assumed people
>>>>>> knew you could sign once you had the digest.
>>>>>>
>>>>>> --------------
>>>>>> Alan Karp
>>>>>>
>>>>>>
>>>>>> On Sat, Mar 27, 2021 at 12:43 PM Manu Sporny <
>>>>>> msporny@digitalbazaar.com> wrote:
>>>>>>
>>>>>>> On 3/27/21 3:01 PM, Alan Karp wrote:
>>>>>>> > Yeah.  I'm still trying to figure out what I'm going to be when I
>>>>>>> grow up.
>>>>>>>
>>>>>>> *lol*, aren't we all! :P
>>>>>>>
>>>>>>> > One issue you didn't mention about our paper is that a set hash is
>>>>>>> weaker
>>>>>>> > against collision attacks. I thought that might be the reason you
>>>>>>> couldn't
>>>>>>> >  use that approach.
>>>>>>>
>>>>>>> Well, yes... I wanted to say something about that, but could also
>>>>>>> see how you
>>>>>>> could *maybe* mitigate that using large enough hashes and/or Section
>>>>>>> 6.2.2 --
>>>>>>> making the combining function be multiplication mod some
>>>>>>> suitably-large prime
>>>>>>> number. This was the part of the paper that interested me the most,
>>>>>>> Alan... I
>>>>>>> could see how that would work IF we didn't have to depend on a
>>>>>>> pre-determined
>>>>>>> set of node labels.
>>>>>>>
>>>>>>> There are performance improvements that we know are probably still
>>>>>>> locked up
>>>>>>> in the algorithm, but we needed to ship something (nine years ago)
>>>>>>> and we
>>>>>>> really haven't seen a case where performance was an issue.
>>>>>>>
>>>>>>> > In case you're interested, we wrote a follow-up,
>>>>>>> > https://www.hpl.hp.com/techreports/2004/HPL-2004-95.pdf
>>>>>>>
>>>>>>> Would you mind it if we unceremoniously lift applicable parts of
>>>>>>> "Section 5:
>>>>>>> Application for Graph Digests" from that document for the use cases
>>>>>>> document?
>>>>>>> Any reason you didn't include digital signatures in the applications
>>>>>>> section?
>>>>>>>
>>>>>>> -- manu
>>>>>>>
>>>>>>> --
>>>>>>> Manu Sporny - https://www.linkedin.com/in/manusporny/
>>>>>>> Founder/CEO - Digital Bazaar, Inc.
>>>>>>> blog: Veres One Decentralized Identifier Blockchain Launches
>>>>>>> https://tinyurl.com/veres-one-launches
>>>>>>>
>>>>>>>
Received on Sunday, 28 March 2021 00:57:02 UTC