Re: RDF Dataset Canonicalization - Formal Proof from Alan Karp on 2021-03-28 (public-credentials@w3.org from March 2021)

From: Alan Karp <alanhkarp@gmail.com>
Date: Sat, 27 Mar 2021 17:41:54 -0700
To: Adrian Gropper <agropper@healthurl.com>
Cc: Drummond Reed <drummond.reed@evernym.com>, Manu Sporny <msporny@digitalbazaar.com>, "W3C Credentials CG (Public List)" <public-credentials@w3.org>
Message-ID: <CANpA1Z2hRfpwr07PULzzw5-VVXuBmYKBt+wEe9K9qJXArBz7+w@mail.gmail.com>
Adrian Gropper <agropper@healthurl.com> wrote:

> Thanks Alan!
>
> Would it be fair to say that using set hash you can tell two things are
> the same without having any idea of what the two things are? Sounds like a
> simple-minded cousin to ZKPs.more
>

It's more like saying you can tell if these two things are made of the same
parts.

--------------
Alan Karp


On Sat, Mar 27, 2021 at 5:31 PM Adrian Gropper <agropper@healthurl.com>
wrote:

> Thanks Alan!
>
> Would it be fair to say that using set hash you can tell two things are
> the same without having any idea of what the two things are? Sounds like a
> simple-minded cousin to ZKPs.
>
> Adrian
>
> On Sat, Mar 27, 2021 at 8:14 PM Drummond Reed <drummond.reed@evernym.com>
> wrote:
>
>> Alan, I just want to commend you for an exceptionally good plain English
>> explanation of the set hash approach. I too was not familiar with that.
>>
>> =Drummond
>>
>> On Sat, Mar 27, 2021 at 4:57 PM Alan Karp <alanhkarp@gmail.com> wrote:
>>
>>> Adrian Gropper <agropper@healthurl.com> wrote:
>>>
>>>> I think I understand canonicalization but I would appreciate a plain
>>>> language
>>>> <https://www.explainxkcd.com/wiki/index.php?title=1133:_Up_Goer_Five>
>>>> explanation of what Manu and Alan are talking about. Ideally, there would
>>>> be a use-case to illustrate the utility.
>>>>
>>>
>>> Let's say you have the following two sentences.  "Alice and Bob went to
>>> the store." and "Bob and Alice went to the store."  The hashes of those two
>>> sentences are different even though they mean the same thing.
>>> Canonicalization might say that two names separated by an "and" must be
>>> reordered so they are alphabetical.  In that case, you change the second
>>> sentence to match the first before computing the hash.  That works, but as
>>> Manu pointed out, getting the canonicalization rules right is hard.
>>>
>>> What we showed in our paper was a different approach.  You can combine
>>> the hashes of the individual words of the original sentences in such a way
>>> that the hashes are the same.  It's called a "set hash" because the result
>>> of hashing a set doesn't depend on the order in which you pick the items.
>>> I first learned of the concept from the Zobrist hash, which is used in
>>> computer chess to detect if you've seen a particular position before during
>>> your search.
>>>
>>> --------------
>>> Alan Karp
>>>
>>>
>>> On Sat, Mar 27, 2021 at 4:28 PM Adrian Gropper <agropper@healthurl.com>
>>> wrote:
>>>
>>>> I think I understand canonicalization but I would appreciate a plain
>>>> language
>>>> <https://www.explainxkcd.com/wiki/index.php?title=1133:_Up_Goer_Five>
>>>> explanation of what Manu and Alan are talking about. Ideally, there would
>>>> be a use-case to illustrate the utility.
>>>>
>>>> Adrian
>>>>
>>>> On Sat, Mar 27, 2021 at 6:01 PM Alan Karp <alanhkarp@gmail.com> wrote:
>>>>
>>>>> Feel free to lift any sections you like.  As far as digital signatures
>>>>> goes, I don't recall.  It might simply be that we assumed people knew you
>>>>> could sign once you had the digest.
>>>>>
>>>>> --------------
>>>>> Alan Karp
>>>>>
>>>>>
>>>>> On Sat, Mar 27, 2021 at 12:43 PM Manu Sporny <
>>>>> msporny@digitalbazaar.com> wrote:
>>>>>
>>>>>> On 3/27/21 3:01 PM, Alan Karp wrote:
>>>>>> > Yeah.  I'm still trying to figure out what I'm going to be when I
>>>>>> grow up.
>>>>>>
>>>>>> *lol*, aren't we all! :P
>>>>>>
>>>>>> > One issue you didn't mention about our paper is that a set hash is
>>>>>> weaker
>>>>>> > against collision attacks. I thought that might be the reason you
>>>>>> couldn't
>>>>>> >  use that approach.
>>>>>>
>>>>>> Well, yes... I wanted to say something about that, but could also see
>>>>>> how you
>>>>>> could *maybe* mitigate that using large enough hashes and/or Section
>>>>>> 6.2.2 --
>>>>>> making the combining function be multiplication mod some
>>>>>> suitably-large prime
>>>>>> number. This was the part of the paper that interested me the most,
>>>>>> Alan... I
>>>>>> could see how that would work IF we didn't have to depend on a
>>>>>> pre-determined
>>>>>> set of node labels.
>>>>>>
>>>>>> There are performance improvements that we know are probably still
>>>>>> locked up
>>>>>> in the algorithm, but we needed to ship something (nine years ago)
>>>>>> and we
>>>>>> really haven't seen a case where performance was an issue.
>>>>>>
>>>>>> > In case you're interested, we wrote a follow-up,
>>>>>> > https://www.hpl.hp.com/techreports/2004/HPL-2004-95.pdf
>>>>>>
>>>>>> Would you mind it if we unceremoniously lift applicable parts of
>>>>>> "Section 5:
>>>>>> Application for Graph Digests" from that document for the use cases
>>>>>> document?
>>>>>> Any reason you didn't include digital signatures in the applications
>>>>>> section?
>>>>>>
>>>>>> -- manu
>>>>>>
>>>>>> --
>>>>>> Manu Sporny - https://www.linkedin.com/in/manusporny/
>>>>>> Founder/CEO - Digital Bazaar, Inc.
>>>>>> blog: Veres One Decentralized Identifier Blockchain Launches
>>>>>> https://tinyurl.com/veres-one-launches
>>>>>>
>>>>>>
Received on Sunday, 28 March 2021 00:42:19 UTC