Re: RDF Dataset Canonicalization - Formal Proof

Adrian Gropper <agropper@healthurl.com> wrote:

> But what happens to parts that are inert and were added for reasons
> outside the meaning of sameness?
>

You don't include them in the set hash, as you would do with any other
hash.

--------------
Alan Karp


On Sat, Mar 27, 2021 at 5:56 PM Adrian Gropper <agropper@healthurl.com>
wrote:

> But what happens to parts that are inert and were added for reasons
> outside the meaning of sameness?
>
> On Sat, Mar 27, 2021 at 8:42 PM Alan Karp <alanhkarp@gmail.com> wrote:
>
>> Adrian Gropper <agropper@healthurl.com> wrote:
>>
>>> Thanks Alan!
>>>
>>> Would it be fair to say that using set hash you can tell two things are
>>> the same without having any idea of what the two things are? Sounds like a
>>> simple-minded cousin to ZKPs.more
>>>
>>
>> It's more like saying you can tell if these two things are made of the
>> same parts.
>>
>> --------------
>> Alan Karp
>>
>>
>> On Sat, Mar 27, 2021 at 5:31 PM Adrian Gropper <agropper@healthurl.com>
>> wrote:
>>
>>> Thanks Alan!
>>>
>>> Would it be fair to say that using set hash you can tell two things are
>>> the same without having any idea of what the two things are? Sounds like a
>>> simple-minded cousin to ZKPs.
>>>
>>> Adrian
>>>
>>> On Sat, Mar 27, 2021 at 8:14 PM Drummond Reed <drummond.reed@evernym.com>
>>> wrote:
>>>
>>>> Alan, I just want to commend you for an exceptionally good plain
>>>> English explanation of the set hash approach. I too was not familiar with
>>>> that.
>>>>
>>>> =Drummond
>>>>
>>>> On Sat, Mar 27, 2021 at 4:57 PM Alan Karp <alanhkarp@gmail.com> wrote:
>>>>
>>>>> Adrian Gropper <agropper@healthurl.com> wrote:
>>>>>
>>>>>> I think I understand canonicalization but I would appreciate a plain
>>>>>> language
>>>>>> <https://www.explainxkcd.com/wiki/index.php?title=1133:_Up_Goer_Five>
>>>>>> explanation of what Manu and Alan are talking about. Ideally, there would
>>>>>> be a use-case to illustrate the utility.
>>>>>>
>>>>>
>>>>> Let's say you have the following two sentences.  "Alice and Bob went
>>>>> to the store." and "Bob and Alice went to the store."  The hashes of those
>>>>> two sentences are different even though they mean the same thing.
>>>>> Canonicalization might say that two names separated by an "and" must be
>>>>> reordered so they are alphabetical.  In that case, you change the second
>>>>> sentence to match the first before computing the hash.  That works, but as
>>>>> Manu pointed out, getting the canonicalization rules right is hard.
>>>>>
>>>>> What we showed in our paper was a different approach.  You can combine
>>>>> the hashes of the individual words of the original sentences in such a way
>>>>> that the hashes are the same.  It's called a "set hash" because the result
>>>>> of hashing a set doesn't depend on the order in which you pick the items.
>>>>> I first learned of the concept from the Zobrist hash, which is used in
>>>>> computer chess to detect if you've seen a particular position before during
>>>>> your search.
>>>>>
>>>>> --------------
>>>>> Alan Karp
>>>>>
>>>>>
>>>>> On Sat, Mar 27, 2021 at 4:28 PM Adrian Gropper <agropper@healthurl.com>
>>>>> wrote:
>>>>>
>>>>>> I think I understand canonicalization but I would appreciate a plain
>>>>>> language
>>>>>> <https://www.explainxkcd.com/wiki/index.php?title=1133:_Up_Goer_Five>
>>>>>> explanation of what Manu and Alan are talking about. Ideally, there would
>>>>>> be a use-case to illustrate the utility.
>>>>>>
>>>>>> Adrian
>>>>>>
>>>>>> On Sat, Mar 27, 2021 at 6:01 PM Alan Karp <alanhkarp@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Feel free to lift any sections you like.  As far as digital
>>>>>>> signatures goes, I don't recall.  It might simply be that we assumed people
>>>>>>> knew you could sign once you had the digest.
>>>>>>>
>>>>>>> --------------
>>>>>>> Alan Karp
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Mar 27, 2021 at 12:43 PM Manu Sporny <
>>>>>>> msporny@digitalbazaar.com> wrote:
>>>>>>>
>>>>>>>> On 3/27/21 3:01 PM, Alan Karp wrote:
>>>>>>>> > Yeah.  I'm still trying to figure out what I'm going to be when I
>>>>>>>> grow up.
>>>>>>>>
>>>>>>>> *lol*, aren't we all! :P
>>>>>>>>
>>>>>>>> > One issue you didn't mention about our paper is that a set hash
>>>>>>>> is weaker
>>>>>>>> > against collision attacks. I thought that might be the reason you
>>>>>>>> couldn't
>>>>>>>> >  use that approach.
>>>>>>>>
>>>>>>>> Well, yes... I wanted to say something about that, but could also
>>>>>>>> see how you
>>>>>>>> could *maybe* mitigate that using large enough hashes and/or
>>>>>>>> Section 6.2.2 --
>>>>>>>> making the combining function be multiplication mod some
>>>>>>>> suitably-large prime
>>>>>>>> number. This was the part of the paper that interested me the most,
>>>>>>>> Alan... I
>>>>>>>> could see how that would work IF we didn't have to depend on a
>>>>>>>> pre-determined
>>>>>>>> set of node labels.
>>>>>>>>
>>>>>>>> There are performance improvements that we know are probably still
>>>>>>>> locked up
>>>>>>>> in the algorithm, but we needed to ship something (nine years ago)
>>>>>>>> and we
>>>>>>>> really haven't seen a case where performance was an issue.
>>>>>>>>
>>>>>>>> > In case you're interested, we wrote a follow-up,
>>>>>>>> > https://www.hpl.hp.com/techreports/2004/HPL-2004-95.pdf
>>>>>>>>
>>>>>>>> Would you mind it if we unceremoniously lift applicable parts of
>>>>>>>> "Section 5:
>>>>>>>> Application for Graph Digests" from that document for the use cases
>>>>>>>> document?
>>>>>>>> Any reason you didn't include digital signatures in the
>>>>>>>> applications section?
>>>>>>>>
>>>>>>>> -- manu
>>>>>>>>
>>>>>>>> --
>>>>>>>> Manu Sporny - https://www.linkedin.com/in/manusporny/
>>>>>>>> Founder/CEO - Digital Bazaar, Inc.
>>>>>>>> blog: Veres One Decentralized Identifier Blockchain Launches
>>>>>>>> https://tinyurl.com/veres-one-launches
>>>>>>>>
>>>>>>>>

Received on Sunday, 28 March 2021 01:40:26 UTC