Re: RDF Dataset Canonicalization - Formal Proof

Adrian Gropper <agropper@healthurl.com> wrote:

> I think I understand canonicalization but I would appreciate a plain
> language
> <https://www.explainxkcd.com/wiki/index.php?title=1133:_Up_Goer_Five>
> explanation of what Manu and Alan are talking about. Ideally, there would
> be a use-case to illustrate the utility.
>

Let's say you have the following two sentences.  "Alice and Bob went to the
store." and "Bob and Alice went to the store."  The hashes of those two
sentences are different even though they mean the same thing.
Canonicalization might say that two names separated by an "and" must be
reordered so they are alphabetical.  In that case, you change the second
sentence to match the first before computing the hash.  That works, but as
Manu pointed out, getting the canonicalization rules right is hard.

What we showed in our paper was a different approach.  You can combine the
hashes of the individual words of the original sentences in such a way that
the hashes are the same.  It's called a "set hash" because the result of
hashing a set doesn't depend on the order in which you pick the items.  I
first learned of the concept from the Zobrist hash, which is used in
computer chess to detect if you've seen a particular position before during
your search.

--------------
Alan Karp


On Sat, Mar 27, 2021 at 4:28 PM Adrian Gropper <agropper@healthurl.com>
wrote:

> I think I understand canonicalization but I would appreciate a plain
> language
> <https://www.explainxkcd.com/wiki/index.php?title=1133:_Up_Goer_Five>
> explanation of what Manu and Alan are talking about. Ideally, there would
> be a use-case to illustrate the utility.
>
> Adrian
>
> On Sat, Mar 27, 2021 at 6:01 PM Alan Karp <alanhkarp@gmail.com> wrote:
>
>> Feel free to lift any sections you like.  As far as digital signatures
>> goes, I don't recall.  It might simply be that we assumed people knew you
>> could sign once you had the digest.
>>
>> --------------
>> Alan Karp
>>
>>
>> On Sat, Mar 27, 2021 at 12:43 PM Manu Sporny <msporny@digitalbazaar.com>
>> wrote:
>>
>>> On 3/27/21 3:01 PM, Alan Karp wrote:
>>> > Yeah.  I'm still trying to figure out what I'm going to be when I grow
>>> up.
>>>
>>> *lol*, aren't we all! :P
>>>
>>> > One issue you didn't mention about our paper is that a set hash is
>>> weaker
>>> > against collision attacks. I thought that might be the reason you
>>> couldn't
>>> >  use that approach.
>>>
>>> Well, yes... I wanted to say something about that, but could also see
>>> how you
>>> could *maybe* mitigate that using large enough hashes and/or Section
>>> 6.2.2 --
>>> making the combining function be multiplication mod some suitably-large
>>> prime
>>> number. This was the part of the paper that interested me the most,
>>> Alan... I
>>> could see how that would work IF we didn't have to depend on a
>>> pre-determined
>>> set of node labels.
>>>
>>> There are performance improvements that we know are probably still
>>> locked up
>>> in the algorithm, but we needed to ship something (nine years ago) and we
>>> really haven't seen a case where performance was an issue.
>>>
>>> > In case you're interested, we wrote a follow-up,
>>> > https://www.hpl.hp.com/techreports/2004/HPL-2004-95.pdf
>>>
>>> Would you mind it if we unceremoniously lift applicable parts of
>>> "Section 5:
>>> Application for Graph Digests" from that document for the use cases
>>> document?
>>> Any reason you didn't include digital signatures in the applications
>>> section?
>>>
>>> -- manu
>>>
>>> --
>>> Manu Sporny - https://www.linkedin.com/in/manusporny/
>>> Founder/CEO - Digital Bazaar, Inc.
>>> blog: Veres One Decentralized Identifier Blockchain Launches
>>> https://tinyurl.com/veres-one-launches
>>>
>>>

Received on Saturday, 27 March 2021 23:55:55 UTC