Re: Issues with parameterized hashing algorithms used internally from Ivan Herman on 2023-09-19 (public-rch-wg@w3.org from September 2023)

From: Ivan Herman <ivan@w3.org>
Date: Tue, 19 Sep 2023 19:57:41 +0200
To: Dave Longley <dlongley@digitalbazaar.com>
Cc: Dan Yamamoto <dan@iij.ad.jp>, Manu Sporny <msporny@digitalbazaar.com>, Phil Archer <phil.archer@gs1.org>, Sebastian Crane <seabass-labrax@gmx.com>, Gregg Kellogg <gregg@greggkellogg.net>, RDF Dataset Canonicalization and Hash Working Group <public-rch-wg@w3.org>
Message-Id: <9F71E8A6-0D1C-4242-96E7-13A1F1CA38C8@w3.org>

> On 19 Sep 2023, at 17:47, Dave Longley <dlongley@digitalbazaar.com> wrote:
> 
> My thoughts:
> 
> We've already expressed an identifier for the RDF Dataset
> Canonicalization Algorithm: RDFC-1.0. It uses a default internal hash
> algorithm, SHA-256, and any other hash algorithm that could be used
> with it will similarly have its own identifier ("SHA-384",
> "SHAKE256", and so on). These things are decoupled from one another
> (RDFC-1.0 works with any hash algorithm), and specifying which of
> these has been used (if it deviates from the default) seems to be in
> the domain of whatever format / standard / etc. is used to express
> metadata about either the canonicalized dataset or a hash of it
> (which, notably, would further include another hash algorithm which
> may or may not be the same).
> 
> I don't think it's a good idea to invent a new hash metadata
> expression mechanism in this group. These things exist elsewhere (such
> as multihash, or SRI, or RFC 6920) and some of them have their own
> registries where this metadata goes and where it is mapped to
> identifiers and / or "header values" that work within those specific
> formats. That is the right place, IMO, to put this kind of information
> and to enable interoperability on processing however it is expressed.


I agree. That is essentially what I proposed in my first reaction to Sebastian's mail.

Ivan
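As an illustration of Dave's point that existing formats already carry hash metadata, here is a minimal Python sketch. It assumes the canonical N-Quads produced by RDFC-1.0 is already available as a string, and it only wraps a digest *of that output*, which is a separate choice from RDFC-1.0's internal hash (the two may differ, as Dave notes). The `ALGORITHMS` table, `express_digest`, and `verify_sri` are hypothetical names. The multihash codes (0x12 for SHA-256, 0x20 for SHA-384) follow the multicodec table, and only SHA-256 is given an RFC 6920 `ni` name here. None of these formats records which internal hash RDFC-1.0 used; that would still have to be conveyed separately.

```python
import base64
import hashlib
import hmac

# Hypothetical table mapping algorithms mentioned in the thread to the tokens
# used by existing hash-metadata formats. Only SHA-256 gets an RFC 6920 name
# here; check the relevant registries before relying on any other entry.
ALGORITHMS = {
    # name: (hashlib name, multihash code, SRI prefix, RFC 6920 name or None)
    "SHA-256": ("sha256", 0x12, "sha256", "sha-256"),
    "SHA-384": ("sha384", 0x20, "sha384", None),
}


def express_digest(canonical_nquads: str, algorithm: str = "SHA-256") -> dict:
    """Hash an already-canonicalized dataset (assumed input) and express the
    digest as an SRI string, a multihash, and, where available, an ni URI."""
    hl_name, mh_code, sri_prefix, ni_name = ALGORITHMS[algorithm]
    digest = hashlib.new(hl_name, canonical_nquads.encode("utf-8")).digest()
    out = {
        # Subresource Integrity style: "<alg>-<standard base64 digest>"
        "sri": f"{sri_prefix}-{base64.b64encode(digest).decode('ascii')}",
        # Multihash: <code varint><length varint><digest>. Both values are
        # below 0x80 here, so each varint is a single byte.
        "multihash": bytes([mh_code, len(digest)]) + digest,
    }
    if ni_name is not None:
        # RFC 6920 ni URI with empty authority; base64url without padding
        b64url = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
        out["ni"] = f"ni:///{ni_name};{b64url}"
    return out


def verify_sri(canonical_nquads: str, sri: str) -> bool:
    """Recompute the digest named by an SRI-style string and compare it in
    constant time. Raises ValueError for an unknown algorithm prefix."""
    prefix, _, b64 = sri.partition("-")  # standard base64 never contains "-"
    for hl_name, _, sri_prefix, _ in ALGORITHMS.values():
        if sri_prefix == prefix:
            expected = base64.b64decode(b64)
            actual = hashlib.new(hl_name, canonical_nquads.encode("utf-8")).digest()
            return hmac.compare_digest(expected, actual)
    raise ValueError(f"unsupported hash algorithm: {prefix!r}")
```

For example, `express_digest(doc, "SHA-384")["sri"]` yields a string beginning with `sha384-` that a recipient can pass to `verify_sri(doc, ...)`. The algorithm travels with the value, which is the kind of interoperability Ivan asks for, without this group defining a new metadata format.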


> 
> The specification we've produced is what enables someone to use
> whatever metadata parameters they parse from (or input into) such
> expressions to reproduce / verify / etc. some expected value. I see
> our specification as being similar to the SHA-256 specification, which
> indicates how to produce such a digest but does not itself define a
> hash metadata expression format.
> 
> On Tue, Sep 19, 2023 at 9:09 AM Dan Yamamoto <dan@iij.ad.jp> wrote:
>> 
>> I probably share the same opinion as Ivan, too. Since RDFC-1.0 isn't
>> always used alongside Data Integrity, I thought it would be better for
>> it to have its own precise algorithm identifier.
>> 
>> Dan
>> 
>> On 2023/09/19 0:49, Ivan Herman wrote:
>>> 
>>> 
>>>> On 18 Sep 2023, at 17:26, Manu Sporny <msporny@digitalbazaar.com> wrote:
>>>> 
>>>> On Mon, Sep 18, 2023 at 11:15 AM Phil Archer <phil.archer@gs1.org> wrote:
>>>>> From: Dan Yamamoto <dan@iij.ad.jp>
>>>>> Therefore, I believe the internal hash function should be
>>>>> interchangeable. However, as others have suggested, I think there is
>>>>> a need to introduce a mechanism to explicitly specify which hash
>>>>> function is used.
>>>> 
>>>> Just to jump in quickly on this thread: it feels like the harms are
>>>> being exaggerated given the way we know that RDFC-1.0 is used today.
>>>> If we look at how the VC Data Integrity specifications use the
>>>> algorithm, you /always/ know which internal hash algorithm was used
>>>> (or should be used) because it's signalled to you via the Data
>>>> Integrity algorithm identifier. You don't have to guess, you are told
>>>> exactly which internal hash algorithm to use.
>>>> 
>>>> I wonder if folks are missing this detail? It was always expected that
>>>> the internal hash information would be signalled to the caller, and
>>>> that's exactly what Data Integrity does. Perhaps all we need to do in
>>>> the spec is to ensure that one of the outputs is the internal hash
>>>> function used, and to tell spec writers who use RDFC-1.0 that any
>>>> algorithm that uses it needs to clearly stipulate which internal
>>>> algorithm to use when calling the algorithm (and if not, the default
>>>> will be used)?
>>>> 
>>> 
>>> I do not think the issue is with spec writers. RDFC-1.0 is meant for any
>>> ordinary user of Linked Data, not only for spec writers. While what you
>>> say is fine, what we need is a way to convey which hash function was
>>> used when we provide the hash of a specific graph, because that hash
>>> may travel from one ordinary user to another.
>>> 
>>> Ivan
>>> 
>>> 
>>> 
>>>> This feels more like a miscommunication than a design issue. Does the
>>>> above help clarify?
>>>> 
>>>> -- manu
>>>> 
>>>> --
>>>> Manu Sporny - https://www.linkedin.com/in/manusporny/
>>>> Founder/CEO - Digital Bazaar, Inc.
>>>> https://www.digitalbazaar.com/
>>>> 
>>> 
>>> 
>>> ----
>>> Ivan Herman, W3C
>>> Home: http://www.w3.org/People/Ivan/
>>> mobile: +33 6 52 46 00 43
>>> 
>>> 
>> 
>> --
>> Dan Yamamoto <dan@iij.ad.jp>
>> Internet Initiative Japan Inc.
>> 
>> 
>> 
> 
> 
> --
> 
> Dave Longley
> CTO
> Digital Bazaar, Inc.



Received on Tuesday, 19 September 2023 17:58:08 UTC