Re: Issues with parameterized hashing algorithms used internally from Dave Longley on 2023-09-19 (public-rch-wg@w3.org from September 2023)

From: Dave Longley <dlongley@digitalbazaar.com>
Date: Tue, 19 Sep 2023 11:47:58 -0400
To: Dan Yamamoto <dan@iij.ad.jp>
Cc: Ivan Herman <ivan@w3.org>, Manu Sporny <msporny@digitalbazaar.com>, Phil Archer <phil.archer@gs1.org>, Sebastian Crane <seabass-labrax@gmx.com>, Gregg Kellogg <gregg@greggkellogg.net>, RDF Dataset Canonicalization and Hash Working Group <public-rch-wg@w3.org>
Message-ID: <CAMJ8eMNhzpeJOEjjx83v7u5bTBM0PGtwiDxc9BKKvz3zmBha1w@mail.gmail.com>

My thoughts:

We've already expressed an identifier for the RDF Dataset
Canonicalization Algorithm: RDFC-1.0. It uses a default hash
algorithm, SHA-256, internally, and any other hash algorithm that
could be used with it will similarly have its own identifier
("SHA-384", "SHAKE256", and so on). These things are decoupled from
one another (RDFC-1.0 works with any hash algorithm). Specifying which
hash algorithm has been used (if it deviates from the default) seems
to be in the domain of whatever format / standard / etc. is used to
express metadata about either the canonicalized dataset or a hash of
it (which, notably, would further involve another hash algorithm that
may or may not be the same).
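
To illustrate the decoupling, here is a minimal Python sketch. It is
not an RDFC-1.0 implementation: `canonical_nquads` stands in for
already-canonicalized output, and the names `HASHES`, `internal_hash`,
and `hash_canonical_form`, as well as the 32-byte SHAKE256 output
length, are assumptions made only for illustration.

```python
import hashlib

# Hypothetical table of hash identifiers; each entry maps bytes to a hex digest.
HASHES = {
    "SHA-256": lambda data: hashlib.sha256(data).hexdigest(),
    "SHA-384": lambda data: hashlib.sha384(data).hexdigest(),
    # SHAKE256 is an extendable-output function; 32 bytes is an assumed length.
    "SHAKE256": lambda data: hashlib.shake_256(data).hexdigest(32),
}


def internal_hash(data: bytes, hash_name: str = "SHA-256") -> str:
    """Hash used *inside* canonicalization; defaults to SHA-256."""
    return HASHES[hash_name](data)


def hash_canonical_form(canonical_nquads: str, hash_name: str = "SHA-256") -> str:
    """Hash taken *over* the canonical output; chosen independently."""
    return HASHES[hash_name](canonical_nquads.encode("utf-8"))


# Placeholder for canonical N-Quads produced by some RDFC-1.0 implementation.
canonical_nquads = "_:c14n0 <http://example.com/p> _:c14n1 .\n"

# The two choices are independent: SHA-384 internally, SHA-256 over the output.
inner = internal_hash(b"example input", "SHA-384")
outer = hash_canonical_form(canonical_nquads)
```

Neither function needs to know which algorithm the other one used,
which is why recording that choice falls to whatever carries the
metadata.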

I don't think it's a good idea to invent a new hash metadata
expression mechanism in this group. These things exist elsewhere (such
as multihash, or SRI, or RFC 6920) and some of them have their own
registries where this metadata goes and where it is mapped to
identifiers and / or "header values" that work within those specific
formats. That is the right place, IMO, to put this kind of information
and to enable interoperable processing of it, however it is expressed.
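
For a sense of what those existing mechanisms look like, here is a
rough Python sketch that expresses one SHA-256 digest in three of
them. The input bytes are an arbitrary example, and the multihash is
shown as raw hex rather than in any particular text encoding; treat it
as an illustration, not a reference implementation of any of these
formats.

```python
import base64
import hashlib

digest = hashlib.sha256(b"Hello World!").digest()

# SRI: "<algorithm>-<base64 digest>", with standard base64 padding.
sri = "sha256-" + base64.b64encode(digest).decode("ascii")

# RFC 6920: an "ni" URI carrying the algorithm name and the
# base64url-encoded digest with padding removed.
ni = "ni:///sha-256;" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

# multihash: <hash function code><digest length><digest bytes>.
# 0x12 is the registered code for sha2-256.
multihash = bytes([0x12, len(digest)]) + digest
multihash_hex = multihash.hex()

print(sri)
print(ni)
print(multihash_hex)
```

In each case the algorithm identifier travels with the digest, in
whatever form that particular format or registry defines.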

The specification we've produced is what enables someone to use
whatever metadata parameters they parse from (or input into) such
expressions to reproduce / verify / etc. some expected value. I see
our specification as being similar to the SHA-256 specification, which
indicates how to produce a digest but does not define a hash metadata
expression format itself.
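
As a hedged sketch of that division of labor: the function below
parses the hash algorithm out of an SRI-style expression and then
recomputes the digest to check it. The name `verify_sri`, the set of
accepted algorithms, and the sample N-Quads line are assumptions for
illustration; a real verifier would also run canonicalization first,
with whatever internal hash parameter the metadata indicates.

```python
import base64
import hashlib
import hmac


def verify_sri(expression: str, data: bytes) -> bool:
    """Parse '<alg>-<base64 digest>' and recompute the digest over data."""
    alg, _, encoded = expression.partition("-")
    if alg not in ("sha256", "sha384", "sha512"):
        raise ValueError(f"unsupported hash algorithm: {alg}")
    expected = base64.b64decode(encoded)
    actual = hashlib.new(alg, data).digest()
    # Constant-time comparison of the expected and recomputed digests.
    return hmac.compare_digest(expected, actual)


# Placeholder for canonical output; not produced by a real canonicalizer here.
canonical = b'<http://example.com/s> <http://example.com/p> "o" .\n'
expression = "sha384-" + base64.b64encode(hashlib.sha384(canonical).digest()).decode("ascii")

ok = verify_sri(expression, canonical)
tampered = verify_sri(expression, canonical + b"x")
```

The expression format tells the verifier which algorithm to use; the
algorithm specification tells it how to compute the value.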

On Tue, Sep 19, 2023 at 9:09 AM Dan Yamamoto <dan@iij.ad.jp> wrote:
>
> I also probably share the same opinion with Ivan. Since RDFC-1.0 isn't
> always used alongside Data Integrity, I thought it would be better for
> it to have some precise algorithm identifier on its own.
>
> Dan
>
> On 2023/09/19 0:49, Ivan Herman wrote:
> >
> >
> >> On 18 Sep 2023, at 17:26, Manu Sporny <msporny@digitalbazaar.com> wrote:
> >>
> >> On Mon, Sep 18, 2023 at 11:15 AM Phil Archer <phil.archer@gs1.org> wrote:
> >>> From: Dan Yamamoto <dan@iij.ad.jp>
> >>> Therefore, I believe the internal hash function should be
> >>> interchangeable. However, as others have suggested, I think there is
> >>> a need to introduce a mechanism to specify what hash function is used
> >>> explicitly.
> >>
> >> Just to jump in quickly on this thread; it feels like the harms are
> >> being exaggerated given the way we know that RDFC-1.0 is used today.
> >> If we look at how the VC Data Integrity specifications use the
> >> algorithm, you /always/ know which internal hash algorithm was used
> >> (or should be used) because it's signalled to you via the Data
> >> Integrity algorithm identifier. You don't have to guess, you are told
> >> exactly which internal hash algorithm to use.
> >>
> >> I wonder if folks are missing this detail? It was always expected that
> >> the internal hash information would be signalled to the caller, and
> >> that's exactly what Data Integrity does. Perhaps all we need to do in
> >> the spec is ensure that one of the outputs is the internal hash
> >> function used and to tell spec writers that use RDFC-1.0 that any
> >> algorithm that uses it needs to clearly stipulate which internal
> >> algorithm to use when calling the algorithm (and if not, the default
> >> will be used)?
> >>
> >
> > I do not think the issue is with spec writers. RDFC-1.0 is meant for any
> > lambda users of Linked Data, not only for spec writers. While what you
> > say is o.k., what we need is a way to convey the information of what
> > hash function was used when we provide the hash of a specific graph,
> > because that hash may travel from one lambda user to the other.
> >
> > Ivan
> >
> >
> >
> >> This feels more like a miscommunication than a design issue. Does the
> >> above help clarify?
> >>
> >> -- manu
> >>
> >> --
> >> Manu Sporny - https://www.linkedin.com/in/manusporny/
> >> Founder/CEO - Digital Bazaar, Inc.
> >> https://www.digitalbazaar.com/
> >>
> >
> >
> > ----
> > Ivan Herman, W3C
> > Home: http://www.w3.org/People/Ivan/
> > mobile: +33 6 52 46 00 43
> >
> >
>
> --
> Dan Yamamoto <dan@iij.ad.jp>
> Internet Initiative Japan Inc.
>
>
>

-- 

Dave Longley
CTO
Digital Bazaar, Inc.

Received on Tuesday, 19 September 2023 15:48:21 UTC