- From: Jamie McCusker <mccusker@gmail.com>
- Date: Mon, 7 Jun 2021 22:03:04 -0400
- To: David Booth <david@dbooth.org>
- Cc: w3c semweb HCLS <public-semweb-lifesci@w3.org>
- Message-ID: <CAAtgn=SaDHR2zAbSVCT5M3dBcZ1hRN30PC+9UGXDHYfek9BqJQ@mail.gmail.com>
I really think that canonicalizing RDF graphs by sorting their
statements is a mistake. Obviously I'm biased towards the approach I
used in RGDA1 (see the implementation in RDFLib and the writeup in my
dissertation) -- that of Sayers and Karp, with Nauty-based
canonicalization of bnodes -- but the process does not need to, and
should not, involve sorting and serializing graphs in order to create
a digest for them.

Thanks,
Jamie

On Mon, Jun 7, 2021 at 5:42 PM David Booth <david@dbooth.org> wrote:
> Thank you for your work on this! I think RDF canonicalization is very
> important, and I also see the value in the proposed digital signatures
> work. But I have two immediate suggestions and one major question.
>
> 1. The proposed RDF Dataset Hash (RDH) algorithm talks about "sorting
> the N-Quads serialization of the canonical form [of the RDF Dataset]".
> Clearly the intent is to produce a canonical N-Quads serialization, in
> preparation for hashing. But at present the charter does not identify
> the Canonical N-Quads serialization algorithm as a named deliverable.
> It definitely should, so that it can be easily referenced and used in
> its own right.
>
> 2. In the Use Cases section of the Explainer, I suggest adding a
> diff/patch use case. I think it would be a huge missed opportunity if
> that use case were ignored in standardizing an RDF canonicalization
> algorithm. See further explanation below.
>
> 3. Although I see the value of an RDF-based digital signatures
> vocabulary, in reading the proposed charter and associated materials I
> have been unable to understand the value in *restricting* this
> vocabulary to source documents that happen to be RDF. Why not allow it
> to be used on *any* kind of digital source document? Cryptographic
> hash algorithms don't care what kind of source document their input
> bytes represent. Why should this digital signatures vocabulary care
> about the format or language of the source document? I can imagine a
> digital signatures vocabulary providing a way to formally state
> something like: "if user U signed digital contract C, then it means
> that U has agreed to the terms of contract C". But I do not yet see
> why it would need to say anything about the format or language of C.
> C just is whatever it is, whether it's English, RDF, or something
> else. Can someone enlighten me on this point?
>
> Those are my high-level comments and question. Further explanation
> about the diff/patch use case follows.
>
> -----------------------------------
>
> Diff/Patch Use Case:
> The key consideration that the diff/patch use case adds to
> canonicalization is that a "small" change to an RDF dataset should
> produce a commensurately "small" change in the canonicalized result
> (to the extent possible), at least for common use cases, such as:
> adding/deleting a few triples; adding/deleting an RDF molecule/object
> (or a concise bounded description,
> https://www.w3.org/Submission/2004/SUBM-CBD-20040930/, or similar);
> adding/deleting a graph from an RDF dataset; adding/deleting list
> elements; or adding/deleting a level of hierarchy in a tree (or
> tree-ish graph).
>
> This requirement is not important for digital signature use cases, but
> it is essential for diff/patch use cases. And to be clear, this
> requirement ONLY applies to the canonicalization algorithm -- NOT the
> hashing algorithm. Indeed, a cryptographic hashing algorithm must have
> exactly the opposite property: a small change in the input must produce
> a LARGE (random) change in the output.
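>
> To make that contrast concrete, here is a minimal sketch of both
> properties (a hypothetical illustration, assuming RDFLib's
> rdflib.compare module as the canonicalizer and SHA-256 as the hash;
> the graph data is made up):
>
>     import difflib
>     import hashlib
>     from rdflib import Graph
>     from rdflib.compare import to_canonical_graph
>
>     def canonical_lines(g):
>         # Relabel bnodes deterministically, then emit sorted
>         # N-Triples lines (a stand-in for canonical N-Quads).
>         # Note: serialize() returns a str in RDFLib 6+.
>         nt = to_canonical_graph(g).serialize(format="nt")
>         return sorted(line for line in nt.splitlines() if line.strip())
>
>     old, new = Graph(), Graph()
>     old.parse(data='_:a <http://example.org/p> "1" . '
>                    '_:a <http://example.org/q> "x" .', format="turtle")
>     new.parse(data='_:a <http://example.org/p> "2" . '
>                    '_:a <http://example.org/q> "x" .', format="turtle")
>
>     # Line-oriented diff of the two canonical forms. The requirement
>     # above is that this diff stays "small" for a small edit; whether
>     # a given canonicalization algorithm achieves that (bnode labels
>     # may shift when any triple changes) is exactly what needs testing.
>     for line in difflib.unified_diff(canonical_lines(old),
>                                      canonical_lines(new), lineterm=""):
>         print(line)
>
>     # The cryptographic hash, by contrast, must change completely
>     # (the avalanche effect), even for a one-character edit.
>     for g in (old, new):
>         text = "\n".join(canonical_lines(g))
>         print(hashlib.sha256(text.encode("utf-8")).hexdigest())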
>
> If the proposed canonicalization algorithms already meet this
> requirement, then that would be great. But I am not aware of any
> testing that has been done on them with this use case in mind, to
> find out whether they would meet this requirement. And I do think
> this requirement is important for a general-purpose canonicalization
> standard.
>
> Proposed text for use cases:
> [[
> Diff/patch of RDF Datasets
> For diff/patch applications that need to track changes to RDF
> datasets, or keep two RDF datasets in sync by applying incremental
> changes, the N-Quads Canonicalization Algorithm should make best
> efforts to produce results that are well suited for use with existing
> line-oriented diff and patch tools. This means that, given a "small"
> change to an RDF dataset -- i.e., changing only a "small" number of
> lines in an N-Quads representation -- the N-Quads Canonicalization
> Algorithm will most likely produce a commensurately "small" change in
> its canonicalized result, for common diff/patch use cases. Common use
> cases include: adding/deleting a few triples; adding/deleting an RDF
> molecule/object (or a concise bounded description,
> https://www.w3.org/Submission/2004/SUBM-CBD-20040930/, or similar);
> adding/deleting a graph from an RDF dataset; adding/deleting list
> elements; or adding/deleting a level of hierarchy in a tree (or
> tree-ish graph).
> Requirement: A Diff-Friendly N-Quads Canonicalization Algorithm
> ]]
>
> Thanks,
> David Booth
>
> On 4/6/21 6:20 AM, Ivan Herman wrote:
> > Dear all,
> >
> > the W3C has started to work on a Working Group charter for Linked
> > Data Signatures:
> >
> > https://w3c.github.io/lds-wg-charter/index.html
> >
> > The work proposed in this Working Group includes Linked Data
> > Canonicalization, as well as algorithms and vocabularies for
> > encoding digital proofs, such as digital signatures, and, with
> > that, securing information expressed in serializations such as
> > JSON-LD, TriG, and N-Quads.
> >
> > The need for Linked Data canonicalization, digests, or signatures
> > has been known for a very long time, but it is only in recent years
> > that research and development has resulted in mathematical
> > algorithms and related implementations that are at the maturity
> > level needed for a Web Standard. A separate explainer document:
> >
> > https://w3c.github.io/lds-wg-charter/explainer.html
> >
> > provides some background, as well as a small set of use cases.
> >
> > The W3C Credentials Community Group [1, 2] has been instrumental in
> > the work leading to this charter proposal, not least due to its
> > work on Verifiable Credentials and to recent applications and
> > development of, e.g., vaccination passports using those
> > technologies.
> >
> > It must be emphasized, however, that this work is not bound to a
> > specific application area or serialization. There are numerous use
> > cases in Linked Data, like the publication of biological and
> > pharmaceutical data or the consumption of mission-critical RDF
> > vocabularies, that depend on the ability to verify the authenticity
> > and integrity of the data being consumed. This Working Group aims
> > at covering all of those, and we hope to involve the Linked Data
> > community at large in the elaboration of the final charter
> > proposal.
> >
> > We welcome your general expressions of interest and support.
> > If you wish to make your comments public, please use GitHub issues:
> >
> > https://github.com/w3c/lds-wg-charter/issues
> >
> > A formal W3C Advisory Committee Review for this charter is expected
> > in about six weeks.
> >
> > [1] https://www.w3.org/community/credentials/
> > [2] https://w3c-ccg.github.io/
> >
> > ----
> > Ivan Herman, W3C
> > Home: http://www.w3.org/People/Ivan/
> > mobile: +33 6 52 46 00 43
> > ORCID ID: https://orcid.org/0000-0003-0782-2704

--
Jamie McCusker (they/she)
Director, Data Operations
Tetherless World Constellation
Rensselaer Polytechnic Institute
mccusj2@rpi.edu
http://tw.rpi.edu
Received on Tuesday, 8 June 2021 02:04:41 UTC