- From: David Booth <david@dbooth.org>
- Date: Mon, 7 Jun 2021 17:35:22 -0400
- To: w3c semweb HCLS <public-semweb-lifesci@w3.org>
Thank you for your work on this! I think RDF canonicalization is very important, and I also see the value in the proposed digital signatures work. But I have two immediate suggestions and one major question.

1. The proposed RDF Dataset Hash (RDH) algorithm talks about "sorting the N-Quads serialization of the canonical form [of the RDF Dataset]". Clearly the intent is to produce a canonical N-Quads serialization, in preparation for hashing. But at present the charter does not identify the Canonical N-Quads serialization algorithm as a named deliverable. It definitely should, so that it can be easily referenced and used in its own right.

2. In the Use Cases section of the Explainer, I suggest adding a diff/patch use case. I think it would be a huge missed opportunity if that were ignored in standardizing an RDF canonicalization algorithm. See further explanation below.

3. Although I see the value of an RDF-based digital signatures vocabulary, in reading the proposed charter and associated materials I have been unable to understand the value in *restricting* this vocabulary to source documents that happen to be RDF. Why not allow it to be used on *any* kind of digital source document? Cryptographic hash algorithms don't care what kind of source document their input bytes represent. Why should this digital signatures vocabulary care about the format or language of the source document?

I can imagine a digital signatures vocabulary providing a way to formally state something like: "if user U signed digital contract C, then it means that U has agreed to the terms of contract C". But I do not yet see why it would need to say anything about the format or language of C. C just is whatever it is, whether it's English, RDF, or something else. Can someone enlighten me on this point?

Those are my high-level comments and question. Further explanation about the diff/patch use case follows.

-----------------------------------
Diff/Patch Use Case:

The key consideration that the diff/patch use case adds to canonicalization is that a "small" change to an RDF dataset should produce a commensurately "small" change in the canonicalized result (to the extent possible), at least for common use cases, such as adding/deleting a few triples, adding/deleting an RDF molecule/object (or a concise bounded description https://www.w3.org/Submission/2004/SUBM-CBD-20040930/ or similar), adding/deleting a graph from an RDF dataset, adding/deleting list elements, or adding/deleting a level of hierarchy in a tree (or tree-ish graph).

This requirement is not important for digital signature use cases, but it is essential for diff/patch use cases. And to be clear, this requirement applies ONLY to the canonicalization algorithm -- NOT the hashing algorithm. Indeed, a cryptographic hashing algorithm must have exactly the opposite property: a small change in the input must produce a LARGE (random) change in the output.

If the proposed canonicalization algorithms already meet this requirement, then that would be great. But I am not aware of any testing that has been done on them with this use case in mind, to find out whether they would meet it. And I do think this requirement is important for a general-purpose canonicalization standard.
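To make the requirement concrete, here is a minimal sketch (in Python, with illustrative example.org IRIs of my own choosing) of how diff-friendly canonical N-Quads would interact with ordinary line-oriented diff tools. Note that this toy example deliberately contains no blank nodes; blank node labeling is the hard case, since an algorithm that renumbers blank node labels globally could let a one-triple insertion ripple through many lines.

```python
# A minimal sketch, assuming a canonicalization step has already produced
# canonical N-Quads (one quad per line). The IRIs are illustrative only,
# and no blank nodes are involved -- blank node relabeling is the hard case.
import difflib

old_quads = sorted([
    '<http://example.org/s> <http://example.org/p> "a" <http://example.org/g> .',
    '<http://example.org/s> <http://example.org/p> "b" <http://example.org/g> .',
    '<http://example.org/s> <http://example.org/p> "c" <http://example.org/g> .',
])

# Add one quad. A diff-friendly canonicalization should leave every other
# line byte-identical, so a line-oriented diff stays commensurately small.
new_quads = sorted(old_quads + [
    '<http://example.org/s> <http://example.org/p> "d" <http://example.org/g> .',
])

print("\n".join(difflib.unified_diff(old_quads, new_quads, lineterm="")))
# Expected: a single "+" line plus diff context, i.e. a "small" change.
```

Whether the proposed canonicalization algorithms preserve this property once blank nodes are involved is exactly what I think would need testing.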
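For contrast with the hashing side, here is my reading of the step quoted in point 1 -- sort the N-Quads serialization of the canonical form, then hash. This is only a sketch, not the actual RDH algorithm, and the choice of SHA-256 is my assumption for illustration.

```python
# A sketch of "sorting the N-Quads serialization of the canonical form"
# followed by hashing, per my reading of the proposed RDH algorithm.
# SHA-256 is my assumption here, chosen only for illustration.
import hashlib

def dataset_hash(canonical_nquads_lines):
    """Hash a dataset given its canonical N-Quads lines (illustrative)."""
    doc = "\n".join(sorted(canonical_nquads_lines)) + "\n"
    return hashlib.sha256(doc.encode("utf-8")).hexdigest()

quads = ['<http://example.org/s> <http://example.org/p> "a" .']
print(dataset_hash(quads))
print(dataset_hash(quads + ['<http://example.org/s> <http://example.org/p> "b" .']))
# The two digests differ completely: the avalanche property we want from
# hashing is the opposite of the stability we want for diff/patch.
```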
Proposed text for use cases:

[[
Diff/patch of RDF Datasets

For diff/patch applications that need to track changes to RDF datasets, or keep two RDF datasets in sync by applying incremental changes, the N-Quads Canonicalization Algorithm should make best efforts to produce results that are well suited for use with existing line-oriented diff and patch tools. This means that, given a "small" change to an RDF dataset -- i.e., changing only a "small" number of lines in an N-Quads representation -- the N-Quads Canonicalization Algorithm will most likely produce a commensurately "small" change in its canonicalized result, for common diff/patch use cases. Common use cases include adding/deleting a few triples, adding/deleting an RDF molecule/object (or a concise bounded description https://www.w3.org/Submission/2004/SUBM-CBD-20040930/ or similar), adding/deleting a graph from an RDF dataset, adding/deleting list elements, or adding/deleting a level of hierarchy in a tree (or tree-ish graph).

Requirement: A Diff-Friendly N-Quads Canonicalization Algorithm
]]

Thanks,
David Booth

On 4/6/21 6:20 AM, Ivan Herman wrote:
> Dear all,
>
> The W3C has started to work on a Working Group charter for Linked Data
> Signatures:
>
> https://w3c.github.io/lds-wg-charter/index.html
>
> The work proposed in this Working Group includes Linked Data
> Canonicalization, as well as algorithms and vocabularies for encoding
> digital proofs, such as digital signatures, and, with that, securing
> information expressed in serializations such as JSON-LD, TriG, and N-Quads.
>
> The need for Linked Data canonicalization, digests, or signatures has
> been known for a very long time, but it is only in recent years that
> research and development have resulted in mathematical algorithms and
> related implementations mature enough for a Web Standard. A separate
> explainer document:
>
> https://w3c.github.io/lds-wg-charter/explainer.html
>
> provides some background, as well as a small set of use cases.
>
> The W3C Credentials Community Group [1,2] has been instrumental in the
> work leading to this charter proposal, not least due to its work on
> Verifiable Credentials and recent applications and developments, e.g.,
> vaccination passports using those technologies.
>
> It must be emphasized, however, that this work is not bound to a
> specific application area or serialization. There are numerous use cases
> in Linked Data, such as the publication of biological and pharmaceutical
> data, the consumption of mission-critical RDF vocabularies, and others,
> that depend on the ability to verify the authenticity and integrity of
> the data being consumed. This Working Group aims to cover all of those,
> and we hope to involve the Linked Data community at large in the
> elaboration of the final charter proposal.
>
> We welcome your general expressions of interest and support. If you wish
> to make your comments public, please use GitHub issues:
>
> https://github.com/w3c/lds-wg-charter/issues
>
> A formal W3C Advisory Committee Review for this charter is expected in
> about six weeks.
>
> [1] https://www.w3.org/community/credentials/
> [2] https://w3c-ccg.github.io/
>
> ----
> Ivan Herman, W3C
> Home: http://www.w3.org/People/Ivan/
> mobile: +33 6 52 46 00 43
> ORCID ID: https://orcid.org/0000-0003-0782-2704
Received on Monday, 7 June 2021 21:36:27 UTC