- From: David Booth <david@dbooth.org>
- Date: Fri, 11 Jun 2021 17:14:50 -0400
- To: Eric Prud'hommeaux <eric@w3.org>, Dan Brickley <danbri@danbri.org>
- Cc: semantic-web@w3.org
On 6/11/21 9:30 AM, Eric Prud'hommeaux wrote:
> On Fri, Jun 11, 2021 at 10:08:56AM +0100, Dan Brickley wrote:
>> . . .
>> Should protein databank files be RDFized before they fall in scope of this
>> new WGs mission?
>> https://en.wikipedia.org/wiki/Protein_Data_Bank_(file_format) - and if so,
>> why?
>
> RDF Signatures are for signing RDF structures. Without such a
> mechanism, you have to sign the syntax of an RDF document, which means
> you have to keep it around, serve it preferentially whenever anyone
> asks for a particular graph. That's a biggish ask of a quad store. It
> would also involve inventing some protocol to say "please dig up the
> original serialization" and probably some other convention. In the
> end, it would be brittle and most folks would consider it a crappy hack.

I think that is excellent justification (among other good reasons) for
standardizing a canonical N-Quads format, and I fully support a W3C
effort to do that.

But I think at least one part of the confusion and concern is that the
proposed canonicalization is framed as an *abstract* RDF Dataset
canonicalization. I think that framing is causing two problems:

1. It creates the *perception* of a greatly increased attack surface,
from a security standpoint, because it bundles the canonicalization
algorithm with the cryptographic hash generation step, and claims to
produce a hash of the *abstract* RDF Dataset. In reality, it does no
such thing: the hash is computed on a concrete, canonicalized N-Quads
serialization. But it is understandable that people would look at it
and worry about what new security vulnerabilities it might create,
given this framing.

2. It is misleading, because *any* RDF canonicalization algorithm is
fundamentally about serialization -- *not* the abstract RDF Dataset.
The proposed algorithm is really only abstract in the sense that it can
be used with a family of serializations.
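
As an illustration of point 1, here is a minimal Python sketch of canonicalization and hashing as two separate steps, with the hash taken over concrete bytes. It is a toy under stated assumptions and not the proposed algorithm: the function names and example IRIs are hypothetical, the input is assumed to contain no blank nodes (relabeling them is the hard part of any real algorithm), and each statement is assumed to be written in an already-normalized N-Quads form, so sorting the lines is enough.

```python
import hashlib


def canonicalize_nquads(nquads_text: str) -> str:
    """Toy canonicalization: drop blank lines and duplicates, then sort.

    Assumes no blank nodes and that every statement is already written
    in one normalized N-Quads form. Hypothetical helper for illustration.
    """
    lines = {line.strip() for line in nquads_text.splitlines() if line.strip()}
    return "".join(line + "\n" for line in sorted(lines))


def sha256_hex(canonical_text: str) -> str:
    """Hash the concrete serialization (UTF-8 bytes), not an abstract graph."""
    return hashlib.sha256(canonical_text.encode("utf-8")).hexdigest()


# Two documents stating the same two triples in a different order.
doc_a = (
    '<http://example.org/s> <http://example.org/p> "1" .\n'
    '<http://example.org/s> <http://example.org/q> "2" .\n'
)
doc_b = (
    '<http://example.org/s> <http://example.org/q> "2" .\n'
    "\n"
    '<http://example.org/s> <http://example.org/p> "1" .\n'
)

canon_a = canonicalize_nquads(doc_a)
canon_b = canonicalize_nquads(doc_b)

# Step 1 yields identical text; step 2 is an ordinary hash of that text.
print(canon_a == canon_b)
print(sha256_hex(canon_a) == sha256_hex(canon_b))
```

The raw documents hash differently, while their canonical serializations hash identically. Because the two functions are independent, the canonicalization step can be reviewed or replaced without touching the hashing step.
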

I suggest cleanly separating the N-Quads canonicalization work from the
digital signature vocabulary work (as DanBri also suggested), and
reframing the canonicalization work as specifically N-Quads
canonicalization, with a spin-off benefit that the bnode
canonicalization algorithm can be applied to other serializations also,
if desired.

Thanks,
David Booth

Received on Friday, 11 June 2021 21:15:45 UTC