Re: Thoughts on the LDS WG chartering discussion from David Booth on 2021-06-11 (semantic-web@w3.org from June 2021)

From: David Booth <david@dbooth.org>
Date: Fri, 11 Jun 2021 17:14:50 -0400
To: Eric Prud'hommeaux <eric@w3.org>, Dan Brickley <danbri@danbri.org>
Cc: semantic-web@w3.org
Message-ID: <d087b8da-7127-6330-dfa9-aa13ac3f7444@dbooth.org>

On 6/11/21 9:30 AM, Eric Prud'hommeaux wrote:
> On Fri, Jun 11, 2021 at 10:08:56AM +0100, Dan Brickley wrote:
>> . . .
>> Should protein databank files be RDFized before they fall in scope of this
>> new WG's mission?
>> https://en.wikipedia.org/wiki/Protein_Data_Bank_(file_format) - and if so,
>> why?
> 
> RDF Signatures are for signing RDF structures. Without such a
> mechanism, you have to sign the syntax of an RDF document, which means
> you have to keep it around, serve it preferentially whenever anyone
> asks for a particular graph. That's a biggish ask of a quad store. It
> would also involve inventing some protocol to say "please dig up the
> original serialization" and probably some other convention. In the
> end, it would be brittle and most folks would consider it a crappy hack.

I think that is an excellent justification (among other good reasons) for 
standardizing a canonical N-Quads format, and I fully support a W3C 
effort to do that.  But I think at least one part of the confusion and 
concern is that the proposed canonicalization is framed as an *abstract* 
RDF Dataset canonicalization.  I think that framing is causing two problems:

1. It creates the *perception* of a greatly increased attack surface, 
from a security standpoint, because it bundles the canonicalization 
algorithm with the cryptographic hash generation step, and claims to 
produce a hash of the *abstract* RDF Dataset.  In reality, it does no 
such thing: the hash is computed on a concrete, canonicalized N-Quads 
serialization.  But it is understandable that people would look at it 
and worry about what new security vulnerabilities it might create, given 
this framing.

2. It is misleading, because *any* RDF canonicalization algorithm is 
fundamentally about serialization -- *not* the abstract RDF Dataset. 
The proposed algorithm is really only abstract in the sense that it can 
be used with a family of serializations.

I suggest cleanly separating the N-Quads canonicalization work from the 
digital signature vocabulary work (as DanBri also suggested), and 
reframing the canonicalization work as specifically N-Quads 
canonicalization, with a spin-off benefit that the bnode 
canonicalization algorithm can be applied to other serializations also, 
if desired.
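
To make the separation concrete, here is a rough Python sketch of the two 
steps as independent functions.  It is only an illustration, not the 
proposed algorithm: it assumes the input has no blank nodes, so the toy 
canonicalization merely removes duplicate statements and sorts them, 
whereas the real work lies in assigning canonical bnode labels.  The 
names canonicalize_nquads and sha256_hex and the example.org IRIs are 
made up for this example.

```python
import hashlib


def canonicalize_nquads(lines):
    """Toy canonicalization (assumption: the input contains no blank nodes).

    Strips surrounding whitespace, drops empty lines and duplicate
    statements, and sorts the rest. A real algorithm must also assign
    canonical blank node labels, which is omitted here.
    """
    statements = {line.strip() for line in lines if line.strip()}
    return "".join(statement + "\n" for statement in sorted(statements))


def sha256_hex(text):
    """Hash a concrete serialization; this step knows nothing about RDF."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Two serializations of the same dataset, differing only in statement order.
doc_a = [
    '<http://example.org/s> <http://example.org/p> "1" .',
    '<http://example.org/s> <http://example.org/q> "2" <http://example.org/g> .',
]
doc_b = list(reversed(doc_a))

canon_a = canonicalize_nquads(doc_a)
canon_b = canonicalize_nquads(doc_b)

# Same canonical text, therefore the same hash.
print(canon_a == canon_b)
print(sha256_hex(canon_a) == sha256_hex(canon_b))
```

The hash function only ever sees a string of canonical N-Quads, so the 
hashing step could be reviewed or replaced without touching the 
canonicalization step, and vice versa.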

Thanks,
David Booth

Received on Friday, 11 June 2021 21:15:45 UTC