Re: Chartering work has started for a Linked Data Signature Working Group @W3C from Henry Story on 2021-05-23 (semantic-web@w3.org from May 2021)

From: Henry Story <henry.story@gmail.com>
Date: Sun, 23 May 2021 15:49:21 +0200
To: Aidan Hogan <aidhog@gmail.com>
Cc: semantic-web@w3.org
Message-Id: <CDA840F3-F040-4668-B61B-BB7B2C5B3022@gmail.com>
> On 23. May 2021, at 02:54, Aidan Hogan <aidhog@gmail.com> wrote:
> 
> Hi Peter,
> 
> Responding below to some of the technical issues:
> 
> On 2021-05-22 8:30, Peter F. Patel-Schneider wrote:
>> Hi Ivan:
>> As you should have suspected I have a very different take on this.
>> Sure any WG can take inputs and work on them.   But my, admittedly non-expert, view here is that the major input has significant flaws, and in computer security any flaw is fatal.  I've pointed out one but I think there are others.  (See below.)
>> I am in favour of W3C providing some way of securely transmitting RDF graphs and datasets.  Of course there already is a way of doing this by simply treating the serialization of the graph or dataset as a text document and transmitting that document bundled with its signature, much the same way that emails are signed.  The goal is to do something better.
>> My worry is that going through AC review with the proposed charter using Linked Data Proofs 1.0 as its major support will result in the working group being turned down because of flaws in Linked Data Proofs 1.0.
>> I would greatly appreciate a discussion of the possible flaws in that document.  This discussion does not appear to be happening, which I find worrisome.
>> peter
>> Technical Details:
>> I take the method to sign and verify RDF datasets to be as follows:
>> sign(document, private key, identity)
>>   let D be the RDF dataset serialized in document
>>   let C be the canonicalized version of D
>>   let S be triples representing a signature of C using private key
>>   let signed document be document plus a serialization of S,
>>     so signed document serializes D union (not merge) S
>>   return signed document
>> verify(signed document)
>>   let D' be the RDF dataset serialized in signed document
>>   let S be the signature in D'
>>   let D be D' - S
>>   let C be the canonicalized version of D
>>   return whether S is a valid signature for C
>> 
>> To my non-expert eye there are several significant problems here.
>> 1/ The signature extracted from the signed document might be different from the signature used to sign the original document if the original document has signatures in it.
>> 2/ The dataset extracted during verification might not be the dataset used during signing because the original document has signatures in it.
>> 3/ Adding extra information after signing might be possible without affecting verification if the extra information looks like a signature.
> 
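
To make issues 1-3 concrete, here is a rough Python sketch of the sign/verify procedure as Peter describes it. It rests on several assumptions that are mine and not from any spec: a dataset is a set of string tuples with no blank nodes, sorting stands in for real canonicalization, HMAC stands in for a real digital signature scheme, and the SIG predicate is hypothetical.

```python
import hashlib
import hmac

# Hypothetical predicate marking signature statements; not taken from any spec.
SIG = "<http://example.org/vocab#signatureValue>"

def canonicalize(dataset):
    # Stand-in for RDF dataset canonicalization: sort the statements.
    # Only adequate here because the toy data has no blank nodes.
    return "\n".join(" ".join(t) for t in sorted(dataset)).encode("utf-8")

def sign(dataset, key, identity):
    # HMAC stands in for a real digital signature scheme.
    mac = hmac.new(key, canonicalize(dataset), hashlib.sha256).hexdigest()
    return set(dataset) | {(identity, SIG, mac)}

def verify(signed, key):
    # Everything that looks like a signature is treated as S and removed.
    sigs = {t for t in signed if t[1] == SIG}
    data = set(signed) - sigs
    expected = hmac.new(key, canonicalize(data), hashlib.sha256).hexdigest()
    return any(hmac.compare_digest(t[2], expected) for t in sigs)

key = b"demo-key"
data = {("<http://example.org/a>", "<http://example.org/p>", "<http://example.org/b>")}
signed = sign(data, key, "<http://example.org/alice>")
print(verify(signed, key))  # True: the plain case works

# Issue 3: extra statements that look like a signature go unnoticed.
padded = signed | {("<http://example.org/mallory>", SIG, "00")}
print(verify(padded, key))  # True

# Issues 1 and 2: a dataset that already contained a signature no longer verifies,
# because verify strips the older signature from the data that was signed.
presigned = data | {("<http://example.org/bob>", SIG, "abc")}
print(verify(sign(presigned, key, "<http://example.org/alice>"), key))  # False
```

The failure in the last case comes purely from verify being unable to tell which signature statements were part of the signed data.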
> I agree, but I guess that such issues could be solved by (possibly some combination of -- as a sketch):
> 
> 1) Forbidding signing of datasets with signatures
> + simplifies signing and verifying
> - breaks use-cases involving signing signed datasets
> 
> 2) Specify S as a separate argument in verify
> + can sign any RDF dataset, including signed RDF datasets
> - makes arguments for verify more verbose
> 
> 3) Creating a structure that indicates a set or chain of signatures in D', per https://w3c-ccg.github.io/ld-proofs/#multiple-proofs. In the case of a proof set, the entire proof set is removed from D' prior to verification and each proof within it must be checked in verify (for example). In the case of a proof chain, only the last element of the chain (which I guess might be a proof set?) is removed from D' prior to verification and used for verification.
> + keeps calls to verify simple, can sign any RDF dataset
> - might add complexity (e.g., requiring careful validation rules, defining what is the "last element" of a chain, etc.)
> 
> In each case, there will probably be the need for a definition of valid signed RDF datasets (values for D'), valid signature descriptions (values for S), etc., with invalid values being rejected by verify. The purpose would be to guarantee that, for the verify process:
> 
> G1: that an abstract RDF dataset D to be verified can be unambiguously extracted from D' (modulo isomorphism), and D' alone
> 
> G2: that the parameters needed to verify the signature of D can be unambiguously identified from D' (and other arguments given) alone
> 
> I think that with these two guarantees, the correctness of the process can be reduced to the correctness of the RDF dataset canonicalisation process and the digital signature scheme used.
> 
> The precise restrictions to get to these guarantees would depend on the solution, but in the case of (3) currently proposed by the Linked Data Proofs 1.0 document, they might be along the lines of:
> 
> - Disallowing multiple proof sets at the same "level".
> - Disallowing "branching" chains of proofs.

The right thing to do here would be, I believe:

1. To disallow signatures from covering the graph in which they are located.
   The signature should always be external to the graph.
2. If using datasets, the signature then has to be in the default graph
   and sign one of the embedded graphs. Or perhaps there can be child graphs signing
   their siblings?
3. This can then be generalized in two ways to any number of signatures
   of signatures by:
     - the technically lightweight answer of having graph datatypes as proposed by
      Antoine Zimmermann
           https://lists.w3.org/Archives/Public/semantic-web/2021May/0052.html
     - putting some more effort into standardizing N3, which allows
      recursive graphs within graphs.  See message on n3 list:
           https://lists.w3.org/Archives/Public/public-n3-dev/2021May/0012.html
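
As a rough illustration of points 1 and 2, here is a minimal Python sketch in which the signature lives in the default graph and covers one named graph. The representation is an assumption of mine (a dict from graph name to a set of string triples, with "" for the default graph), the SIG predicate is hypothetical, sorting stands in for canonicalization, and HMAC stands in for a real signature scheme. It does not cover signatures of signatures (point 3).

```python
import hashlib
import hmac

DEFAULT = ""  # key used in this sketch for the default graph
# Hypothetical predicate linking a graph name to its signature value.
SIG = "<http://example.org/vocab#signatureValue>"

def canonicalize(graph):
    # Stand-in for real canonicalization; the toy data has no blank nodes.
    return "\n".join(" ".join(t) for t in sorted(graph)).encode("utf-8")

def sign_graph(dataset, name, key):
    if name == DEFAULT:
        # Rule 1: a signature may not cover the graph in which it is located.
        raise ValueError("cannot sign the graph that holds the signature")
    mac = hmac.new(key, canonicalize(dataset[name]), hashlib.sha256).hexdigest()
    out = {g: set(ts) for g, ts in dataset.items()}
    out.setdefault(DEFAULT, set()).add((name, SIG, mac))
    return out

def verify_graph(dataset, name, key):
    # The named graph is hashed as is: nothing has to be stripped from it.
    expected = hmac.new(key, canonicalize(dataset[name]), hashlib.sha256).hexdigest()
    claimed = [o for (s, p, o) in dataset.get(DEFAULT, set())
               if s == name and p == SIG]
    return any(hmac.compare_digest(c, expected) for c in claimed)

g1 = "<http://example.org/g1>"
ds = {g1: {
    ("<http://example.org/a>", "<http://example.org/p>", "<http://example.org/b>"),
    # Something that looks like a signature is just data inside the signed graph.
    ("<http://example.org/g0>", SIG, "abc"),
}}
key = b"demo-key"
signed = sign_graph(ds, g1, key)
print(verify_graph(signed, g1, key))  # True

# Changing the signed graph after signing is detected.
signed[g1].add(("<http://example.org/a>", "<http://example.org/p>", "<http://example.org/c>"))
print(verify_graph(signed, g1, key))  # False
```

Because the signed graph never contains its own signature, the verifier does not have to guess which statements to remove before hashing.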

> 
> etc.
> 
>> 4/ The dataset extracted during verification might not be the dataset used during signing because the original document has relative IRIs.
> 
> Breaks G1. I guess this issue is something that arguably "transcends" the proposed process. The canonicalisation function would be defined in terms of the abstract RDF dataset with absolute IRIs. It seems that this issue of not having an explicit base IRI affects all of the RDF stack in a similar way in that it affects the translation of a sequence of bytes in some RDF syntaxes into an abstract RDF dataset. But it would need review at some point regarding how it affects G1, I agree.
> 
> A possible solution would be to enforce unambiguous (relative) IRIs in the serialisations of signed documents that reflect the absolute IRIs used during the canonicalisation process (and reject documents passed to verify that do not satisfy G1 for this reason).
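
A small Python sketch of why relative IRIs break G1, and of one possible check a verifier could apply. The assumptions are mine: terms are bare IRI strings (no literals or blank nodes), a SHA-256 hash over sorted statements stands in for canonicalize-then-sign, and the base IRIs are invented for illustration.

```python
import hashlib
from urllib.parse import urljoin, urlparse

def resolve(triples, base):
    # Resolve every IRI reference against the base IRI of the document.
    return {tuple(urljoin(base, term) for term in t) for t in triples}

def digest(triples):
    # Stand-in for canonicalization followed by hashing.
    lines = sorted(" ".join(t) for t in triples)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()

def all_absolute(triples):
    # A possible verify-time check: reject documents with relative IRIs.
    return all(urlparse(term).scheme for t in triples for term in t)

relative = {("#me", "http://xmlns.com/foaf/0.1/knows", "friend")}

# The signer and the verifier retrieve the same bytes from different locations.
signer_view = resolve(relative, "http://example.org/alice/card")
verifier_view = resolve(relative, "http://mirror.example.net/copy")

print(digest(signer_view) == digest(verifier_view))  # False: different datasets
print(all_absolute(relative))     # False: would be rejected by the check
print(all_absolute(signer_view))  # True: safe to hash regardless of base
```

Once every IRI in the serialization is absolute, the base IRI no longer influences the dataset that gets canonicalized.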
> 
>> 5/ The dataset extracted during verification might not be the dataset used during signing because the original document is in a serialization that uses external resources to generate the dataset (like @context in JSON-LD) and this external resource may have changed.
> 
> I guess this is a similar issue to 4 and breaks G1, and so would require some restrictions to ensure that the dataset to verify can be extracted from D' (and maybe the other arguments to verify) *alone*.
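
A toy Python sketch of issue 5. The expand function below is a drastically simplified stand-in for JSON-LD expansion (it only maps keys to predicate IRIs), and pinning a hash of the context is just one possible restriction, assumed here for illustration and not something proposed in the thread.

```python
import hashlib
import json

def expand(doc, context):
    # Toy stand-in for JSON-LD expansion: map each key to a predicate IRI.
    subject = doc["@id"]
    return {(subject, context[k], v) for k, v in doc.items() if k != "@id"}

def digest(triples):
    # Stand-in for canonicalization followed by hashing.
    lines = sorted(" ".join(t) for t in triples)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()

def context_digest(context):
    # Hash of the context, which could be recorded at signing time.
    return hashlib.sha256(json.dumps(context, sort_keys=True).encode("utf-8")).hexdigest()

doc = {"@id": "http://example.org/alice", "name": "Alice"}
context_at_signing = {"name": "http://xmlns.com/foaf/0.1/name"}
context_at_verifying = {"name": "http://schema.org/name"}  # the remote file changed

# Same document bytes, different dataset, so the signature no longer matches.
print(digest(expand(doc, context_at_signing)) ==
      digest(expand(doc, context_at_verifying)))  # False

# With a pinned hash the verifier can at least detect the changed context.
pinned = context_digest(context_at_signing)
print(context_digest(context_at_verifying) == pinned)  # False
```

Detecting the change does not restore the original dataset; for that, the context itself would have to travel with the signed document.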
> 
>> 6/ Only the serialized dataset is signed, so comments (in serializations that allow them) or other parts of the document that do not encode triples or quads can be changed without affecting the validity of the signature.  This is particularly problematic for RDFa.
> 
> True. I think, though, that so long as this is made clear, it would not be a problem per se, but rather something to highlight.
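
Issue 6 is easy to see in a small Python sketch. The parser below is a toy of my own that only handles simple N-Triples-style lines and whole-line comments, and a plain hash over sorted statements stands in for canonicalization plus signing.

```python
import hashlib

def parse(text):
    # Toy line-based parser: skips blank lines and whole-line comments.
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        s, p, o = line.rstrip(" .").split(" ", 2)
        triples.add((s, p, o))
    return triples

def digest(triples):
    # Stand-in for canonicalization followed by hashing.
    lines = sorted(" ".join(t) for t in triples)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()

statement = "<http://example.org/a> <http://example.org/p> <http://example.org/b> .\n"
doc_a = "# Reviewed and approved\n" + statement
doc_b = "# DRAFT, do not rely on this\n" + statement

print(doc_a == doc_b)                                # False: the bytes differ
print(digest(parse(doc_a)) == digest(parse(doc_b)))  # True: same dataset, same signature
```

So a valid signature says nothing about the comments, or about any other part of the document that does not end up in the dataset.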
> 
> Best,
> Aidan
> 
>> I welcome discussion of these points and am open to being proven wrong on them.
>> On 5/22/21 6:43 AM, Ivan Herman wrote:
>>> Peter,
>>> 
>>> I agree that these are issues to handle/settle in a final specification. And I will let Manu reply to the specifics.
>>> 
>>> However, I would regard these to be done during the lifetime of the Working Group, if it gets approved; after all, ensuring these required quality checks is one of the strong points of the W3C Process. The Linked Data Proof draft specification is not even the FPWD of the WG's deliverables, it is just a referenced document.
>>> 
>>> Thanks
>>> 
>>> Ivan
>>> 
>
Received on Sunday, 23 May 2021 13:49:39 UTC