Re: Chartering work has started for a Linked Data Signature Working Group @W3C from Aidan Hogan on 2021-05-25 (semantic-web@w3.org from May 2021)

From: Aidan Hogan <aidhog@gmail.com>
Date: Tue, 25 May 2021 18:42:55 -0400
To: Dan Brickley <danbri@danbri.org>
Cc: semantic-web@w3.org
Message-ID: <fa44f183-361a-657c-5e3e-e2fd365b1f58@gmail.com>
On 2021-05-24 2:50, Dan Brickley wrote:
> 
> 
> On Mon, 24 May 2021 at 07:10, Aidan Hogan <aidhog@gmail.com> wrote:
> 
>     On 2021-05-23 7:06, Peter F. Patel-Schneider wrote:
>      > So it appears that you agree with me that signing a document
>      > serializing an RDF dataset using the algorithms in Linked Data
>      > Proofs 1.0 does not meet the usual computer security requirements.
> 
>     Hmm, I'm not sure. I guess it depends on the specific security
>     requirements (and I'm not an expert on such topics).
> 
>     However, I think most of the issues you mention might lead to a verify
>     returning *false* "unexpectedly" because while verifying, you extract a
>     dataset (or signature parameters) different from the original graph (or
>     signature parameters) passed to the sign function. This type of false
>     negative seems to affect something more akin to "usability" rather than
>     security: it seems to me to err on the side of caution.
> 
>     If verify were returning *true* unexpectedly, I would have to imagine
>     that that would be more worrying in terms of security requirements, but
>     I don't think such issues are likely as they would seemingly break some
>     of the guarantees of the underlying cryptography (used in sign).
> 
> 
> semantic-web@w3.org is the wrong forum to be conducting a security
> review of this “design input”.

I agree, but feel it important to clarify that the discussion on this 
list -- even if it relates to security -- is not a problem. The problem 
is if the discussion on this list is perceived as a "security review".

I think the discussion here can be fruitful for several reasons, even if 
we are not security experts. Some thoughts along those lines:

- A proof is a proof. It does not matter if a proof was provided by a 
security expert with 30 years of professional expertise, or a 9-year-old 
schoolkid with a purple crayon. If the proof is good it will concisely 
convince whoever takes the time to understand it. I would rather have a 
good proof that I can understand than the word of a world-leading 
security expert (aka an appeal to authority).

- The issues being discussed here relate more to the definitions of RDF 
datasets -- how they are serialised or deserialised, how signature 
metadata can be embedded and extracted, etc. -- than cryptography.

- We are not inventing new cryptography. Rather we are black-boxing 
cryptography. The complicated stuff in terms of how the cryptography is 
implemented and what guarantees this provides is done for us. The issue 
then is to take those guarantees and reduce the current RDFy 
specifications to them. For example, the cryptography black box 
presumably will provide guarantees with respect to the difficulty of a 
pre-image attack. Assuming this guarantee (without necessarily 
understanding how it is implemented), one can then formalise guarantees 
regarding the difficulty of attacks on an RDF level that equate to a 
pre-image attack (e.g., using proof by contradiction, showing that the 
RDF-level attack would constitute a pre-image attack). I think that this 
sort of task requires a more detailed understanding of RDF than of 
cryptography.
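
A minimal sketch of that reduction, in Python and under toy assumptions: datasets are ground (no blank nodes), so sorting the quads gives a canonical form, and SHA-256 stands in for the hash inside the black box. The names canonicalise, digest and hash_collision_from are hypothetical helpers invented for this illustration, not anything defined in Linked Data Proofs 1.0. The point is only that two different datasets with the same digest would hand us two different byte strings with the same hash, which is exactly the kind of attack the black box is assumed to make difficult.

```python
import hashlib
import json


def canonicalise(dataset):
    """Toy canonical form: de-duplicate, sort and serialise the quads as JSON.

    Assumes ground quads (tuples of strings, no blank nodes); real RDF
    dataset canonicalisation also has to label blank nodes.
    """
    return json.dumps(sorted(list(q) for q in set(dataset))).encode("utf-8")


def digest(dataset):
    # SHA-256 is an assumption standing in for the black-box hash.
    return hashlib.sha256(canonicalise(dataset)).hexdigest()


def hash_collision_from(d1, d2):
    """Turn an RDF-level collision into a byte-level hash collision.

    If d1 and d2 are different datasets with equal digests, their canonical
    forms are different byte strings (the toy canonical form is injective on
    ground datasets) that hash to the same value.
    """
    if set(d1) == set(d2) or digest(d1) != digest(d2):
        raise ValueError("not an RDF-level collision")
    return canonicalise(d1), canonicalise(d2)


d1 = [("<urn:s>", "<urn:p>", "<urn:o1>", "<urn:g>"),
      ("<urn:s>", "<urn:p>", "<urn:o2>", "<urn:g>")]
d2 = list(reversed(d1))                                  # same dataset, reordered
d3 = d1 + [("<urn:s>", "<urn:p>", "<urn:o3>", "<urn:g>")]  # one quad added

same_dataset_same_digest = digest(d1) == digest(d2)
changed_dataset_same_digest = digest(d1) == digest(d3)
```

Here the reordered serialisation keeps its digest while the modified dataset does not; calling hash_collision_from(d1, d3) raises ValueError because no collision exists to exploit.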

This is not to say that the input of security experts would not be 
useful or anything of the sort. In particular, it would be helpful to 
have such advice in order to understand what guarantees the 
cryptographic black box provides (under what conditions or nuances), and 
what guarantees are necessary to avoid what types of attack. But it does 
not diminish the value of discussion on this list.

(Ceci n'est pas un audit de sécurité.)

Best,
Aidan

> https://www.w3.org/TR/xmldsig-bestpractices/ gives an example of the 
> variety of things that can go wrong.
> 
> While verification unexpectedly succeeding would be woeful, note that 
> many of the issues listed with XML Signature (oddly not a WG “input”) 
> relate to the use of fancy (XSLT) data transformations, including denial 
> of service attacks.
> 
> Perhaps W3C should hold a Workshop on next steps in this area. Or at 
> least solicit wider review of the draft charter...
> 
> Dan
> 
> 
> 
> 
> 
> 
>      > You also appear to be saying that it might be possible to come
>      > up with qualifications that could fix this problem.
> 
>     Yes, I suspect that technically it should not be difficult. I think the
>     harder part will be to reach a consensus on what qualifications to
>     apply in order to ensure G1 and G2 mentioned previously.
> 
>     Best,
>     Aidan
> 
>      >
>      >
>      > On 5/22/21 8:54 PM, Aidan Hogan wrote:
>      >  > Hi Peter,
>      >  >
>      >  > Responding below to some of the technical issues:
>      >  >
>      > [...]
>      >
>      >  >> Technical Details:
>      >  >>
>      >  >> I take the method to sign and verify RDF datasets to be as follows:
>      >  >>
>      >  >> sign(document, private key, identity)
>      >  >>    let D be the RDF dataset serialized in document
>      >  >>    let C be the canonicalized version of D
>      >  >>    let S be triples representing a signature of C using private key
>      >  >>    let signed document be document plus a serialization of S,
>      >  >>      so signed document serializes D union (not merge) S
>      >  >>    return signed document
>      >  >>
>      >  >> verify(signed document)
>      >  >>    let D' be the RDF dataset serialized in signed document
>      >  >>    let S be the signature in D'
>      >  >>    let D be D' - S
>      >  >>    let C be the canonicalized version of D
>      >  >>    return whether S is a valid signature for C
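
A minimal runnable sketch of the sign/verify procedure quoted above, in Python and under toy assumptions. Triples are ground tuples of strings, canonicalisation is just sorting, HMAC-SHA256 (a symmetric MAC from the standard library) stands in for the real signature scheme, and SIG_PREDICATE is a hypothetical IRI used only to recognise signature triples. None of these names come from Linked Data Proofs 1.0. In this sketch verify rejects any D' from which a single signature cannot be identified, so signing an already-signed dataset yields a false negative rather than a false positive.

```python
import hashlib
import hmac
import json

# Hypothetical predicate marking a signature triple (illustration only).
SIG_PREDICATE = "<urn:example:signatureValue>"
SIG_SUBJECT = "<urn:example:document>"


def canonicalise(dataset):
    # Toy canonical form for ground triples: sort and serialise as JSON.
    return json.dumps(sorted(list(t) for t in set(dataset))).encode("utf-8")


def _blackbox(key, message):
    # Stand-in for the signature black box; HMAC is not a public-key signature.
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def sign(dataset, key):
    """Return D union S, where S is one triple carrying the signature of C."""
    value = _blackbox(key, canonicalise(dataset))
    s = (SIG_SUBJECT, SIG_PREDICATE, '"' + value + '"')
    return set(dataset) | {s}


def verify(signed_dataset, key):
    """Extract S from D', recompute the signature of D = D' - S, and compare."""
    d_prime = set(signed_dataset)
    s = {t for t in d_prime if t[1] == SIG_PREDICATE}
    if len(s) != 1:
        # The signature cannot be unambiguously identified: reject.
        return False
    (sig_triple,) = s
    d = d_prime - s
    expected = _blackbox(key, canonicalise(d))
    return hmac.compare_digest(sig_triple[2].strip('"'), expected)


key = b"demo-key"
d = {("<urn:example:s>", "<urn:example:p>", "<urn:example:o>")}
signed = sign(d, key)

ok = verify(signed, key)                       # untouched signed dataset
tampered = verify(signed | {("<urn:example:s>", "<urn:example:p>", "<urn:example:x>")}, key)
resigned = verify(sign(signed, key), key)      # signing a signed dataset
```

With these definitions ok is True, while tampered and resigned are both False: the second because the recomputed signature no longer matches, the third because D' now contains two signature triples and the sketch refuses to guess which one to use.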
>      >  >>
>      >  >> To my non-expert eye there are several significant problems here.
>      >  >> 1/ The signature extracted from the signed document might be
>      >  >> different from the signature used to sign the original document
>      >  >> if the original document has signatures in it.
>      >  >> 2/ The dataset extracted during verification might not be the
>      >  >> dataset used during signing if the original document has
>      >  >> signatures in it.
>      >  >> 3/ Adding extra information after signing might be possible
>      >  >> without affecting verification if the extra information looks
>      >  >> like a signature.
>      >  >
>      >  > I agree, but I guess that such issues could be solved by one of
>      >  > the following (or possibly some combination of them), as a sketch:
>      >  >
>      >  > 1) Forbidding signing of datasets with signatures
>      >  >  + simplifies signing and verifying
>      >  >  - breaks use-cases involving signing signed datasets
>      >  >
>      >  > 2) Specify S as a separate argument in verify
>      >  >  + can sign any RDF dataset, including signed RDF datasets
>      >  >  - makes arguments for verify more verbose
>      >  >
>      >  > 3) Creating a structure that indicates a set or chain of
>      >  > signatures in D', per
>      >  > <https://w3c-ccg.github.io/ld-proofs/#multiple-proofs>. In the
>      >  > case of a proof set, the entire proof set is removed from D'
>      >  > prior to verification and each proof within it must be checked
>      >  > in verify (for example). In the case of a proof chain, only the
>      >  > last element of the chain (which I guess might be a proof set?)
>      >  > is removed from D' prior to verification and used for
>      >  > verification.
>      >  >  + keeps calls to verify simple, can sign any RDF dataset
>      >  >  - might add complexity (e.g., requiring careful validation
>      >  >    rules, defining what is the "last element" of a chain, etc.)
>      >  >
>      >  > In each case, there will probably be the need for a definition
>      >  > of valid signed RDF datasets (values for D'), valid signature
>      >  > descriptions (values for S), etc., with invalid values being
>      >  > rejected by verify. The purpose would be to guarantee that, for
>      >  > the verify process:
>      >  >
>      >  > G1: that an abstract RDF dataset D to be verified can be
>      >  > unambiguously extracted from D' (modulo isomorphism), and D' alone
>      >  >
>      >  > G2: that the parameters needed to verify the signature of D can
>      >  > be unambiguously identified from D' (and other arguments given)
>      >  > alone
>      >  >
>      >  > I think that with these two guarantees, the correctness of the
>      >  > process can be reduced to the correctness of the RDF dataset
>      >  > canonicalisation process and the digital signature scheme used.
>      >  >
>      >  > The precise restrictions to get to these guarantees would depend
>      >  > on the solution, but in the case of (3) currently proposed by
>      >  > the Linked Data Proofs 1.0 document, they might be along the
>      >  > lines of:
>      >  >
>      >  > - Disallowing multiple proof sets at the same "level".
>      >  > - Disallowing "branching" chains of proofs.
>      >  >
>      >  > etc.
>      >  >
>      >  >> 4/ The dataset extracted during verification might not be the
>      >  >> dataset used during signing because the original document has
>      >  >> relative IRIs.
>      >  >
>      >  > Breaks G1. I guess this issue is something that arguably
>      >  > "transcends" the proposed process. The canonicalisation function
>      >  > would be defined in terms of the abstract RDF dataset with
>      >  > absolute IRIs. It seems that this issue of not having an explicit
>      >  > base IRI affects all of the RDF stack in a similar way in that it
>      >  > affects the translation of a sequence of bytes in some RDF
>      >  > syntaxes into an abstract RDF dataset. But it would need review
>      >  > at some point regarding how it affects G1, I agree.
>      >  >
>      >  > A possible solution would be to enforce unambiguous (relative)
>      >  > IRIs in the serialisations of signed documents that reflect the
>      >  > absolute IRIs used during the canonicalisation process (and
>      >  > reject documents passed to verify that do not satisfy G1 for this
>      >  > reason).
>      >  >
>      >  >> 5/ The dataset extracted during verification might not be the
>      >  >> dataset used during signing because the original document is in
>      >  >> a serialization that uses external resources to generate the
>      >  >> dataset (like @context in JSON-LD) and this external resource
>      >  >> may have changed.
>      >  >
>      >  > I guess this is a similar issue to 4 and breaks G1, and so would
>      >  > require some restrictions to ensure that the dataset to verify
>      >  > can be extracted from D' (and maybe the other arguments to
>      >  > verify) *alone*.
>      >  >
>      >  >> 6/ Only the serialized dataset is signed, so changing comments
>      >  >> in serializations that allow comments, or other parts of the
>      >  >> document that do not encode triples or quads, can be done
>      >  >> without affecting the validity of the signature.  This is
>      >  >> particularly problematic for RDFa.
>      >  >
>      >  > True. I think though so long as this is made clear, it would
>      >  > not be a problem, per se, but rather something to highlight.
>      >  >
>      >  > Best,
>      >  > Aidan
>      >  >
>      >
>
Received on Tuesday, 25 May 2021 22:43:11 UTC