Re: Signing and Verifying RDF Datasets for Dummies (like Me!) from Eric Prud'hommeaux on 2021-06-09 (semantic-web@w3.org from June 2021)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Wed, 9 Jun 2021 13:10:28 +0200
To: Peter Patel-Schneider <pfpschneider@gmail.com>
Cc: semantic-web@w3.org
Message-ID: <20210609111028.GA6976@w3.org>
On Tue, Jun 08, 2021 at 08:21:31PM -0400, Peter Patel-Schneider wrote:
> On Tue, 2021-06-08 at 23:13 +0200, Eric Prud'hommeaux wrote:
> > On Mon, Jun 07, 2021 at 08:31:17PM -0400, Peter F. Patel-Schneider wrote:
> > > On Mon, 2021-06-07 at 22:49 +0200, Eric Prud'hommeaux wrote:
> > > > On Mon, Jun 07, 2021 at 03:37:44PM -0400, Peter Patel-Schneider wrote:
> > > 
> > > [..]
> > > 
> > > > A third related to changing the meaning of JSON-LD documents by
> > > > changing the @context. This isn't related to signatures, and if
> > > > anything, signatures give you a tool to prevent that because you've
> > > > signed the resulting document and if someone changes the @context
> > > > under you, you can't verify the signature.
> > > >
> > > > Those were, afaict, the only substantial critiques. Most were of the
> > > > form "if you change X, the hash changes and the signature breaks" to
> > > > which the reply is "by design".
> > > 
> > > Remote contexts are indeed problematic for JSON-LD documents.  They can
> > > cause failures in both directions.  If the remote context is changed the
> > > deserialization of the document may change, invalidating signatures of
> > > documents that use the remote context.  But I believe that attackers can
> > > also use remote contexts to change signed JSON-LD documents in a way that
> > > validation by recipients will succeed but when the recipient deserializes
> > > the document they end up with an RDF dataset that is not isomorphic to the
> > > dataset signed by the originator.  I believe that this is the case even if
> > > the original signed JSON-LD document did not use remote contexts.
> > 
> > Do you agree that in order to do so, they'd have to expand the
> > document more than once, and carry the conclusion of a valid signature
> > over from the first expansion?
> 
> For the receiver seeing a different graph than what the sender signed
> this double expansion is needed.  However, double expansion is required
> given the algorithms in https://w3c-ccg.github.io/ld-proofs/

I see only one expansion in signing and one in verification. Together,
they total two, but any change to the context that affects the signed
triples will result in a failed verification. The only place to inject
a well-timed context change would be if either signing or verification
demanded multiple expansions.
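To make the one-expansion-per-side argument concrete, here is a toy model in Python. `expand` is a hypothetical stand-in for JSON-LD expansion (it only maps each term through a context dict), and the "canonicalization" is a sorted list of lines, not URDNA2015. What it illustrates: if `verify` hands back the dataset it actually checked, the caller never needs a second expansion, and a context swapped in after signing simply makes verification fail.

```python
import hashlib


def expand(doc: dict, context: dict) -> frozenset:
    """Toy stand-in for JSON-LD expansion: map each term through the context."""
    subject = doc["@id"]
    return frozenset(
        (subject, context[term], value) for term, value in doc.items() if term != "@id"
    )


def digest(dataset: frozenset) -> str:
    """Toy canonicalization (sorted lines) followed by SHA-256."""
    lines = sorted(f"{s} <{p}> {v!r} ." for s, p, v in dataset)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def sign(doc: dict, context: dict) -> str:
    # Signing expands exactly once.
    return digest(expand(doc, context))


def verify(doc: dict, context: dict, signature: str) -> tuple:
    # Verification expands exactly once and returns the dataset it checked,
    # so callers use that dataset instead of re-expanding the document.
    dataset = expand(doc, context)
    return digest(dataset) == signature, dataset


if __name__ == "__main__":
    doc = {"@id": "_:doc1", "title": "Hello world!"}
    ctx_at_signing = {"title": "https://schema.org/title"}  # hypothetical context
    ctx_swapped = {"title": "https://schema.org/alternateName"}  # attacker's edit

    sig = sign(doc, ctx_at_signing)
    print(verify(doc, ctx_at_signing, sig)[0])  # True
    print(verify(doc, ctx_swapped, sig)[0])  # False
```

The attack you describe needs a receiver that verifies one expansion and then trusts a second one; returning the verified dataset from `verify` closes that gap by construction.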

First let's look at proof generation; the input to step 2 is a
dataset, so the input has already been turned from JSON-LD into
RDF. Here's the signature algorithm:

[[
|| 1 Create a copy of document, hereafter referred to as output. 
|| 2 Generate a canonicalized document by canonicalizing document according
||  to a canonicalization algorithm (e.g. the URDNA2015
||  [[!RDF-DATASET-NORMALIZATION]] algorithm). 

(I believe this should point to
<https://json-ld.github.io/rdf-dataset-canonicalization/spec/#canonicalization-algorithm>;
there's a broken link in the refs.)

|| 3 Create a value tbs that represents the data to be signed, and set it to the
||  result of running the Create Verify Hash Algorithm, passing the information
||  in options.

This is expanded below. It's basically:

1. canonicalize a proof graph like the second textarea in my RDF
   Signature Example, under the heading "with proof node [] † in:"
2. concatenate that with the canonicalized doc to be signed (my top textarea).
3. Use conventional crypto tools to hash and sign that concatenation.

|| 4 Digitally sign tbs using the privateKey and the digital proof algorithm
||  (e.g. JSON Web Proof using RSASSA-PKCS1-v1_5 algorithm). The resulting
||  string is the proofValue. 
|| 5 Add a proof node to output containing a linked data proof using the
||  appropriate type and proofValue values as well as all of the data in the
||  proof options (e.g. created, and if given, any additional proof options such
||  as domain). 

JSON-LD makes this step slightly more complex. In principle, you're
building a new graph containing the original doc plus a proof node
with some property (`jws` or `proofValue` in my examples) whose value
is the signature ("proofValue" from step 4).

If you started with, and want to emit, nicely-framed JSON-LD, you can
append the sec: context to the doc's @context and add this property
directly to your root object. Appending that context establishes
primacy for the definitions of the properties used in the proof graph
(though it's probably a tree).

Signing and verification implementations should include
<https://w3id.org/security/suites/ed25519-2020/v1> in the distribution
so they aren't vulnerable to man-in-the-middle attacks or takeover by a
totalitarian regime. This is why Manu's documentResolver is restricted
to local files.

|| 6 Return output as the signed linked data document.
]]
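Step 5 and the bundled-context advice, done at the JSON level, might look like the Python sketch below. It assumes a framed document with a single root object and uses the ed25519-2020 context URL mentioned above; `BUNDLED_CONTEXTS`, `local_only_loader` and `attach_proof` are hypothetical names, not Manu's code or any library's API, and the bundled context body is a placeholder. Appending the security context last relies on JSON-LD's rule that later context entries override earlier term definitions (barring `@protected` terms).

```python
import copy
import json

# Context URL from the message above; the JSON body is a placeholder, not the
# real ed25519-2020 context, which an implementation would ship verbatim.
SEC_CONTEXT = "https://w3id.org/security/suites/ed25519-2020/v1"
BUNDLED_CONTEXTS = {SEC_CONTEXT: '{"@context": {}}'}


class ContextNotBundled(Exception):
    """Raised instead of fetching a context over the network."""


def local_only_loader(url: str) -> dict:
    """Hypothetical document loader that only serves locally bundled contexts."""
    if url not in BUNDLED_CONTEXTS:
        raise ContextNotBundled(f"refusing to fetch unbundled context: {url}")
    return {"contextUrl": None, "documentUrl": url,
            "document": json.loads(BUNDLED_CONTEXTS[url])}


def attach_proof(document: dict, proof: dict) -> dict:
    """Step 5 on a framed JSON-LD doc: append the security context last and
    add the proof (including proofValue) to the root object of a copy."""
    output = copy.deepcopy(document)  # step 1: never modify the input
    context = output.get("@context", [])
    if not isinstance(context, list):
        context = [context]
    if SEC_CONTEXT not in context:
        context = context + [SEC_CONTEXT]  # last entry wins for term definitions
    output["@context"] = context
    output["proof"] = proof
    return output
```

Expansion during both signing and verification would then be configured to resolve contexts only through something like `local_only_loader`, so a remote edit to a context can't reach either side.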

Proof Verification Algorithm:
[[
|| 1 Get the public key by dereferencing its URL identifier in the proof node of
||  the default graph of signed document. Confirm that the linked data
||  document that describes the public key specifies its owner and that its
||  owner's URL identifier can be dereferenced to reveal a bi-directional link
||  back to the key. Ensure that the key's owner is a trusted entity before
||  proceeding to the next step.

That last sentence is crucial. Anyone can create a key pair to sign
anything. They can replace the proof in a signed document with their
own proof. This is the same as adding an entirely new signature to a
copy of the signed document.

|| 2 Let document be a copy of signed document.
|| 3 Remove any proof nodes from the default graph in document and save it
||  as proof. 

Here again, JSON-LD makes this slightly more complex. Manu's code
doesn't remove the proof node from the graph but instead removes
properties from a JSON-LD document which effectively eliminates the
proof node. This again counts on the presence of the sec: context at
the end of the doc's @context.

|| 4 Generate a canonicalized document by canonicalizing document according
||  to the canonicalization algorithm (e.g. the URDNA2015
||  [[!RDF-DATASET-NORMALIZATION]] algorithm). 
|| 5 Create a value tbv that represents the data to be verified, and set it to the
||  result of running the Create Verify Hash Algorithm, passing the information
||  in proof. 
|| 6 Pass the proofValue, tbv, and the public key to the proof algorithm (e.g.
||  JSON Web Proof using RSASSA-PKCS1-v1_5 algorithm). Return the resulting
||  boolean value. 
]]
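Steps 1-3 and 6 could be sketched in Python as below. `split_proof` mirrors the JSON-level removal of the `proof` property described above. `verify_proof` replaces step 1's dereferencing and bidirectional owner check with a plain allowlist of trusted keys (a simplification), and takes the actual signature check as a `verify_fn` callable, since Python's standard library has no Ed25519. `tbv` is assumed to come from the Create Verify Hash Algorithm; all names here are hypothetical.

```python
import copy
from typing import Callable, Mapping, Optional, Tuple

# verify_fn(public_key, message, signature) -> bool; assumed to be supplied by
# whatever crypto library you use (e.g. an Ed25519 verifier).
VerifyFn = Callable[[bytes, bytes, bytes], bool]


def split_proof(signed_document: dict) -> Tuple[dict, dict, Optional[str]]:
    """Steps 2-3 at the JSON level: copy the doc, pop its `proof` property
    (which drops the proof node once expanded), and pull the signature value
    out of the proof so the remaining options can be hashed."""
    document = copy.deepcopy(signed_document)
    proof = document.pop("proof")
    proof_value = proof.pop("proofValue", None)
    return document, proof, proof_value


def verify_proof(proof: Mapping, tbv: bytes, signature: bytes,
                 trusted_keys: Mapping[str, bytes], verify_fn: VerifyFn) -> bool:
    """Steps 1 and 6: refuse untrusted keys, then check the signature over tbv.

    `signature` is the decoded proofValue (multibase decoding not shown)."""
    public_key = trusted_keys.get(proof.get("verificationMethod"))
    if public_key is None:
        # Anyone can mint a key pair and re-sign a copy of the document;
        # a valid signature from an untrusted key proves nothing.
        return False
    return verify_fn(public_key, tbv, signature)
```

Checking trust before doing any crypto means a re-signed copy from an unknown key is rejected even though its signature is perfectly valid.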

Here's the "Create Verify Hash Algorithm", in case you want to point
to a part of it and say "here's a second expansion".

[[
|| 1 Let options be a copy of input options. 
|| 2 If the proofValue parameter, such as jws, exists in options, remove the
||  entry. 
|| 3 If created does not exist in options, add an entry with a value that is an
||  [[!ISO8601]] combined date and time string containing the current date and
||  time accurate to at least one second, in Universal Time Code format. For
||  example: 2017-11-13T20:21:34Z. 
|| 4 Generate output by: 
|| 
||  1 Creating a canonicalized options document by canonicalizing options
||  according to the canonicalization algorithm (e.g. the URDNA2015
||  [[!RDF-DATASET-NORMALIZATION]] algorithm). 
||  2 Hash canonicalized options document using the message digest
||  algorithm (e.g. SHA-256) and set output to the result. 
||  3 Hash canonicalized document using the message digest algorithm (e.g.
||  SHA-256) and append it to output. 
|| 
|| 5 This last step needs further clarification. Signing implementations usually
||  automatically perform their own integrated hashing of an input message,
||  i.e. signing algorithms are a combination of a raw signing mechanism and a
||  hashing mechanism such as RS256 (RSA + SHA-256). Current
||  implementations of RSA-based Linked Data Proof suites therefore do not
||  perform this last step before passing the data to a signing algorithm as it
||  will be performed internally. The Ed25519Proof2018 algorithm also does
||  not perform this last step -- and, in fact, uses SHA-512 internally. In short,
||  this last step should better communicate that the 64 bytes produced from
||  concatenating the SHA-256 of the canonicalized options with the SHA-256
||  of the canonicalized document are passed into the signing algorithm with a
||  presumption that the signing algorithm will include hashing of its own.
||  Note: It is presumed that the 64-byte output will be used in a signing
||  algorithm that includes its own hashing algorithm, such as RS256 (RSA +
||  SHA-256) or EdDsa (Ed25519 which uses SHA-512). 
|| 6 Return output. 
]]
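Read literally, the algorithm above comes down to a few lines of Python. This is a sketch, not a conforming implementation: `canonicalize` is a caller-supplied stand-in for URDNA2015 over the options graph, and `canonical_document` is assumed to be the already-canonicalized N-Quads of the document.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Callable


def create_verify_hash(options: dict, canonical_document: str,
                       canonicalize: Callable[[dict], str]) -> bytes:
    """Sketch of the Create Verify Hash Algorithm quoted above."""
    opts = dict(options)                    # 1: work on a copy
    for key in ("proofValue", "jws"):       # 2: drop the signature value
        opts.pop(key, None)
    if "created" not in opts:               # 3: UTC timestamp, second precision
        opts["created"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    output = hashlib.sha256(canonicalize(opts).encode("utf-8")).digest()   # 4.1-4.2
    output += hashlib.sha256(canonical_document.encode("utf-8")).digest()  # 4.3
    return output  # 64 bytes; per step 5 the signer (e.g. Ed25519) hashes again


if __name__ == "__main__":
    # Toy canonicalization for illustration only; NOT URDNA2015.
    toy_canon = lambda o: json.dumps(o, sort_keys=True)
    nquads = '_:c14n0 <https://schema.org#title> "Hello world!" .\n'
    opts = {"type": "Ed25519Signature2020", "created": "2021-05-29T19:23:24Z"}
    print(len(create_verify_hash(opts, nquads, toy_canon)))  # 64
```

Because every option except the signature value is hashed, editing `created` or any other proof option changes the 64-byte output, and with it the signature.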


> > The main thing RDF Signatures do is canonicalize and hash
> > pairs of a document and a proof. The hashing technology is standard
> > fare used in lots of tech today. The canonicalization could only be
> > attacked if it produced the same result from different, non-isomorphic
> > graphs. The focus here seems to be based on tricking people into
> > signing something different from what they thought they were signing,
> > or presenting data different from what was signed.
> > 
> > I don't think one could say these are different in kind from either:
> > 1. any other use of JSON-LD
> > 2. the use of any tech where a remote doc tweaks the semantics (DTD).
> 
> Indeed several of the problems I have pointed out involve manipulating
> the environment so that the receiver ends up with an RDF dataset that
> is different from what was verified.  This is somewhat similar to the
> issues you point out, but having the problem affect cryptographic
> signatures raises it to a much higher level.  Consider, for example, if
> what was signed was not a separate representation of the dataset but
> the actual JSON-LD document itself.   This would not get you canonical
> signatures but the signatures would be verifiable even if the remote
> resources changed or the document used relative IRIs that ended up
> being based on the document's location.

These algorithms don't work over a JSON-LD doc itself. Using e.g. JSON
Web Signatures (RFC 7515) to sign a JSON-LD document, or even PGP to
sign that document, would indeed invite the attack of changing the
@context. JWS signs the document; RDF Signatures sign the expansion
of that document. RDF Signatures are effectively a way to prevent the
attacks you describe.
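A toy Python contrast of the two approaches: `expand` is a hypothetical stand-in for JSON-LD expansion, the context URL `https://example.org/ctx` is made up, and sorted `repr` lines stand in for URDNA2015. A digest of the JSON bytes still matches after the remote context is edited, even though the triples the receiver ends up with have changed; a digest of the expansion does not.

```python
import hashlib
import json


def expand(doc_text: str, context: dict) -> frozenset:
    """Toy expansion: parse the JSON and map each term through the context
    that the document's @context URL currently resolves to."""
    doc = json.loads(doc_text)
    subject = doc.get("@id", "_:b0")
    return frozenset((subject, context[k], v) for k, v in doc.items()
                     if not k.startswith("@"))


def byte_digest(doc_text: str) -> str:
    # JWS/PGP style: the signature covers the serialized JSON bytes.
    return hashlib.sha256(doc_text.encode("utf-8")).hexdigest()


def dataset_digest(doc_text: str, context: dict) -> str:
    # RDF Signature style: the signature covers the (toy-canonicalized) expansion.
    lines = sorted(repr(t) for t in expand(doc_text, context))
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    doc_text = '{"@context": "https://example.org/ctx", "@id": "_:doc1", "title": "Hello world!"}'
    ctx_then = {"title": "https://schema.org/title"}        # remote context at signing
    ctx_now = {"title": "https://schema.org/alternateName"}  # after someone edits it

    signed_bytes = byte_digest(doc_text)
    signed_dataset = dataset_digest(doc_text, ctx_then)
    print(byte_digest(doc_text) == signed_bytes)                # True, yet meaning changed
    print(dataset_digest(doc_text, ctx_now) == signed_dataset)  # False
```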

(BTW, the jws property in the sec: namespace doesn't refer to a JWS
signature of a JSON-LD input document, but instead to the JWS of a dinky
little tree used to capture some params for a signature. Don't be
misled.)


> [...]
> 
> 
> > > The Web GUI you put up at
> > > https://janeirodigital.github.io/rdf-sig-playground/index was useful but it
> > > doesn't take JSON-LD and appears to produce quite different output.
> > 
> > The default manifest file loads an example which creates a VC. RDF
> > Signatures are, as indicated in the proposed charter, a framework for
> > creating protocols like VCs and like what Manu signed. I stuck a
> > <select/> at the top to make it generate a proof like Manu's.
> > 
> > With this manifest:
> > 
> > https://janeirodigital.github.io/rdf-sig-playground/index?manifestURL=examples/toy.yaml
> > 
> > you should be able to reproduce (and step through) Manu's example
> > without ever dipping into JSON-LD. It doesn't accept his key pair
> > because key formats are beyond my ken (Manu, a PR would be welcomed).
> > In principle, if it did, you'd see a proofValue of
> > 
> > 'z4oey5q2M3XKaxup3tmzN4DRFTLVqpLMweBrSxMY2xHX5XTYVQeVbY8nQAVHMrXFkXJpmEcqdoDwLWxaqA3Q1geV6'
> > instead of my
> > 'z5BK4yiC7Ee85EFjDYG3qSnRGrW7DrcmmMaJwEULMJknAN7ZmxTCcGZVthe71UMKreKaKbVx9rBWV3BkiWhxNpCXp'
> 
> One of the reasons to step into JSON-LD is to examine the problems
> caused by the use of JSON-LD.
> 
> Is it the case that your system is supposed to conform to the
> algorithms in https://w3c-ccg.github.io/ld-proofs/ with the proof
> triples corresponding to the proof options?  I guess I can see this but
> it does seem odd to allow any triples in the signature block.

You can invent signature conventions to match whatever use cases you
have; the only triple you have to remove before canonicalization for
verification is the one with the actual signature. These attributes
are analogous to what's called in JWS the "protected header" because
they're metadata that's included in the signature hash. Try emptying out
the proof graph in the RDF Signature playground; it should continue to
protect the integrity of the signed doc. The Create Verify Hash
Algorithm specifies that `created` appear in the protected headers
(called "input options" in that section of the spec). That seems a bit
arbitrary without an expiry, but that's what WGs are good for.

JWS has a notion of unprotected headers as well, i.e. metadata that is
not incorporated into the signed hash. I don't have an input field for
that in the playground (and won't unless some good soul undertakes to
make a cool UI). The lack of unprotected headers means everything ends
up protected. That's probably fine, but something for a WG to consider.


> When I use the output of your implementation as the input to it, i.e.,
> I try to sign an RDF graph that contains a signature, I get a failure
> as I expected but not for the reason I expected.  Your implementation
> appears to have some explicit check for multiple signatures but I don't
> see why in principle multiple signatures are not allowed.  The
> algorithms in https://w3c-ccg.github.io/ld-proofs/ don't have
> exclusions for RDF datasets that already contain signatures.

I read the input spec as saying I MUST remove any sec:proofs. I didn't
do that (wanted to leave room to play around) so you can get e.g.

[[
@prefix cred: <https://www.w3.org/2018/credentials#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix scorg: <https://schema.org#> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix sec: <https://w3id.org/security#> .

_:doc1 scorg:title "Hello world!" .
_:doc1 sec:proof _:_2d29c99a_f553_4a5a_95b5_14e0ac7e7454 .
_:doc1 sec:proof _:_2282589f_18af_4cb5_8e52_2c268f7608a9 .
_:_2d29c99a_f553_4a5a_95b5_14e0ac7e7454 rdf:type sec:Ed25519Signature2020 .
_:_2d29c99a_f553_4a5a_95b5_14e0ac7e7454 dc:created "2021-05-29T19:23:24Z"^^xsd:dateTime .
_:_2d29c99a_f553_4a5a_95b5_14e0ac7e7454 sec:proofPurpose sec:assertionMethod .
_:_2d29c99a_f553_4a5a_95b5_14e0ac7e7454 sec:verificationMethod <https://pfps.example/issuer#z6MkjLrk3gKS2nnkeWcmcxiZPGskmesDpuwRBorgHxUXfxnG> .
_:_2d29c99a_f553_4a5a_95b5_14e0ac7e7454 sec:proofValue "z5BK4yiC7Ee85EFjDYG3qSnRGrW7DrcmmMaJwEULMJknAN7ZmxTCcGZVthe71UMKreKaKbVx9rBWV3BkiWhxNpCXp" .
_:_2282589f_18af_4cb5_8e52_2c268f7608a9 rdf:type sec:Ed25519Signature2020 .
_:_2282589f_18af_4cb5_8e52_2c268f7608a9 dc:created "2021-05-29T19:23:24Z"^^xsd:dateTime .
_:_2282589f_18af_4cb5_8e52_2c268f7608a9 sec:proofPurpose sec:assertionMethod .
_:_2282589f_18af_4cb5_8e52_2c268f7608a9 sec:verificationMethod <https://pfps.example/issuer#z6MkjLrk3gKS2nnkeWcmcxiZPGskmesDpuwRBorgHxUXfxnG> .
_:_2282589f_18af_4cb5_8e52_2c268f7608a9 sec:proofValue "z3smivBv6KNyL2nLgfsJXPar7bcfLqkRo5cXGnHxU1tZLxxLLM3zN1cqDwrbXV4dncREq7uXTF58oUzrpeYFEvqKS" .
]]

When I verify that, I see "fail: expected 1 asserted proof; got 2". My
issue isn't so much what I would do if I accepted two, but where I
would write the results of verifying them (and I'd probably have to
verify them individually through the judicious addition of a for
loop).
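The "judicious addition of a for loop" might look like this Python sketch. It works on triples with CURIE strings for brevity, assumes each proof was made over the document with all proofs removed (an independent set of proofs rather than a chain), and takes the per-proof check as a hypothetical `verify_one` callable. Returning a dict keyed by proof node is one possible answer to where the results go.

```python
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]
SEC_PROOF = "sec:proof"  # CURIE shorthand; a real graph would use full IRIs


def verify_each_proof(
    triples: Set[Triple],
    verify_one: Callable[[Set[Triple], Set[Triple]], bool],
) -> Dict[str, bool]:
    """Verify every proof attached via sec:proof independently.

    Assumes each proof was computed over the document with *all* proofs
    removed; chained proofs (each signing the previous one) need another rule."""
    proof_nodes = {o for _, p, o in triples if p == SEC_PROOF}
    proof_triples = {t for t in triples if t[0] in proof_nodes}
    document = {t for t in triples if t[1] != SEC_PROOF and t not in proof_triples}
    results = {}
    for node in sorted(proof_nodes):
        this_proof = {t for t in proof_triples if t[0] == node}
        results[node] = verify_one(document, this_proof)  # one result per proof
    return results
```

This only collects triples whose subject is a proof node; a proof with nested blank nodes would need those pulled in too.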

This looks like it would be fun to play around with to see what less
conservative rules could look like. For instance, if I passed you a
tuple of a graph and a starting node, that node could have a proof,
and other nodes in the doc could have their own proofs without
confusing verification of the intended one. Lots of room to play.


> 
> 
> peter
> 
>
Received on Wednesday, 9 June 2021 11:11:11 UTC