- From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
- Date: Wed, 26 May 2021 10:08:36 -0400
- To: semantic-web@w3.org
[The referenced email appears to have a large amount of duplicated text. I have only responded to the part before the duplication starts.]

Here are several attacks that I believe can be carried out against the algorithms in https://w3c-ccg.github.io/ld-proofs/#algorithms.

Attack 1 is probably difficult to carry out and doesn't gain much, but it does get consumers to believe that a producer signed something the producer didn't. The producer creates a file containing relative IRIs that serializes a graph G when the base IRI is the retrieval IRI in the context where the consumer's verification algorithm will be run. The producer then signs this file, with the base IRI being this retrieval IRI. The consumer's verification function will succeed on the signed file. But when the consumer actually deserializes the signed file, the retrieval IRI may be different from the retrieval IRI that was used in the verification algorithm. (A small sketch of this appears after Attack 5 below.)

Attack 2 is less difficult but requires something like the JSON-LD @context mechanism. A producer signs a document that has a remote context that is under the control of a third party. The consumer verifies the signed document, which succeeds because the first time the consumer asks for the remote context the same information is sent, marked as expiring immediately. The third party then sends a different remote context the next time the consumer asks for it, so that when the consumer deserializes the signed document the consumer sees an RDF dataset that is not what the producer signed.

Attack 3 depends on the presence of multiple proof nodes. Suppose the original graph already contains a proof node. The producer signs this graph. The consumer, as part of the verification process, removes all proof nodes and tries to verify the signed document minus the proof nodes. The verification of the signature fails because the proof node in the original graph is not present.

Attack 4 also depends on the presence of multiple proof nodes and exploits a flaw in how the verify hash algorithm is specified. Suppose an opponent creates a fake signing of an original graph. The opponent then signs the "signed" graph. The consumer then takes the proof nodes out of the graph that the opponent has signed. The create verify hash algorithm is given two proof nodes but only expects one and only verifies the opponent's signature. The consumer then deserializes the graph it received and believes that the fake signature has been verified.

Attack 5 is similar to Attack 4 except that the false information is added afterwards. Suppose a producer signs a linked data document. Then an opponent adds an extra signature, either fake or real. The consumer then takes the proof nodes out of the graph that the opponent has modified. The create verify hash algorithm is given two proof nodes but only expects one and only verifies the producer's signature. The consumer then deserializes the graph it received and believes that the signature the opponent inserted has been verified.
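To make Attack 1 concrete, here is a small sketch, assuming the rdflib library (publicID is rdflib's name for the base IRI against which relative IRIs are resolved); the file contents and IRIs are made up for illustration:

    # Attack 1 sketch (assumes rdflib): a file full of relative IRIs denotes different
    # graphs depending on the base IRI it is resolved against.
    from rdflib import Graph

    signed_file = '<> <http://purl.org/dc/terms/creator> <#producer> .'

    g_as_verified = Graph().parse(data=signed_file, format="turtle",
                                  publicID="https://producer.example/claims")
    g_as_used = Graph().parse(data=signed_file, format="turtle",
                              publicID="https://mirror.example/other-claims")

    print(set(g_as_verified) == set(g_as_used))   # False: same bytes, different RDF graphs

The bytes that were signed and verified never change; what changes is the graph the consumer ends up believing, because that depends on the retrieval IRI used as the base.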
On Tue, 2021-05-25 at 15:32 -0400, Manu Sporny wrote:

> PFPS wrote:
> > I would greatly appreciate a discussion of the possible flaws in that document. This discussion does not appear to be happening, which I find worrisome.
>
> I am attempting to engage in the discussion that you requested, Peter. I am going to be pedantic in my response because you've made a number of technical errors that caused you to come to the wrong conclusions.
>
> At this point in time, it is clear that you either have not read the input documents, or if you did, you missed a number of critical concepts in them that caused you to create an incorrect mental model that then led to your invalid conclusions.
>
> My response is broken into high level statements and then thorough explanation. This email is very long because I want you to know that we're taking your input seriously and spending a LOT of time to try and address your concerns.
>
> I'm thankful that you're engaging given that you are an expert in the RDF space (which is one of the types of input we need for this work to succeed).
>
> > I take the method to sign and verify RDF datasets to be as follows:
>
> Your summary of the algorithms is incorrect, is not what is in the papers or the specs, and leads to the problems you identified. See below.
>
> > To my non-expert eye there are several significant problems here.
>
> > 1/ The signature extracted from the signed document might be different from the signature used to sign the original document if the original document has signatures in it.
>
> Wrong.
>
> The LDP algorithms prevent this from happening.
>
> If the signature extracted from the signed document is different in any way, the signature will fail to verify.
>
> This is expected behaviour.

See Attack 3. Attack 4 is also relevant here.

> > 2/ The dataset extracted during verification might not be the dataset used during signing if the original document has signatures in it.
>
> Wrong.
>
> The LDP algorithms prevent this from happening.
>
> If the dataset changes, the signature will fail to verify.
>
> This is expected behaviour.

See Attack 3.

> > 3/ Adding extra information after signing might be possible without affecting verification if the extra information looks like a signature.
>
> Wrong.
>
> The LDP algorithms prevent this from happening.
>
> Adding extra information after signing changes the hash, which will cause the signature to fail to verify.
>
> This is expected behaviour.

See Attack 5.

> > 4/ The dataset extracted during verification might not be the dataset used during signing because the original document has relative IRIs.
>
> Wrong.

You seem to be saying here that relative IRIs don't cause verification to fail.

> Relative IRIs are resolved against the base IRI before they go into the Canonicalization step. If the base IRI changes, the dataset changes and the signature will fail to verify.

Here you appear to be saying that relative IRIs can cause verification to fail.

> This is expected behaviour.

See Attack 1, which fiddles with relative IRIs in such a way that verification succeeds but the consumer believes a different RDF dataset has been verified.

> > 5/ The dataset extracted during verification might not be the dataset used during signing because the original document is in a serialization that uses external resources to generate the dataset (like @context in JSON-LD) and this external resource may have changed.
>
> Wrong.

As above.

> If an external resource changes in a way that changes the dataset, then the hash for the dataset will change, causing the signature to fail to verify.

As above.

> This is expected behaviour.

See Attack 2, which fiddles with remote contexts in such a way that verification succeeds but the consumer believes a different RDF dataset has been verified. A small sketch follows.
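Here is that fiddling sketched, assuming the pyld library (the document-loader calling convention differs slightly between pyld versions, hence the permissive signature); the context URL, vocabulary IRIs, and document are made up for illustration, and the proof statements are omitted for brevity:

    # Attack 2 sketch (assumes pyld): the same document is converted to quads twice;
    # in between, the third party swaps the remote context it serves.
    import hashlib
    from pyld import jsonld

    CONTEXT_URL = "https://third-party.example/context.jsonld"   # hypothetical remote context

    signed_doc = {"@context": CONTEXT_URL,
                  "id": "urn:example:claim1",
                  "claim": "what the producer asserted"}

    def loader_serving(context):
        # Serves a fixed context; in the attack it is sent as expiring immediately,
        # so the consumer re-fetches it every time.
        def loader(url, *args, **kwargs):
            return {"contextUrl": None, "documentUrl": url, "document": context}
        return loader

    context_v1 = {"@context": {"id": "@id", "claim": "https://vocab.example/claim"}}
    context_v2 = {"@context": {"id": "@id", "claim": "https://vocab.example/somethingElse"}}

    quads_verified = jsonld.to_rdf(signed_doc, {"format": "application/n-quads",
                                                "documentLoader": loader_serving(context_v1)})
    quads_used = jsonld.to_rdf(signed_doc, {"format": "application/n-quads",
                                            "documentLoader": loader_serving(context_v2)})

    print(hashlib.sha256(quads_verified.encode()).hexdigest() ==
          hashlib.sha256(quads_used.encode()).hexdigest())       # False: different datasets

The dataset hashed during verification matches what the producer signed, while the dataset the consumer later deserializes and acts on is something else.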
> > 6/ Only the serialized dataset is signed, so changing comments in serializations that allow comments, or other parts of the document that do not encode triples or quads, can be done without affecting the validity of the signature. This is particularly problematic for RDFa.
>
> By definition, that is not the problem that the LDS WG is solving. We are signing RDF Datasets; if you have information that lives outside of an RDF Dataset that you need to sign, we can't help you.
>
> All information that is signed is in the RDF Dataset. If there is information outside of the RDF Dataset (like comments), then it will not be signed. This is true for ANY digital signature mechanism. This only becomes a problem if an application depends on information that is not signed, at which point the application developer really should consider signing the unsigned information.
>
> This is expected behaviour.

That may be the *defined* behaviour, but there may be consumers who believe that non-coding parts of the document have been signed.

> > I welcome discussion of these points and am open to being proven wrong on them.
>
> You are wrong to varying degrees on every point above. :)

I disagree. I believe I have outlined attacks that exhibit each of these problems. As all I have to go on is the high-level description in https://w3c-ccg.github.io/ld-proofs/#algorithms, some of these attacks may not be exhibited in some implementations. I am awaiting a reference implementation of the algorithms in https://w3c-ccg.github.io/ld-proofs/#algorithms.

> I'm going to elaborate on why below... starting with your definition of the algorithms at play.
>
> > sign(document, private key, identity)
>
> Wrong.
>
> Your function signature is incorrect and does not match what's in the current LDP specification:
>
> https://w3c-ccg.github.io/ld-proofs/#proof-algorithm
>
> The inputs you provide are inadequate when it comes to protecting against replay attacks, domain retargetting attacks, and identifying key material.

Yes, I am missing the date. The domain is optional. The identity contains or points to a public/private key pair.

> > let D be the RDF dataset serialized in document
>
> Correct.
>
> > let C be the canonicalized version of D
>
> Correct.
>
> > let S be triples representing a signature of C using private key
>
> Wrong.
>
> Not triples; quads. The proposed solution and algorithms are targeted at RDF Datasets, not RDF Graphs. It is possible for some subset of the solution to work on RDF Graphs, but the attack surface potentially gets larger and there are more constraints that are required to make sure the data is being processed correctly.

The signature information is added to the default graph, as far as I can tell, so triples are adequate.

> For example, if you try to apply the solution to RDF Graphs, nested signatures in graph soup might become a headache (and this might be at the core of why you think there is a problem).
>
> The group will not be creating a solution for RDF Graphs in order to constrain the focus of the correctness and security analysis.

What does it matter whether the serialized information is an RDF dataset or just an RDF graph? An RDF graph is, in essence, an RDF dataset, and the added information is added to the default graph. So any problem with just RDF graphs is also present if RDF datasets are allowed.
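For reference, here is the summary under discussion, restated as a rough, runnable sketch. The canonicalization and the signature are toy stand-ins (sorting and an HMAC), not the algorithms in the specification, and the proof is a single made-up default-graph statement; the quoted point-by-point discussion continues below.

    # Toy restatement of the sign/verify summary (NOT the LD-proofs algorithms).
    import hashlib
    import hmac

    def canonicalize(quads):
        # Stand-in for RDF Dataset Canonicalization: deterministic ordering only.
        return "\n".join(sorted(quads))

    def sign(quads, secret):
        c = canonicalize(quads)                              # "let C be the canonicalized version of D"
        sig = hmac.new(secret, c.encode(), hashlib.sha256).hexdigest()
        s = {f'_:proof <urn:example:proofValue> "{sig}" .'}  # "let S be statements signing C"
        return quads | s                                     # signed document serializes D union S

    def verify(signed_quads, secret):
        s = {q for q in signed_quads if "<urn:example:proofValue>" in q}  # "let S be the signature in D'"
        d = signed_quads - s                                              # "let D be D' - S"
        expected = hmac.new(secret, canonicalize(d).encode(), hashlib.sha256).hexdigest()
        return any(f'"{expected}"' in q for q in s)                       # is S a valid signature for C?

    key = b"toy-shared-secret"
    doc = {"<urn:ex:s> <urn:ex:p> <urn:ex:o> ."}
    print(verify(sign(doc, key), key))                       # True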
> > let signed document be document plus a serialization of S, so signed document serializes D union (not merge) S
>
> Wrong.
>
> You skip right over a number of critical parts of the algorithm here (again, your summary is wrong because you're eliminating security critical steps in the c14n algorithm and Verify Hash Algorithm):

I do skip over a part of the algorithm, but the result of steps 2 through 4 of https://w3c-ccg.github.io/ld-proofs/#proof-algorithm is a proof value (or signature for just signing) which is serialized and added to the document in step 5, so I think my summary is adequate.

> https://w3c-ccg.github.io/ld-proofs/#create-verify-hash-algorithm
>
> For example, the RDF Dataset being signed is hashed *separately from* the RDF signature options. That is, you have D /and/ S, which are separately hashed to generate the signature, and then merged in the signed document. If you do not separate these things correctly when you go to verify, your signature will fail to verify. If you change signature options, your signature will fail to verify. If you pollute your RDF Dataset with extra quads, your signature will fail to verify. This is all expected behaviour and is important to the security of the algorithm.

Agreed, but my summary just wraps all that up into a single action.

> > return signed document
>
> Correct. :)
>
> > verify(signed document)
>
> The specification will probably end up being updated during the LDS WG to include an `options` field as that's what many implementations do today.
>
> > let D' be the RDF dataset serialized in signed document
>
> Correct.
>
> > let S be the signature in D'
>
> Wrong.
>
> S could be a single signature, a set of signatures, or a chain of signatures.

The extraction in step 3 extracts all the proof nodes but is then fed into https://w3c-ccg.github.io/ld-proofs/#create-verify-hash-algorithm which appears to accept a single proof. There are other places where proof value also appears to be a single proof. In any case, the proof nodes are all removed.

> > let D be D' - S
>
> Wrong.
>
> Assuming you change S to be "all proofs", then yes... but if you do that, the rest of your algorithm lacks sufficient detail to be correct.

OK, S is all proof nodes in D'.

> > let C be the canonicalized version of D
>
> Correct.
>
> > return whether S is a valid signature for C
>
> Wrong. You skip over many of the algorithms that work to secure the RDF Dataset.

I do skip over the details of determining whether S is valid or not, but I don't think my summary is incorrect. I do believe that my comments above on multiplicity of signatures are correct.

> The algorithms for verifying a single signature, a set of signatures, and a chain of signatures matter here. Admittedly, the spec doesn't elaborate on these as we've really only seen single and set signatures used in the wild. Signature chains seemed like a good idea, but we haven't really seen those advanced use cases in the wild and so the LDS WG may decide that we want to avoid spending time on those things. There is also work being done on cryptographic circuits where you can support M-of-N signatures, and other types of multi-party signatures. I expect that work to be outside of the scope of the LDS WG as well.
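To illustrate the situation that Attacks 4 and 5 rely on, here is a toy sketch (an HMAC stands in for a real signature; none of this is the specification's algorithm): the verifier strips every proof node from the data, but, expecting a single proof, checks only one of them.

    # Toy illustration of Attacks 4/5 (stand-in crypto; not the spec's algorithms).
    import hashlib
    import hmac

    def toy_sig(payload, secret):
        return hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()

    producer_secret = b"producer-secret"
    payload = "<urn:ex:s> <urn:ex:p> <urn:ex:o> ."      # stands for the canonicalized dataset

    proofs = [{"creator": "producer", "sig": toy_sig(payload, producer_secret)}]
    # Attack 5: after signing, an opponent appends a second proof node (fake or real).
    proofs.append({"creator": "opponent", "sig": "0000-not-a-real-signature"})

    # A verifier that removes all proof nodes but "only expects one" checks just the first:
    checked = proofs[0]
    ok = hmac.compare_digest(checked["sig"], toy_sig(payload, producer_secret))
    print(ok)   # True -- yet the consumer may now also believe the opponent's proof,
                # which was never checked, has been verified.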
> Additionally, much of the work has been using JSON-LD as the RDF Dataset serialization format, where it's easy to understand where you're entering the graph and what subject a set of proofs is attached to. For things like N-Quads, TURTLE or other graph soup syntaxes, I expect that the algorithms will need to be modified to specify the subject that the verifier is expecting the proofs to be attached to (this will come into play later in the email).

Is this so? How does one determine whether one signature is included in another signing in JSON-LD?

[Start of duplicated content.]

> > To my non-expert eye there are several significant problems here.
>
> Wrong. There are many problems with the algorithms you provided, which are not the algorithms in the specification.
>
> > 1/ The signature extracted from the signed document might be different from the signature used to sign the original document if the original document has signatures in it.
>
> Wrong.
>
> The LDP algorithms prevent this from happening.
>
> If the signature extracted from the signed document is different in any way, the signature will fail to verify.
>
> This is expected behaviour.
>
> The algorithms that you use to verify a set of signatures and a chain of signatures are different.
>
> A set of signatures is expressed using the `proof` property.
>
> A chain of signatures is expressed using the `proofChain` property.
>
> It is not possible to mix both `proof` and `proofChain` in a single dataset and get a deterministic ordering of signatures. The LDP specification will probably, after LDS WG review, state that you MUST NOT do so... or we might not support chained signatures at all.
>
> Also keep in mind that the algorithm needs to understand which subject the proof/proofChain properties are attached to. In JSON-LD, this is easy -- it's whatever subject the top level object describes. In TURTLE or NQuads, you have to tell the algorithm which subject is associated with the proof/proofChain properties. Keep in mind that we didn't specify this in the algorithms yet because, again, this is something that the RDF WG needs to consider as it may be possible to make this subject detection more automatic in TURTLE or NQuads. This is a small, but important digression, and is probably a gap in your knowledge about how all of this stuff is expected to work across multiple serializations.
>
> So, you're either dealing with one or more proofs associated with the `proof` property, or you're dealing with one or more proofs associated with the `proofChain` property.
>
> For a set of signatures, the general algorithm is:
>
> 1. Remove `proof` (an unordered set) from the RDF Dataset that is associated with the given subject.
> 2. Iterate over each proof in any order and apply the Proof Verification Algorithm: https://w3c-ccg.github.io/ld-proofs/#proof-verification-algorithm
>
> The current algorithm in the specification doesn't state this because it's not clear if the LDS WG is going to want to externalize this looping or internalize it in the algorithm above.
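For concreteness, one possible reading of that set-verification loop as a sketch; verify_single_proof stands in for the Proof Verification Algorithm, and the interface is illustrative rather than the spec's:

    def verify_proof_set(dataset_quads, proofs, verify_single_proof):
        # Step 1 has already happened: `proofs` is the unordered set removed from the dataset.
        # Step 2: every proof must verify, in any order, against the remaining quads.
        return all(verify_single_proof(dataset_quads, proof) for proof in proofs)

    # Trivial usage with a dummy checker:
    print(verify_proof_set({"<urn:ex:s> <urn:ex:p> <urn:ex:o> ."},
                           [{"sig": "a"}, {"sig": "b"}],
                           lambda quads, proof: bool(proof["sig"])))   # True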
> For a chain of signatures, the general algorithm is:
>
> 1. Remove `proofChain` (an ordered list) from the RDF Dataset that is associated with the given subject.
> 2. Iterate over each proof in reverse order, adding all the proofs before it into the RDF Dataset and verifying against the last proof using the Proof Verification Algorithm: https://w3c-ccg.github.io/ld-proofs/#proof-verification-algorithm
>
> Again, we don't elaborate on this procedure because the vast majority of LDS today just do single signatures and so it may be that we end up not defining this in the specification.
>
> To be clear -- these algorithms are fairly straight forward (as they are just variations on verifying a single digital signature) and their correctness depends on the RDF Dataset Canonicalization algorithm and the use of well known and vetted cryptographic hashing and digital signature algorithms. In the very worst case, if the LDS WG doesn't feel comfortable supporting either set or chained signatures, then the work could be constrained to a single signature... and that is a topic of debate for the LDS WG.
>
> > 2/ The dataset extracted during verification might not be the dataset used during signing if the original document has signatures in it.
>
> Wrong.
>
> The LDP algorithms prevent this from happening.
>
> If the dataset changes, the signature will fail to verify.
>
> This is expected behaviour.
>
> As explained above, if the original dataset contained signatures, then those signatures are canonicalized and signed.
>
> The verification algorithm only removes the signatures from the RDF Dataset that it is instructed to verify. That is, the proofs are bound to a particular subject and it is those proofs that are removed and used during signature verification using the general algorithms listed previously in this email (and/or in the specification).
>
> Each proof is contained in its own RDF Dataset, so there is no cross-contamination between the proofs and the RDF Dataset containing the non-proof data. That is, the algorithm can surgically remove the proofs that are intended to be used during verification and leave other proofs that are included in the canonicalized data alone. Doing so addresses the recursion/embedding concern that both you and Dan raised.
>
> > 3/ Adding extra information after signing might be possible without affecting verification if the extra information looks like a signature.
>
> Wrong.
>
> The LDP algorithms prevent this from happening.
>
> Adding extra information after signing changes the hash, which will cause the signature to fail to verify.
>
> This is expected behaviour.
>
> The Linked Data Proofs algorithms hash and sign *every Quad*. This includes the original RDF Dataset as well as all canonicalized options (i.e., signature options minus the digital signature itself). This is detailed in the specification here:
>
> https://w3c-ccg.github.io/ld-proofs/#create-verify-hash-algorithm
>
> This was a very deliberate design choice... other signature schemes, like JWTs, allow unsigned data. LDP takes a more strict approach... you cannot inject a Quad into either the original RDF Dataset OR the canonicalized options and get the same hash (modulo a bonafide hash collision). In other words, you cannot inject anything, anywhere that is covered by the signature (which is everything)... especially "extra information that looks like a signature" because that information is included in the signature.
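One possible reading of how every quad in both the dataset and the canonicalized proof options ends up covered by the signature, as a sketch; the digest function and the concatenation order are assumptions for illustration, not the normative steps of the Create Verify Hash Algorithm:

    import hashlib

    def create_verify_hash(canonical_proof_options, canonical_dataset):
        options_digest = hashlib.sha256(canonical_proof_options.encode("utf-8")).digest()
        dataset_digest = hashlib.sha256(canonical_dataset.encode("utf-8")).digest()
        return options_digest + dataset_digest    # the combined value is what gets signed

Changing any quad on either side changes the corresponding digest, so a signature over the combined value no longer verifies.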
> > 4/ The dataset extracted during verification might not be the dataset used during signing because the original document has relative IRIs.
>
> Wrong.
>
> Relative IRIs are resolved against the base IRI; if the base IRI changes, the dataset changes and the signature will fail to verify.
>
> This is expected behaviour.
>
> Relative IRI resolution happens before canonicalization occurs. The JSON-LD Playground (and underlying libraries) certainly do this as a part of JSON-LD expansion:
>
> https://www.w3.org/TR/json-ld11-api/#iri-expansion
>
> RDF 1.1 Concepts states that "Relative IRIs must be resolved against a base IRI to make them absolute. Therefore, the RDF graph serialized in such syntaxes is well-defined only if a base IRI can be established [RFC3986]."
>
> We could add language to LDP that states that either 1) all inputs must be well-defined RDF Datasets, 2) all input IRIs MUST be absolute, 3) any input that contains a relative IRI and no base IRI as input is invalid (and do IRI expansion in the canonicalization spec), or some other language that makes this more clear.
>
> Again, this is something that an LDS WG should debate and come to consensus on given that the needs here are not just focused on JSON-LD and are not just focused on Verifiable Credentials.
>
> > 5/ The dataset extracted during verification might not be the dataset used during signing because the original document is in a serialization that uses external resources to generate the dataset (like @context in JSON-LD) and this external resource may have changed.
>
> Wrong; this is not a problem -- it's expected behaviour.
>
> If an external resource changes in a way that changes the dataset, then the hash for the dataset will change, causing the signature to fail to verify.
>
> This is expected behaviour.
>
> For example, if you pull in a JSON-LD Context (J1) and use it to generate Quads, canonicalize, and sign... and then the context changes to (J2) that changes terms or `@base` or anything else that modifies the IRIs that were signed, when the verifier converts the input to Quads, canonicalizes and checks the signature, the signature will be invalid, because the generated hash changed due to the IRIs in the RDF Dataset changing.
>
> > 6/ Only the serialized dataset is signed, so changing comments in serializations that allow comments, or other parts of the document that do not encode triples or quads, can be done without affecting the validity of the signature. This is particularly problematic for RDFa.
>
> By definition, that is not the problem that the LDS WG is solving. We are signing RDF Datasets; if you have information that lives outside of an RDF Dataset that you need to sign, we can't help you.
>
> All information that is signed is in the RDF Dataset. If there is information outside of the RDF Dataset (like comments), then it will not be signed. This is true for ANY digital signature mechanism. This only becomes a problem if an application depends on information that is not signed, at which point the application developer really should consider signing the unsigned information.
>
> This is expected behaviour.
>
> This is not a problem for RDFa if the information you want to sign is the underlying RDF Dataset. If you want to sign a blob of HTML that contains RDFa, then you need to grab that blob of HTML and encapsulate it in the RDF Dataset and digitally sign that... or you need to use a different digital signature mechanism that just signs everything, including spaces, tabs, and other unnecessary things that, if they change, will break the signature.
>
> Having the digital proof cover things outside of an RDF Dataset is almost entirely out of scope. The only thing that is in scope is if you want to embed the HTML as a literal, for example... and in that case, you can use an RDF Dataset and LDP to do that.
>
> ----------------
>
> I hope this explains how all of the problems you raised were either 1) not problems, 2) previously known with mitigations in place, 3) solved with a few sentences of documentation, or 4) not an issue and also out of scope of the LDS WG.
>
> I hope it's also clear that a large percentage of the questions you had require RDF expertise to understand rather than "security expert" expertise. While we have had input from both RDF experts and security experts, it's still not clear what sort of expertise you're looking to when analysing these algorithms. It's true that you need both sorts of people in the same room, and is thus why we are forming an LDS WG *and* have entities like the IETF Cryptography Forum Research Group, the National Institute of Standards (currently engaged), and other "security experts" listed in the Coordination section:
>
> https://w3c.github.io/lds-wg-charter/#coordination
>
> I hope these answers were helpful to you and I'm happy to answer other relevant questions you may have.
>
> What I would like from you in return are concrete suggestions on changes to the specification, issues raised, or specific parties (by name or detailed qualification) you feel should be a part of the discussion. Requesting that we bring in "security experts" is not helpful... it's like asking if we've had "RDF experts" sign-off on the algorithms. Just about every "real RDF expert" I know would claim that they're not one... because they understand how broad and deep that particular body of water is.
>
> -- manu
Received on Wednesday, 26 May 2021 14:09:54 UTC