- From: Manu Sporny <msporny@digitalbazaar.com>
- Date: Tue, 25 May 2021 15:32:36 -0400
- To: semantic-web@w3.org
PFPS wrote:

> I would greatly appreciate a discussion of the possible flaws in that
> document. This discussion does not appear to be happening, which I find
> worrisome.

I am attempting to engage in the discussion that you requested, Peter. I am going to be pedantic in my response because you've made a number of technical errors that caused you to come to the wrong conclusions. At this point in time, it is clear that you either have not read the input documents, or if you did, you missed a number of critical concepts in them that caused you to create an incorrect mental model that then led to your invalid conclusions.

My response is broken into high-level statements followed by thorough explanations. This email is very long because I want you to know that we're taking your input seriously and spending a LOT of time to try to address your concerns. I'm thankful that you're engaging given that you are an expert in the RDF space (which is one of the types of input we need for this work to succeed).

> I take the method to sign and verify RDF datasets to be as follows:

Your summary of the algorithms is incorrect, is not what is in the papers or the specs, and leads to the problems you identified.

> To my non-expert eye there are several significant problems here. 1/ The
> signature extracted from the signed document might be different from the
> signature used to sign the original document if the original document has
> signatures in it.

Wrong. The LDP algorithms prevent this from happening. If the signature extracted from the signed document is different in any way, the signature will fail to verify. This is expected behaviour.

> 2/ The dataset extracted during verification might not be the dataset used
> during signing because the original document if the original document has
> signatures in it.

Wrong. The LDP algorithms prevent this from happening. If the dataset changes, the signature will fail to verify. This is expected behaviour.

> 3/ Adding extra information after signing might be possible without
> affecting verification if the extra information looks like a signature.

Wrong. The LDP algorithms prevent this from happening. Adding extra information after signing changes the hash, which will cause the signature to fail to verify. This is expected behaviour.

> 4/ The dataset extracted during verification might not be the dataset used
> during signing because the original document has relative IRIs.

Wrong. Relative IRIs are resolved against the base IRI before they go into the Canonicalization step. If the base IRI changes, the dataset changes and the signature will fail to verify. This is expected behaviour.

> 5/ The dataset extracted during verification might not be the dataset used
> during signing because the original document is in a serialization that
> uses external resources to generate the dataset (like @context in JSON-LD)
> and this external resource may have changed.

Wrong. If an external resource changes in a way that changes the dataset, then the hash for the dataset will change, causing the signature to fail to verify. This is expected behaviour.

> 6/ Only the serialized dataset is signed so changing comments in
> serializations that allow comments or other parts of the document that do
> not encode triples or quads results can be done without affecting the
> validity of the signature. This is particularly problematic for RDFa.

By definition, that is not the problem that the LDS WG is solving.
We are signing RDF Datasets; if you have information that lives outside of an RDF Dataset that you need to sign, we can't help you. All information that is signed is in the RDF Dataset. If there is information outside of the RDF Dataset (like comments), then it will not be signed. This is true for ANY digital signature mechanism. This only becomes a problem if an application depends on information that is not signed, at which point the application developer really should consider signing the unsigned information. This is expected behaviour.

> I welcome discussion of these points and am open to being proven wrong on
> them.

You are wrong to varying degrees on every point above. :) I'm going to elaborate on why below... starting with your definition of the algorithms at play.

> sign(document, private key, identity)

Wrong. Your function signature is incorrect and does not match what's in the current LDP specification:

https://w3c-ccg.github.io/ld-proofs/#proof-algorithm

The inputs you provide are inadequate when it comes to protecting against replay attacks, domain retargeting attacks, and identifying key material.

> let D be the RDF dataset serialized in document

Correct.

> let C be the canonicalized version of D

Correct.

> let S be triples representing a signature of C using private key

Wrong. Not triples; quads. The proposed solution and algorithms are targeted at RDF Datasets, not RDF Graphs. It is possible for some subset of the solution to work on RDF Graphs, but the attack surface potentially gets larger and there are more constraints that are required to make sure the data is being processed correctly. For example, if you try to apply the solution to RDF Graphs, nested signatures in graph soup might become a headache (and this might be at the core of why you think there is a problem). The group will not be creating a solution for RDF Graphs in order to constrain the focus of the correctness and security analysis.

> let signed document be document plus a serialization of S, so signed
> document serializes D union (not merge) S

Wrong. You skip right over a number of critical parts of the algorithm here (again, your summary is wrong because you're eliminating security-critical steps in the c14n algorithm and Verify Hash Algorithm):

https://w3c-ccg.github.io/ld-proofs/#create-verify-hash-algorithm

For example, the RDF Dataset being signed is hashed *separately from* the RDF signature options. That is, you have D /and/ S, which are separately hashed to generate the signature, and then merged in the signed document. If you do not separate these things correctly when you go to verify, your signature will fail to verify. If you change signature options, your signature will fail to verify. If you pollute your RDF Dataset with extra quads, your signature will fail to verify. This is all expected behaviour and is important to the security of the algorithm.

> return signed document

Correct. :)
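To make the "hashed separately" point concrete, here is a minimal, non-normative sketch of the sign-side flow in Python. It assumes the RDF Dataset and the proof options have already been converted to canonical N-Quads by the c14n algorithm, and it borrows Ed25519 from the `cryptography` package purely as an example suite; the helper names are mine, not the spec's.

```python
# Illustrative sketch only -- not the normative Create Verify Hash Algorithm.
# Inputs are assumed to already be canonical N-Quads strings.
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def create_verify_hash(canonical_dataset: str, canonical_proof_options: str) -> bytes:
    # D and the proof options are hashed *separately*; the two digests are
    # then concatenated. Changing any quad on either side changes this value.
    options_hash = hashlib.sha256(canonical_proof_options.encode("utf-8")).digest()
    dataset_hash = hashlib.sha256(canonical_dataset.encode("utf-8")).digest()
    return options_hash + dataset_hash

def sign_dataset(canonical_dataset: str, canonical_proof_options: str,
                 key: Ed25519PrivateKey) -> bytes:
    # The returned bytes are the proof value that gets merged into the signed
    # document alongside the proof options.
    return key.sign(create_verify_hash(canonical_dataset, canonical_proof_options))
```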
> verify(signed document)

The specification will probably end up being updated during the LDS WG to include an `options` field as that's what many implementations do today.

> let D' be the RDF dataset serialized in signed document

Correct.

> let S be the signature in D'

Wrong. S could be a single signature, a set of signatures, or a chain of signatures.

> let D be D' - S

Wrong. Assuming you change S to be "all proofs", then yes... but if you do that, the rest of your algorithm lacks sufficient detail to be correct.

> let C be the canonicalized version of D

Correct.

> return whether S is a valid signature for C

Wrong. You skip over many of the algorithms that work to secure the RDF Dataset. The algorithms for verifying a single signature, a set of signatures, and a chain of signatures matter here. Admittedly, the spec doesn't elaborate on these as we've really only seen single and set signatures used in the wild. Signature chains seemed like a good idea, but we haven't really seen those advanced use cases in the wild, and so the LDS WG may decide that we want to avoid spending time on those things. There is also work being done on cryptographic circuits where you can support M-of-N signatures and other types of multi-party signatures. I expect that work to be outside of the scope of the LDS WG as well.

Additionally, much of the work has been using JSON-LD as the RDF Dataset serialization format, where it's easy to understand where you're entering the graph and what subject a set of proofs is attached to. For things like N-Quads, TURTLE or other graph soup syntaxes, I expect that the algorithms will need to be modified to specify the subject that the verifier is expecting the proofs to be attached to (this will come into play later in the email).
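For completeness, here is a mirror-image sketch of the verify side for a single proof, again non-normative and with the same assumptions as the signing sketch above: the verifier re-derives the two hashes from the dataset with the proofs removed and the proof options with the proof value removed, then checks the signature.

```python
# Illustrative single-proof check only; the Proof Verification Algorithm in
# the spec also validates the proof options themselves (created, proof
# purpose, verification method, and so on).
import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def verify_single_proof(canonical_dataset: str, canonical_proof_options: str,
                        proof_value: bytes, public_key: Ed25519PublicKey) -> bool:
    # Recompute exactly what the signer hashed: options digest + dataset digest.
    to_verify = (hashlib.sha256(canonical_proof_options.encode("utf-8")).digest()
                 + hashlib.sha256(canonical_dataset.encode("utf-8")).digest())
    try:
        public_key.verify(proof_value, to_verify)
        return True
    except InvalidSignature:
        return False
```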
> To my non-expert eye there are several significant problems here.

Wrong. There are many problems with the algorithms you provided, which are not the algorithms in the specification.

> 1/ The signature extracted from the signed document might be different from
> the signature used to sign the original document if the original document
> has signatures in it.

Wrong. The LDP algorithms prevent this from happening. If the signature extracted from the signed document is different in any way, the signature will fail to verify. This is expected behaviour.

The algorithms that you use to verify a set of signatures and a chain of signatures are different. A set of signatures is expressed using the `proof` property. A chain of signatures is expressed using the `proofChain` property. It is not possible to mix both `proof` and `proofChain` in a single dataset and get a deterministic ordering of signatures. The LDP specification will probably, after LDS WG review, state that you MUST NOT do so... or we might not support chained signatures at all.

Also keep in mind that the algorithm needs to understand which subject the proof/proofChain properties are attached to. In JSON-LD, this is easy -- it's whatever subject the top-level object describes. In TURTLE or N-Quads, you have to tell the algorithm which subject is associated with the proof/proofChain properties. Keep in mind that we didn't specify this in the algorithms yet because, again, this is something that the RDF WG needs to consider as it may be possible to make this subject detection more automatic in TURTLE or N-Quads. This is a small, but important digression, and is probably a gap in your knowledge about how all of this stuff is expected to work across multiple serializations.

So, you're either dealing with one or more proofs associated with the `proof` property, or you're dealing with one or more proofs associated with the `proofChain` property.

For a set of signatures, the general algorithm is:

1. Remove `proof` (an unordered set) from the RDF Dataset that is associated with the given subject.
2. Iterate over each proof in any order and apply the Proof Verification Algorithm: https://w3c-ccg.github.io/ld-proofs/#proof-verification-algorithm

The current algorithm in the specification doesn't state this because it's not clear if the LDS WG is going to want to externalize this looping or internalize it in the algorithm above.

For a chain of signatures, the general algorithm is:

1. Remove `proofChain` (an ordered list) from the RDF Dataset that is associated with the given subject.
2. Iterate over each proof in reverse order, adding all the proofs before it back into the RDF Dataset and verifying against the last proof using the Proof Verification Algorithm: https://w3c-ccg.github.io/ld-proofs/#proof-verification-algorithm

Again, we don't elaborate on this procedure because the vast majority of LDS today just do single signatures, and so it may be that we end up not defining this in the specification. To be clear -- these algorithms are fairly straightforward (as they are just variations on verifying a single digital signature) and their correctness depends on the RDF Dataset Canonicalization algorithm and the use of well known and vetted cryptographic hashing and digital signature algorithms. In the very worst case, if the LDS WG doesn't feel comfortable supporting either set or chained signatures, then the work could be constrained to a single signature... and that is a topic of debate for the LDS WG.
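As a rough, non-normative illustration of the looping in those two general algorithms (which, as noted, the LDS WG may change or drop), the control flow might look like the sketch below. The `canonicalize` and `resolve_key` helpers, the proof dictionary keys, and the reuse of `verify_single_proof` from the earlier sketch are all assumptions of mine, not spec APIs.

```python
# Sketch only: quads are modelled as opaque hashable tuples held in sets, each
# proof as a dict with its option quads, its proof value, and all of its quads.

def verify_proof_set(dataset_quads, proofs, resolve_key):
    # `proofs` is the unordered set removed from the dataset for the given
    # subject; each proof is checked independently against the same dataset.
    canonical_dataset = canonicalize(dataset_quads)
    return all(
        verify_single_proof(canonical_dataset,
                            canonicalize(p["options_quads"]),
                            p["value"],
                            resolve_key(p))
        for p in proofs
    )

def verify_proof_chain(dataset_quads, ordered_proofs, resolve_key):
    # Work backwards through the `proofChain` list: proof i is expected to
    # cover the dataset plus every proof that came before it in the chain.
    for i in reversed(range(len(ordered_proofs))):
        earlier_quads = set()
        for p in ordered_proofs[:i]:
            earlier_quads |= p["all_quads"]
        ok = verify_single_proof(canonicalize(dataset_quads | earlier_quads),
                                 canonicalize(ordered_proofs[i]["options_quads"]),
                                 ordered_proofs[i]["value"],
                                 resolve_key(ordered_proofs[i]))
        if not ok:
            return False
    return True
```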
> 2/ The dataset extracted during verification might not be the dataset used
> during signing because the original document if the original document has
> signatures in it.

Wrong. The LDP algorithms prevent this from happening. If the dataset changes, the signature will fail to verify. This is expected behaviour.

As explained above, if the original dataset contained signatures, then those signatures are canonicalized and signed. The verification algorithm only removes the signatures from the RDF Dataset that it is instructed to verify. That is, the proofs are bound to a particular subject and it is those proofs that are removed and used during signature verification using the general algorithms listed previously in this email (and/or in the specification). Each proof is contained in its own RDF Dataset, so there is no cross-contamination between the proofs and the RDF Dataset containing the non-proof data. That is, the algorithm can surgically remove the proofs that are intended to be used during verification and leave other proofs that are included in the canonicalized data alone. Doing so addresses the recursion/embedding concern that both you and Dan raised.

> 3/ Adding extra information after signing might be possible without
> affecting verification if the extra information looks like a signature.

Wrong. The LDP algorithms prevent this from happening. Adding extra information after signing changes the hash, which will cause the signature to fail to verify. This is expected behaviour.

The Linked Data Proofs algorithms hash and sign *every Quad*. This includes the original RDF Dataset as well as all canonicalized options (i.e., signature options minus the digital signature itself). This is detailed in the specification here:

https://w3c-ccg.github.io/ld-proofs/#create-verify-hash-algorithm

This was a very deliberate design choice... other signature schemes, like JWTs, allow unsigned data. LDP takes a more strict approach... you cannot inject a Quad into either the original RDF Dataset OR the canonicalized options and get the same hash (modulo a bona fide hash collision). In other words, you cannot inject anything anywhere that is covered by the signature (which is everything)... especially "extra information that looks like a signature", because that information is included in the signature.

> 4/ The dataset extracted during verification might not be the dataset used
> during signing because the original document has relative IRIs.

Wrong. Relative IRIs are resolved against the base IRI; if the base IRI changes, the dataset changes and the signature will fail to verify. This is expected behaviour.

Relative IRI resolution happens before canonicalization occurs. The JSON-LD Playground (and underlying libraries) certainly do this as a part of JSON-LD expansion:

https://www.w3.org/TR/json-ld11-api/#iri-expansion

RDF 1.1 Concepts states that "Relative IRIs must be resolved against a base IRI to make them absolute. Therefore, the RDF graph serialized in such syntaxes is well-defined only if a base IRI can be established [RFC3986]."

We could add language to LDP that states that either 1) all inputs must be well-defined RDF Datasets, 2) all input IRIs MUST be absolute, 3) any input that contains a relative IRI and no base IRI as input is invalid (and do IRI expansion in the canonicalization spec), or some other language that makes this more clear. Again, this is something that an LDS WG should debate and come to consensus on given that the needs here are not just focused on JSON-LD and are not just focused on Verifiable Credentials.

> 5/ The dataset extracted during verification might not be the dataset used
> during signing because the original document is in a serialization that
> uses external resources to generate the dataset (like @context in JSON-LD)
> and this external resource may have changed.

Wrong; this is not a problem -- it's expected behaviour. If an external resource changes in a way that changes the dataset, then the hash for the dataset will change, causing the signature to fail to verify. This is expected behaviour.

For example, if you pull in a JSON-LD Context (J1) and use it to generate Quads, canonicalize, and sign... and then the context changes to (J2) in a way that changes terms or `@base` or anything else that modifies the IRIs that were signed, when the verifier converts the input to Quads, canonicalizes, and checks the signature, the signature will be invalid, because the generated hash changed due to the IRIs in the RDF Dataset changing.
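As a concrete (and hedged) illustration of that last point, the snippet below uses the pyld library to show that two different inline contexts, one of which only changes `@base`, produce different canonical N-Quads and therefore different hashes. The document, terms, and IRIs are made up for the example, and the contexts are inlined rather than fetched purely to keep the sketch self-contained.

```python
# Assumes the `pyld` JSON-LD processor; everything else here is illustrative.
import hashlib
from pyld import jsonld

doc = {"@id": "doc1", "name": "Alice"}

ctx_j1 = {"@base": "https://example.org/a/", "name": "https://schema.org/name"}
ctx_j2 = {"@base": "https://example.org/b/", "name": "https://schema.org/name"}

def dataset_hash(document, context):
    # Expand against the context, canonicalize with URDNA2015, then hash.
    canonical = jsonld.normalize({"@context": context, **document},
                                 {"algorithm": "URDNA2015",
                                  "format": "application/n-quads"})
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# The hashes differ, so a signature created over the J1-derived dataset will
# not verify against the J2-derived dataset.
print(dataset_hash(doc, ctx_j1))
print(dataset_hash(doc, ctx_j2))
```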
> 6/ Only the serialized dataset is signed so changing comments in
> serializations that allow comments or other parts of the document that do
> not encode triples or quads results can be done without affecting the
> validity of the signature. This is particularly problematic for RDFa.

By definition, that is not the problem that the LDS WG is solving. We are signing RDF Datasets; if you have information that lives outside of an RDF Dataset that you need to sign, we can't help you. All information that is signed is in the RDF Dataset. If there is information outside of the RDF Dataset (like comments), then it will not be signed. This is true for ANY digital signature mechanism. This only becomes a problem if an application depends on information that is not signed, at which point the application developer really should consider signing the unsigned information. This is expected behaviour.

This is not a problem for RDFa if the information you want to sign is the underlying RDF Dataset. If you want to sign a blob of HTML that contains RDFa, then you need to grab that blob of HTML and encapsulate it in the RDF Dataset and digitally sign that... or you need to use a different digital signature mechanism that just signs everything, including spaces, tabs, and other unnecessary things that, if they change, will break the signature. Having the digital proof cover things outside of an RDF Dataset is almost entirely out of scope. The only thing that is in scope is if you want to embed the HTML as a literal, for example... and in that case, you can use an RDF Dataset and LDP to do that.
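If it helps, the "embed the HTML as a literal" option might look something like the structure below, expressed here as a Python dict standing in for a JSON-LD document; the property term, IRIs, and content are invented for illustration. Because the HTML blob is carried as an ordinary literal value, every byte of it becomes part of a quad and is therefore covered by the proof over the dataset.

```python
# Hypothetical example only: the vocabulary term and document IRI are made up.
doc_with_embedded_html = {
    "@context": {"htmlContent": "https://example.org/vocab#htmlContent"},
    "@id": "https://example.org/pages/42",
    # The HTML (including any RDFa attributes) is just a literal here, so it
    # is inside the RDF Dataset and inside anything an LDP proof signs.
    "htmlContent": "<p property=\"schema:name\">Example page with RDFa</p>",
}
```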
----------------

I hope this explains how all of the problems you raised were either 1) not problems, 2) previously known with mitigations in place, 3) solved with a few sentences of documentation, or 4) not an issue and also out of scope of the LDS WG.

I hope it's also clear that a large percentage of the questions you had require RDF expertise to understand rather than "security expert" expertise. While we have had input from both RDF experts and security experts, it's still not clear what sort of expertise you're looking for when analysing these algorithms. It's true that you need both sorts of people in the same room, which is why we are forming an LDS WG *and* have entities like the IETF Crypto Forum Research Group, the National Institute of Standards and Technology (currently engaged), and other "security experts" listed in the Coordination section:

https://w3c.github.io/lds-wg-charter/#coordination

I hope these answers were helpful to you and I'm happy to answer other relevant questions you may have. What I would like from you in return are concrete suggestions on changes to the specification, issues raised, or specific parties (by name or detailed qualification) you feel should be a part of the discussion. Requesting that we bring in "security experts" is not helpful... it's like asking if we've had "RDF experts" sign off on the algorithms. Just about every "real RDF expert" I know would claim that they're not one... because they understand how broad and deep that particular body of water is.

-- manu

--
Manu Sporny - https://www.linkedin.com/in/manusporny/
Founder/CEO - Digital Bazaar, Inc.
blog: Veres One Decentralized Identifier Blockchain Launches
https://tinyurl.com/veres-one-launches

Received on Tuesday, 25 May 2021 19:32:56 UTC