- From: Dave Longley <dlongley@digitalbazaar.com>
- Date: Tue, 30 Mar 2021 12:43:24 -0400
- To: Orie Steele <orie@transmute.industries>, David Waite <dwaite@pingidentity.com>
- Cc: Manu Sporny <msporny@digitalbazaar.com>, Credentials Community Group <public-credentials@w3.org>
On 3/30/21 11:43 AM, Orie Steele wrote:
> Overall I agree with a lot of David's comments.
>
> In particular, I have seen the following issues with LD Proofs:
>
> 1. Silently dropping terms instead of throwing an error (allows an
> attacker to inject terms, since undefined terms are dropped rather
> than signed).
> 2. Poor implementations loading contexts over the network (DNS
> poisoning, latency attacks).
> 3. @vocab and other language "features" making it hard to tell what
> you are actually signing.
> 4. Documentation / controllership issues with vocab (same problem as
> JOSE: things need to be registered and documented somewhere).
>
> 3) is easy to fix: @vocab should result in an error being thrown in
> any security context. https://github.com/w3c/vc-data-model/issues/753
>
> Note that 3) applies to all VC formats, regardless of the proof /
> signature format.
>
> 2) is very easy to fix: just pass a document loader that never makes
> network requests to any software you want to never make network
> requests, and make sure the software still passes all its tests...
>
> 1) is the most critical, IMO; different implementations handle this
> issue differently.
>
> IMO the correct behavior is to throw when ANY undefined term is
> detected, and halt immediately. Implementations that silently
> dropped properties have created a massive security issue for us on
> this front... and it's related to canonicalization. Essentially, if
> your canonicalization algorithm silently drops any information, it's
> a security vulnerability... the default behavior of any such
> algorithm should be to throw.

+1, I agree and think we can address the issue by being strict in
this manner. If you pass in some JSON-LD (or other LD format) to a
sign/verify API and any terms are not defined, you'll get an error.
This creates the security binding/boundaries that we want whilst
still allowing us to enjoy the benefits we get from canonicalization.

> There is a kind of pseudo-canonicalization that every digital
> signature system relies on... and it's called a hash function. There
> are a number of reasons that hash functions are used with digital
> signatures, and a number of attacks that have resulted from poor
> choices of hash function:
>
> - https://blog.torproject.org/md5-certificate-collision-attack-and-what-it-means-tor
> - https://www.zdnet.com/article/sha-1-collision-attacks-are-now-actually-practical-and-a-looming-danger/
>
> Yes, there are problems with complexity in the data that is hashed
> before a signature is applied, but none as deadly as picking a poor
> hash function.
>
> In JOSE, what is signed is
> "base64(json(header)).base64(json(payload))".
>
> In LD Proofs, what is signed is
> "sha256(canonicalize(header))sha256(canonicalize(document))".
>
> See https://docs.joinmastodon.org/spec/security for another
> explanation...
>
> In both cases, the signature algorithm likely hashes this message
> before signing with EdDSA or ECDSA, etc.
>
> A couple of observations...
>
> base64 in JOSE is a form of canonicalization... because header and
> payload objects might have different orderings, but base64url
> encoding makes those orderings opaque... by inflating them 33%.
>
> canonicalize in the LD Proof could be JCS, or simple sorting of JSON
> keys... or RDF Dataset Normalization... each would yield a different
> signature.
>
> Mechanically, the fact that JCS exists hints at the problem with
> JOSE... if you want to sign things, you want stable hashes, and
> therefore need SOME form of canonicalization for complex data
> structures.
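To make the two "what is signed" lines above concrete, here is a
rough sketch in TypeScript (Node + jsonld.js). The helper names are
mine, and the two-hash layout follows the Ed25519Signature2018-style
construction described above; a real suite implementation differs in
details:

    import { createHash } from 'crypto';
    import * as jsonld from 'jsonld';

    // Unpadded base64url, as JOSE requires (Node >= 15.7).
    const b64url = (b: Buffer): string => b.toString('base64url');
    const sha256 = (s: string): Buffer =>
      createHash('sha256').update(s, 'utf8').digest();

    // JOSE: sign base64url(json(header)) + '.' + base64url(json(payload)).
    // Key order in `header`/`payload` changes the bytes, so deeply
    // equal objects can yield different signatures.
    function joseSigningInput(header: object, payload: object): string {
      return b64url(Buffer.from(JSON.stringify(header))) + '.' +
        b64url(Buffer.from(JSON.stringify(payload)));
    }

    // LD proof: sign sha256(canonize(proofOptions)) ||
    // sha256(canonize(document)), where canonize is RDF Dataset
    // Normalization (URDNA2015).
    async function ldProofVerifyData(
      proofOptions: object, document: object): Promise<Buffer> {
      // Per 2) above, also pass a documentLoader option here that only
      // serves local, vetted contexts, so this never hits the network.
      const canonize = (doc: object): Promise<string> => jsonld.canonize(
        doc, { algorithm: 'URDNA2015', format: 'application/n-quads' });
      return Buffer.concat([
        sha256(await canonize(proofOptions)),
        sha256(await canonize(document))
      ]);
    }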
> JOSE works very well for small ID tokens, like the ones that are
> used in OIDC / OAuth... but JOSE totally doesn't scale to signatures
> over large data sets without another tool.
>
> "Detached JWS with Unencoded Payload":
>
> https://tools.ietf.org/html/rfc7515#appendix-F
> https://tools.ietf.org/html/rfc7797
>
> This is how the JWS for LD Proofs is generated, and the "unencoded
> payload" part is the result of the canonicalization algorithm.
>
> What would happen if we just decided to use "Unencoded Payload"
> without canonicalization?... maybe we just use JSON.stringify?
>
> It still works!... sorta... now I can generate a new message and
> signature for every ordering of data in the payload... for a really
> complex and very large payload, that's going to be a LOT of deeply
> equal objects... that each yield a different signature... this can
> lead to storing a massive amount of redundant but indistinguishable
> data... which can lead to resource exhaustion attacks.
>
> In fact, the Sidetree protocol uses JCS for this exact reason...
> https://identity.foundation/sidetree/spec/#default-parameters
>
> So in summary: in any JOSE library you can replace JSON with JCS and
> get better signatures, and developers will thank you because they
> won't be tracking down bugs related to duplicate content... and
> canonicalization can also lead to security issues if not handled
> properly... regardless of how you canonicalize things.
>
> Regards,
>
> OS
>
> On Tue, Mar 30, 2021 at 1:47 AM David Waite
> <dwaite@pingidentity.com> wrote:
>
> > On 3/27/21 11:12 AM, David Chadwick wrote:
> > > This is a major benefit of using JWS/JWT, as canonicalisation
> > > has been fraught with difficulties (as anybody who has worked
> > > with XML signatures will know, and discussions in the IETF PKIX
> > > group have highlighted).
> >
> > On Mar 27, 2021, 9:26 AM, Manu Sporny wrote:
> > > Anyone who believes that RDF Dataset Canonicalization is the
> > > same problem as XML Canonicalization does not understand the
> > > problem space. These are two very different problem spaces with
> > > very different solutions.
> >
> > There have been interoperability issues with XML canonicalization,
> > but the impact of those _pales_ in comparison to the security
> > issues. JOSE was adopted as a next step for signed data for many
> > use cases, both for interoperability and for security reasons.
> >
> > It is crucially important to remember that for current LD proofs:
> >
> > - the canonicalization algorithm determines which details are
> >   critical and which are ignorable
> > - the proof algorithms specify a canonicalization algorithm; there
> >   is no guarantee that URDNA2015 will always be the one chosen
> > - JSON-LD is not just for serialization of RDF, but for the
> >   interpretation of JSON as RDF
> >
> > You need security considerations for processing a JSON-encoded
> > document following a successful LD Proof verification. This is
> > because you did not prove the JSON was integrity-protected, but
> > that the RDF interpretation of the JSON by some canonicalization
> > algorithm (itself an interpretation based on some JSON-LD context)
> > was protected.
> >
> > And these were the problems with XML Signatures and XML
> > Canonicalization. Developers want clean abstractions, and _need_
> > clean abstractions for security boundaries. Canonicalization and
> > document transformations mean a developer must process the data in
> > the same way as the security layer, lest you have potential
> > security vulnerabilities.
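(Before returning to David's points below, here is a minimal sketch
of the RFC 7797 "unencoded payload" construction Orie cites above.
The header values are the ones the LD signature suites use; the
helper name is mine:)

    // Protected header for a detached JWS with unencoded payload.
    const header = { alg: 'EdDSA', b64: false, crit: ['b64'] };
    const encodedHeader =
      Buffer.from(JSON.stringify(header)).toString('base64url');

    // JWS Signing Input = ASCII(BASE64URL(header) || '.') || payload.
    // The payload is the raw verify data (e.g. the two hashes from
    // the earlier sketch) and is NOT base64url-encoded.
    function detachedSigningInput(verifyData: Buffer): Buffer {
      return Buffer.concat([
        Buffer.from(encodedHeader + '.', 'ascii'),
        verifyData
      ]);
    }

    // The serialized, detached JWS then omits the payload entirely:
    //   `${encodedHeader}..${base64urlSignature}`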
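And to see the "every ordering yields a different signature" problem
in isolation, compare plain JSON.stringify with a JCS canonicalizer.
This sketch assumes an RFC 8785 implementation such as the
`canonicalize` npm package; the sample objects are made up:

    import canonicalize from 'canonicalize'; // RFC 8785 (JCS)

    const a = { credentialSubject: { id: 'did:example:123', degree: 'BA' } };
    const b = { credentialSubject: { degree: 'BA', id: 'did:example:123' } };

    // false: key order leaks into the signed bytes, so these deeply
    // equal documents would verify under two different signatures.
    console.log(JSON.stringify(a) === JSON.stringify(b));

    // true: one byte string, one hash, one signature per document.
    console.log(canonicalize(a) === canonicalize(b));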
> > I imagine that eventually there will be a desire to separately
> > sign different subsets of the RDF dataset for large datasets (like
> > graph databases), or to support the proof being external to the
> > dataset rather than being represented as part of the dataset, and
> > so on. These complexities in XML canonicalization and signatures
> > introduced security vulnerabilities. Even with correct signature
> > library implementations, the application code interpreting the
> > data did not necessarily rise to the same level of sophistication.
> >
> > JOSE for this reason chose a 'sealed envelope' approach to signing
> > and encryption, where the data is opaque to the security layer and
> > vice versa. The abstraction isn't in some canonical interpretation
> > of the application data, but in the guarantee that the data is
> > byte-for-byte identical to what was signed.
> >
> > This is why JSON Clear Signatures had so little interest from the
> > JOSE community at large. The problem wasn't that we couldn't
> > imagine a canonicalization of JSON; it was that so many had been
> > burned by all the edge cases that grew out of that flexibility in
> > the past. For those who cared about saving 25%+ of their data cost
> > by wrapping (potentially) binary data in a text-safe format,
> > CBOR/COSE became available.
> >
> > -DW
> >
> > P.S. This is completely ignoring the issues of DNS-style
> > 'poisoning' if you accept data from non-authoritative sources
> > based purely on it being signed, then treat that data as part of a
> > cache or as an update to your own persistent data set. This was an
> > uncommon problem in XML since most XML-based formats did not
> > support embedding external resources.
> >
> > /CONFIDENTIALITY NOTICE: This email may contain confidential and
> > privileged material for the sole use of the intended recipient(s).
> > Any review, use, distribution or disclosure by others is strictly
> > prohibited. If you have received this communication in error,
> > please notify the sender immediately by e-mail and delete the
> > message and any file attachments from your computer. Thank you./
>
> --
> *ORIE STEELE*
> Chief Technical Officer
> www.transmute.industries
> <https://www.transmute.industries>

-- 
Dave Longley
CTO
Digital Bazaar, Inc.
Received on Tuesday, 30 March 2021 16:43:44 UTC