Re: VC-JWT perma-thread (was: Re: RDF Dataset Canonicalization - Formal Proof) from Dave Longley on 2021-03-30 (public-credentials@w3.org from March 2021)

From: Dave Longley <dlongley@digitalbazaar.com>
Date: Tue, 30 Mar 2021 12:43:24 -0400
To: Orie Steele <orie@transmute.industries>, David Waite <dwaite@pingidentity.com>
Cc: Manu Sporny <msporny@digitalbazaar.com>, Credentials Community Group <public-credentials@w3.org>
Message-ID: <accc44d2-462d-f7d6-72df-a92ea888dc2e@digitalbazaar.com>
On 3/30/21 11:43 AM, Orie Steele wrote:
> Overall I agree with a lot of David's comments.
> 
> In particular, I have seen the following issues with LD Proofs:
> 
> 1. silently dropping terms, instead of throwing an error. (allows an
> attacker to inject terms that are silently dropped).
> 2. poor implementations loading contexts over the network (DNS
> poisoning, latency attacks)
> 3. @vocab and other language "features" making it hard to tell what you
> are actually signing
> 4. documentation / controllership issues with vocab (same problem as
> JOSE, things need to be registered and documented somewhere)
> 
> 3) is easy to fix, @vocab should result in an error being thrown in any
> security context. https://github.com/w3c/vc-data-model/issues/753
> 
> Note that 3 applies to all VC formats, regardless of the proof /
> signature format.
> 
> 2) is very easy to fix, just pass a document loader that never makes
> network requests to any software you want to never make network requests
> and make sure the software still passes all its tests... 
> 
> 1) is the most critical imo; different implementations handle this
> issue differently.
> 
> IMO the correct behavior is to throw when ANY undefined term is
> detected, and halt immediately. Implementations that silently dropped
> properties have created a massive security issue for us on this front...
> and it's related to canonicalization: essentially, if your
> canonicalization alg silently drops any information, it's a security
> vulnerability... the default behavior of any such algorithm should be to
> throw.

+1, I agree and think we can address the issue by being strict in this
manner. If you pass in some JSON-LD (or other LD format) to a
sign/verify API and any terms are not defined, you'll get an error. This
creates the security binding/boundaries that we want whilst still
allowing us to enjoy the benefits we get from canonicalization.
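
As a rough illustration of the strict behavior described above, here is a
minimal Python sketch of a pre-signing gate that raises on the first
undefined term. It assumes a flat, locally supplied term map and is not the
API of any existing LD Proof library: the names UndefinedTermError,
assert_all_terms_defined and strict_sign_input are hypothetical, and real
JSON-LD term resolution (scoped contexts, compact IRIs, and so on) is far
more involved.

```python
from __future__ import annotations

from typing import Any


class UndefinedTermError(ValueError):
    """Raised when a document uses a term the context does not define."""


def assert_all_terms_defined(node: Any, defined_terms: set[str], path: str = "$") -> None:
    """Walk a JSON value and raise on the first key that is not a defined term.

    Keys starting with "@" are treated as JSON-LD keywords and allowed.
    Simplified on purpose: no scoped contexts, no compact IRIs.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "@context":
                # The context defines terms; it is not itself checked here.
                continue
            if not key.startswith("@") and key not in defined_terms:
                raise UndefinedTermError(f"undefined term {key!r} at {path}")
            assert_all_terms_defined(value, defined_terms, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            assert_all_terms_defined(item, defined_terms, f"{path}[{index}]")


def strict_sign_input(document: dict, context: dict) -> dict:
    """Hypothetical gate to run before canonicalization and signing."""
    if "@vocab" in context:
        # A catch-all vocabulary would implicitly define every term,
        # which defeats the purpose of the check.
        raise UndefinedTermError("@vocab is not allowed in a security context")
    assert_all_terms_defined(document, set(context))
    return document
```

The same gate would run on verify, so that a document with an unknown
property fails loudly instead of having that property dropped before the
hash is computed.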

> 
> There is a kind of pseudo canonicalization that every digital signature
> system relies on... and it's called a hash function. There are a number
> of reasons that hash functions are used with digital signatures, and a
> number of attacks that have resulted from poor choices of hash functions:
> 
> - https://blog.torproject.org/md5-certificate-collision-attack-and-what-it-means-tor
> - https://www.zdnet.com/article/sha-1-collision-attacks-are-now-actually-practical-and-a-looming-danger/
>  
> Yes, there are problems with complexity in the data that is hashed
> before a signature is applied, but none as deadly as picking a poor hash
> function.
> 
> in JOSE, what is signed is "base64url(json(header)).base64url(json(payload))"
> 
> in LD Proofs, what is signed is
> "sha256(canonicalize(header))sha256(canonicalize(document))"
> 
> See https://docs.joinmastodon.org/spec/security for another explanation...
> 
> In both cases, the signature algorithm likely hashes this message before
> signing with EdDSA or ECDSA, etc...
> 
> A couple observations....
> 
> base64 in jose is a form of canonicalizing... because header and payload
> objects might have different orderings, but base64url encoding makes
> those orderings opaque... by inflating them 33%.
> 
> canonicalize in the LD Proof could be JCS, or simple sorting of JSON
> Keys... or RDF Dataset Normalization... each would yield a different
> signature...
> 
> mechanically, the fact that JCS exists hints at the problem with JOSE...
> if you want to sign things, you want stable hashes, and therefore
> need SOME form of canonicalization for complex data structures.
> 
> JOSE works very well for small id tokens, like the ones that are used in
> OIDC / OAuth... JOSE totally doesn't scale to signatures over large data
> sets without another tool.
> 
> "Detached JWS with Unencoded Payload":
> 
> https://tools.ietf.org/html/rfc7515#appendix-F
> https://tools.ietf.org/html/rfc7797
> 
> This is how the JWS for LD Proofs is generated, and the "Unencoded
> payload" part is the result of the canonicalization algorithm....
> 
> What would happen if we just decided to use "Unencoded Payload" without
> canonicalization?... maybe we just use JSON.stringify?
> 
> it still works!... sorta... now I can generate a new message and
> signature for every ordering of data in the payload... for a really
> complex and very large payload, that's going to be a LOT of deeply equal
> objects... that each yield a different signature... this can lead to
> storing a massive amount of redundant but indistinguishable data...
> which can lead to resource exhaustion attacks.
> 
> In fact, the sidetree protocol uses JCS for this exact
> reason... https://identity.foundation/sidetree/spec/#default-parameters
> 
> So in summary, in any JOSE library you can replace JSON with JCS and get
> better signatures, and developers will thank you because they won't be
> tracking down bugs related to duplicate content... and canonicalization
> can also lead to security issues if not handled properly... regardless
> of how you canonicalize things.
> 
> Regards,
> 
> OS
> 
> 
> 
> On Tue, Mar 30, 2021 at 1:47 AM David Waite <dwaite@pingidentity.com> wrote:
> 
>     On 3/27/21 11:12 AM, David Chadwick wrote:
>     > This is a major benefit of using JWS/JWT, as canonicalisation has been
>     > fraught with difficulties (as anybody who has worked with XML signatures
>     > will know, and discussions in the IETF PKIX group have highlighted).
> 
>     On Mar 27, 2021, 9:26 AM, Manu Sporny wrote:
> 
>         Anyone who believes that RDF Dataset Canonicalization is the same
>         problem as XML Canonicalization does not understand the problem
>         space. These are two very different problem spaces with very
>         different solutions.
> 
> 
>     There have been interoperability issues with XML canonicalization,
>     but the impact of those _pales_ in comparison to the security issues.
>     JOSE was adopted as a next step for signed data for many use cases
>     both for interoperability and for security reasons.
> 
>     It is crucially important to remember that for current LD proofs:
>     - the canonicalization algorithm determines which details are
>     critical and which are ignorable
>     - the proof algorithms specify a canonicalization algorithm; there
>     is no guarantee that URDNA2015 will always be the one chosen
>     - JSON-LD is not just for serialization of RDF, but for the
>     interpretation of JSON as RDF.
> 
>     You need security considerations for processing a JSON-encoded
>     document following a successful LD Proof. This is because you did
>     not prove the JSON was integrity-protected, but that the RDF
>     interpretation of the JSON by some canonicalization algorithm
>     (itself an interpretation based on some JSON-LD context) was protected.
> 
>     And these were the problems with XML Signatures and XML
>     Canonicalization. Developers want clean abstractions, and _need_
>     clean abstractions for security boundaries. Canonicalization and
>     document transformations mean a developer must process the data in
>     the same way as the security layer, lest you have potential security
>     vulnerabilities.
> 
>     I imagine that there will eventually be a desire to
>     separately sign different subsets of the RDF dataset for large
>     datasets (like graph databases), or to support the proof being
>     external to the dataset rather than being represented as part of the
>     dataset, and so on. These complexities in XML canonicalization and
>     signatures introduced security vulnerabilities. Even with
>     correct signature library implementations, the application code
>     interpreting the data did not necessarily rise to the same level of
>     sophistication.
> 
>     JOSE for this reason chose a 'sealed envelope' approach to signing
>     and encryption, where the data is opaque to the security layer and
>     vice-versa. The abstraction isn't in some canonical interpretation
>     of the application data, but that the data is byte-for-byte
>     identical to what was signed.
> 
>     This is why JSON Clear Signatures had so little interest from the
>     JOSE community at large. The problem wasn't that we couldn't imagine
>     a canonicalization of JSON, it was that so many had been burned by
>     all the edge cases that grew out of that flexibility in the past.
>     For those who cared about saving the 25%+ data cost of wrapping
>     (potentially) binary data in a text-safe format, CBOR/COSE became
>     available.
> 
>     -DW
> 
>     P.S. this is completely ignoring the issues of DNS-style 'poisoning'
>     if you accept data from non-authoritative sources based purely on it
>     being signed, then treat that data as part of a cache or as an
>     update to your own persistent data set. This was an uncommon problem
>     in XML since most XML-based formats did not support embedding
>     external resources.
> 
> 
> 
> 
> -- 
> *ORIE STEELE*
> Chief Technical Officer
> www.transmute.industries
> 
> <https://www.transmute.industries>
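
To make the quoted point about JSON.stringify and key ordering concrete,
here is a small Python sketch. It hashes two deeply equal objects, first as
serialized and then with sorted keys. The sorted-keys step is only a
stand-in for a real canonicalization scheme such as JCS (RFC 8785), which
also pins down number and string serialization; the function names and
sample values are made up for illustration.

```python
import hashlib
import json


def digest_as_serialized(obj: dict) -> str:
    """SHA-256 over the JSON text as json.dumps emits it (insertion order)."""
    return hashlib.sha256(json.dumps(obj).encode("utf-8")).hexdigest()


def digest_sorted_keys(obj: dict) -> str:
    """SHA-256 over JSON with sorted keys and fixed separators.

    An approximation of canonicalization, not a JCS implementation.
    """
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Two deeply equal objects that differ only in key order.
a = {"id": "did:example:123", "name": "Alice"}
b = {"name": "Alice", "id": "did:example:123"}

same_data = a == b
plain_digests_match = digest_as_serialized(a) == digest_as_serialized(b)
sorted_digests_match = digest_sorted_keys(a) == digest_sorted_keys(b)
```

Here same_data and sorted_digests_match are true while plain_digests_match
is false, so without a canonical form each key ordering would get its own
hash and therefore its own signature.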


-- 
Dave Longley
CTO
Digital Bazaar, Inc.
Received on Tuesday, 30 March 2021 16:43:44 UTC