Re: VC-JWT perma-thread

"What would happen if we just decided to use "Unencoded Payload" without
canonicalization?... maybe we just use JSON.stringify?
it still works!... sorta... now I can generate a new message and
signature for every ordering of data in the payload... for a really
complex and very large payload, that's going to be a LOT of deeply equal
objects... that each yield a different signature... this can lead to
storing a massive amount of redundant but indistinguishable data...
which can lead to resource exhaustion attacks." 

What baffles me is why this is a requirement in the first place. Why would you
want to move things around in a VC or a DID document and still have valid
signatures? IMHO, if a VC (or a DID document) is covered by a digital signature,
then you must not be able to change anything about it.

As a side note, I am not convinced by any argument related to storage space
savings, especially when we are talking about such small objects. It is 2021 and
storage is super cheap, even for small, constrained devices. On the other hand,
RAM and CPU are not.

Best, 

Nikos

On 2021-03-30 18:43, Orie Steele wrote:

> Overall I agree with a lot of David's comments.
> 
> In particular, I have seen the following issues with LD Proofs:
> 
> 1. silently dropping terms instead of throwing an error (an attacker can inject terms, because undefined terms are silently dropped and the signature still verifies).
> 2. poor implementations loading contexts over the network (DNS poisoning, latency attacks)
> 3. @vocab and other language "features" making it hard to tell what you are actually signing
> 4. documentation / controllership issues with vocab (same problem as JOSE, things need to be registered and documented somewhere)
> 
> 3) is easy to fix: @vocab should result in an error being thrown in any security context. https://github.com/w3c/vc-data-model/issues/753
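> 
> A rough guard for that could look like this (a sketch; it only inspects inline context objects, and assertNoVocab is a made-up name):
> 
>     // Reject any @context value (including scoped/nested contexts) that defines @vocab,
>     // so a document can't silently map unknown terms into some catch-all vocabulary.
>     function assertNoVocab(ctx: unknown): void {
>       if (Array.isArray(ctx)) {
>         ctx.forEach(assertNoVocab);
>       } else if (ctx && typeof ctx === "object") {
>         if ("@vocab" in ctx) {
>           throw new Error("@vocab is not allowed in a security context");
>         }
>         Object.values(ctx).forEach(assertNoVocab);
>       }
>     }
> 
>     // usage: assertNoVocab(credential["@context"]);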
> 
> Note that 3 applies to all VC formats, regardless of the proof / signature format.
> 
> 2) is very easy to fix: pass a document loader that never makes network requests to any software that you never want making network requests, and make sure the software still passes all its tests... 
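> 
> For example, something like this (a sketch following the jsonld.js documentLoader shape; the pinned context map and file path are made up):
> 
>     import * as jsonld from "jsonld";
>     import credentialsV1 from "./contexts/credentials-v1.json"; // pinned local copy
> 
>     // The only context URLs we are willing to resolve, all bundled with the app.
>     const PINNED_CONTEXTS: Record<string, unknown> = {
>       "https://www.w3.org/2018/credentials/v1": credentialsV1,
>     };
> 
>     // Same return shape as the jsonld.js documentLoader convention.
>     async function offlineDocumentLoader(url: string) {
>       const document = PINNED_CONTEXTS[url];
>       if (!document) {
>         throw new Error(`refusing to resolve ${url} over the network`);
>       }
>       return { contextUrl: null, document, documentUrl: url };
>     }
> 
>     // usage: await jsonld.expand(doc, { documentLoader: offlineDocumentLoader });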
> 
> 1) is the most critical IMO; different implementations handle this issue differently.
> 
> IMO the correct behavior is to throw when ANY undefined term is detected, and halt immediately. Implementations that silently dropped properties have created a massive security issue for us on this front, and it's related to canonicalization: if your canonicalization algorithm silently drops any information, that's a security vulnerability. The default behavior of any such algorithm should be to throw.
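> 
> A shallow version of that check, as a sketch on top of jsonld.js expand/compact (it only compares top-level keys, and assertNoDroppedTerms is a made-up name):
> 
>     import * as jsonld from "jsonld";
> 
>     // Throws if expanding and re-compacting the document loses any top-level term,
>     // i.e. if a property is not defined by the @context and would be silently dropped.
>     async function assertNoDroppedTerms(doc: any, options: any = {}): Promise<void> {
>       const expanded = await jsonld.expand(doc, options);
>       const compacted = await jsonld.compact(expanded, doc["@context"], options);
>       const before = Object.keys(doc).filter((k) => !k.startsWith("@"));
>       const after = new Set(Object.keys(compacted));
>       const dropped = before.filter((k) => !after.has(k));
>       if (dropped.length > 0) {
>         throw new Error(`undefined terms silently dropped: ${dropped.join(", ")}`);
>       }
>     }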
> 
> There is a kind of pseudo-canonicalization that every digital signature system relies on... and it's called a hash function. There are a number of reasons that hash functions are used with digital signatures, and a number of attacks that have resulted from a poor choice of hash function:
> 
> - https://blog.torproject.org/md5-certificate-collision-attack-and-what-it-means-tor
> - https://www.zdnet.com/article/sha-1-collision-attacks-are-now-actually-practical-and-a-looming-danger/
> 
> Yes, there are problems with complexity in the data that is hashed before a signature is applied, but none as deadly as picking a poor hash function.
> 
> in JOSE, what is signed is "base64(json(header)).base64(json(payload))"
> 
> in LD Proofs, what is signed is "sha256(canonicalize(header))sha256(canonicalize(document)) "
> 
> See https://docs.joinmastodon.org/spec/security for another explanation...
> 
> In both cases, the signature algorithm likely hashes this message before signing with EdDSA or ECDSA, etc...
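> 
> Roughly, in code (a sketch of the two shapes with placeholder inputs, not any particular library's internals):
> 
>     import { createHash } from "crypto";
> 
>     // Placeholder inputs, purely for illustration.
>     const header = { alg: "EdDSA" };
>     const payload = { hello: "world" };
>     const proofOptions = { type: "Ed25519Signature2018", created: "2021-03-30T00:00:00Z" };
>     const doc = { hello: "world" };
>     const canonicalize = (o: unknown) => JSON.stringify(o); // stand-in for URDNA2015 / JCS
> 
>     const b64url = (s: string) =>
>       Buffer.from(s).toString("base64").replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
>     const sha256 = (s: string) => createHash("sha256").update(s).digest();
> 
>     // JOSE: the signing input is the base64url-encoded header and payload, dot-joined.
>     const joseSigningInput = b64url(JSON.stringify(header)) + "." + b64url(JSON.stringify(payload));
> 
>     // LD Proofs: the verify data is the hash of the canonicalized proof options
>     // concatenated with the hash of the canonicalized document.
>     const ldVerifyData = Buffer.concat([sha256(canonicalize(proofOptions)), sha256(canonicalize(doc))]);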
> 
> A couple observations....
> 
> base64 in JOSE is a form of canonicalizing... because the header and payload objects might have different orderings, but base64url encoding makes those orderings opaque... by inflating them by 33%.
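> 
> e.g. (sketch): two deeply equal orderings encode to two different, opaque strings, each about a third larger than the JSON:
> 
>     const a = JSON.stringify({ given: "Alice", family: "Smith" });
>     const b = JSON.stringify({ family: "Smith", given: "Alice" });
>     const enc = (s: string) =>
>       Buffer.from(s).toString("base64").replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
> 
>     console.log(enc(a) === enc(b));        // false: the ordering survives, it is just hidden
>     console.log(enc(a).length / a.length); // ~1.33: the inflation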
> 
> canonicalize in the LD Proof could be JCS, or simple sorting of JSON Keys... or RDF Data Set Normalization... each would yield a different signature... 
> 
> mechanically, the fact that JCS exists hints at the problem with JOSE... if you want to sign things, you want stable hashes, and therefore need SOME form of canonicalization for complex data structures.
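> 
> A toy version of that "SOME form of canonicalization" (sorted keys only; real JCS / RFC 8785 also pins down number and string serialization):
> 
>     import { createHash } from "crypto";
> 
>     // Recursively serialize with object keys sorted; a simplified stand-in for JCS.
>     function sortedStringify(value: any): string {
>       if (Array.isArray(value)) return `[${value.map(sortedStringify).join(",")}]`;
>       if (value && typeof value === "object") {
>         return `{${Object.keys(value)
>           .sort()
>           .map((k) => `${JSON.stringify(k)}:${sortedStringify(value[k])}`)
>           .join(",")}}`;
>       }
>       return JSON.stringify(value);
>     }
> 
>     const digest = (s: string) => createHash("sha256").update(s).digest("hex");
> 
>     const h1 = digest(sortedStringify({ a: 1, b: { d: 4, c: 3 } }));
>     const h2 = digest(sortedStringify({ b: { c: 3, d: 4 }, a: 1 }));
>     console.log(h1 === h2); // true: one stable hash, regardless of key order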
> 
> JOSE works very well for small id tokens, like the ones that are used in OIDC / OAuth... JOSE totally doesn't scale to signatures over large data sets without another tool.
> 
> "Detached JWS with Unencoded Payload":
> 
> https://tools.ietf.org/html/rfc7515#appendix-F
> https://tools.ietf.org/html/rfc7797
> 
> This is how the JWS for an LD Proof is generated, and the "unencoded payload" part is the result of the canonicalization algorithm.... 
> 
> What would happen if we just decided to use "Unencoded Payload" without canonicalization?... maybe we just use JSON.stringify?
> 
> it still works!... sorta... now I can generate a new message and signature for every ordering of data in the payload... for a really complex and very large payload, that's going to be a LOT of deeply equal objects... that each yield a different signature... this can lead to storing a massive amount of redundant but indistinguishable data... which can lead to resource exhaustion attacks.
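> 
> You can see the blow-up with a few lines of code (a sketch; the three objects below are deeply equal, and these are 3 of the 3! = 6 possible orderings):
> 
>     import { createHash } from "crypto";
> 
>     const digest = (o: unknown) =>
>       createHash("sha256").update(JSON.stringify(o)).digest("hex");
> 
>     const hashes = new Set([
>       digest({ id: "did:example:1", type: "Person", name: "Alice" }),
>       digest({ name: "Alice", id: "did:example:1", type: "Person" }),
>       digest({ type: "Person", name: "Alice", id: "did:example:1" }),
>     ]);
> 
>     console.log(hashes.size); // 3: every ordering is a new message, and therefore a new signature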
> 
> In fact, the sidetree protocol uses JCS for this exact reason... https://identity.foundation/sidetree/spec/#default-parameters
> 
> So in summary: in any JOSE library you can replace JSON with JCS and get better signatures, and developers will thank you because they won't be tracking down bugs related to duplicate content... though canonicalization can also lead to security issues if not handled properly, regardless of how you canonicalize things.
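> 
> e.g. with the "canonicalize" (RFC 8785) and "jose" npm packages (a sketch assuming jose's v4-style CompactSign API; the payload here is made up):
> 
>     import canonicalize from "canonicalize";            // RFC 8785 / JCS
>     import { CompactSign, generateKeyPair } from "jose";
> 
>     const { privateKey } = await generateKeyPair("ES256");
>     const payload = { name: "Alice", id: "did:example:1" };
> 
>     // The only change from a "normal" JWS: JCS instead of JSON.stringify, so every
>     // deeply equal payload produces the same signing input.
>     const jws = await new CompactSign(new TextEncoder().encode(canonicalize(payload)!))
>       .setProtectedHeader({ alg: "ES256" })
>       .sign(privateKey);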
> 
> Regards,
> 
> OS
> 
> On Tue, Mar 30, 2021 at 1:47 AM David Waite <dwaite@pingidentity.com> wrote: 
> 
> On 3/27/21 11:12 AM, David Chadwick wrote: 
>> This is a major benefit of using JWS/JWT, as canonicalisation has been 
>> fraught with difficulties (as anybody who has worked with XML signatures 
>> will know, and discussions in the IETF PKIX group have highlighted).  
> 
> On Mar 27, 2021, 9:26 AM, Manu Sporny wrote: 
> 
>> Anyone who believes that RDF Dataset Canonicalization is the same problem as
>> XML Canonicalization does not understand the problem space. These are two very
>> different problem spaces with very different solutions. 
> 
> There have been interoperability issues with XML canonicalization, but the impact of those _pale_ in comparison to the security issues. JOSE was adopted as a next step for signed data for many use cases both for interoperability and for security reasons. 
> 
> It is crucially important to remember that for current LD proofs: 
> - the canonicalization algorithm determines which details are critical and which are ignorable 
> - the proof algorithms specify a canonicalization algorithm; there is no guarantee that URDNA2015 will always be the one chosen 
> - JSON-LD is not just for serialization of RDF, but for the interpretation of JSON as RDF. 
> 
> You need security considerations for processing a JSON-encoded document following a successful LD Proof verification. This is because you did not prove that the JSON was integrity-protected, but that the RDF interpretation of the JSON by some canonicalization algorithm (itself an interpretation based on some JSON-LD context) was protected. 
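> 
> Concretely, with jsonld.js and an inline context so nothing is fetched (a sketch that relies on the default, non-safe expansion behavior of dropping undefined terms):
> 
>     import { createHash } from "crypto";
>     import * as jsonld from "jsonld";
> 
>     const context = { name: "http://schema.org/name" };
> 
>     // Same RDF interpretation: reordered keys, plus one term the context does not define.
>     const docA = { "@context": context, "@id": "urn:example:1", name: "Alice" };
>     const docB = { "@context": context, name: "Alice", "@id": "urn:example:1", hidden: "claim" };
> 
>     const canon = (doc: object) =>
>       jsonld.canonize(doc, { algorithm: "URDNA2015", format: "application/n-quads" });
>     const hash = (s: string) => createHash("sha256").update(s).digest("hex");
> 
>     console.log(hash(await canon(docA)) === hash(await canon(docB)));
>     // true: the proof covers this RDF interpretation, not the JSON bytes you received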
> 
> And these were the problems with XML Signatures and XML Canonicalization. Developers want clean abstractions, and _need_ clean abstractions for security boundaries. Canonicalization and document transformations mean a developer must process the data in the same way as the security layer, lest you have potential security vulnerabilities. 
> 
> I imagine that eventually there will be a desire to separately sign different subsets of the RDF dataset for large datasets (like graph databases), or to support the proof being external to the dataset rather than being represented as part of the dataset, and so on. These same complexities, in XML canonicalization and signatures, introduced security vulnerabilities. Even with correct signature library implementations, the application code interpreting the data did not necessarily rise to the same level of sophistication. 
> 
> JOSE for this reason chose a 'sealed envelope' approach to signing and encryption, where the data is opaque to the security layer and vice-versa. The abstraction isn't in some canonical interpretation of the application data, but that the data is byte-for-byte identical to what was signed. 
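> 
> In code, the envelope looks like this (a sketch assuming jose's v4-style API): nothing gets interpreted until the exact signed bytes have been verified:
> 
>     import { CompactSign, compactVerify, generateKeyPair } from "jose";
> 
>     const { publicKey, privateKey } = await generateKeyPair("ES256");
>     const jws = await new CompactSign(new TextEncoder().encode(JSON.stringify({ name: "Alice" })))
>       .setProtectedHeader({ alg: "ES256" })
>       .sign(privateKey);
> 
>     // Verification rejects anything that is not byte-for-byte what was signed;
>     // the payload is opaque to the security layer until this point.
>     const { payload, protectedHeader } = await compactVerify(jws, publicKey);
>     const claims = JSON.parse(new TextDecoder().decode(payload)); // interpret only after verifying
>     console.log(protectedHeader.alg, claims);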
> 
> This is why JSON Clear Signatures had so little interest from the JOSE community at large. The problem wasn't that we couldn't imagine a canonicalization of JSON; it was that so many had been burned by all the edge cases that grew out of that flexibility in the past. For those who cared about saving the 25%+ data cost of wrapping (potentially) binary data in a text-safe format, CBOR/COSE became available. 
> 
> -DW 
> 
> P.S. This is completely ignoring the issues of DNS-style 'poisoning' that arise if you accept data from non-authoritative sources based purely on it being signed, and then treat that data as part of a cache or as an update to your own persistent data set. This was an uncommon problem in XML since most XML-based formats did not support embedding external resources. 

  -- 

ORIE STEELE 
Chief Technical Officer 
www.transmute.industries 


Received on Wednesday, 31 March 2021 15:43:49 UTC