Re: VC-JWT perma-thread (was: Re: RDF Dataset Canonicalization - Formal Proof) from Orie Steele on 2021-03-30 (public-credentials@w3.org from March 2021)

From: Orie Steele <orie@transmute.industries>
Date: Tue, 30 Mar 2021 10:43:32 -0500
To: David Waite <dwaite@pingidentity.com>
Cc: Manu Sporny <msporny@digitalbazaar.com>, Credentials Community Group <public-credentials@w3.org>
Message-ID: <CAN8C-_+dxuc2VZ4Hvoe7U6QF8vWbF1SyQR1saxMcdPP9nUCYFQ@mail.gmail.com>
Overall I agree with a lot of David's comments.

In particular, I have seen the following issues with LD Proofs:

1. Silently dropping undefined terms instead of throwing an error (this allows
an attacker to inject terms that are then silently dropped from what is
actually signed).
2. Poor implementations loading contexts over the network (DNS poisoning,
latency attacks).
3. @vocab and other language "features" making it hard to tell what you are
actually signing.
4. Documentation / controllership issues with vocabularies (the same problem
as JOSE: things need to be registered and documented somewhere).

3) is easy to fix: @vocab should result in an error being thrown in any
security context. https://github.com/w3c/vc-data-model/issues/753
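As a minimal sketch of that policy, assuming all contexts have already been resolved and inlined as plain JSON objects (for example through a pinned, offline loader), a pre-signing check can walk the document and refuse any "@vocab" it finds. The helper and exception names are hypothetical and not part of any JSON-LD library:

```python
from typing import Any


class UnsafeContextError(ValueError):
    """Hypothetical error type: a context uses a construct we refuse to sign over."""


def assert_no_vocab(node: Any) -> None:
    """Recursively walk a JSON-LD document with inline contexts and fail closed on "@vocab".

    Assumes remote contexts were already resolved and inlined; a bare URL string
    in "@context" is not inspected by this sketch.
    """
    if isinstance(node, dict):
        if "@vocab" in node:
            raise UnsafeContextError("@vocab is not allowed in a security context")
        for value in node.values():
            assert_no_vocab(value)
    elif isinstance(node, list):
        for item in node:
            assert_no_vocab(item)


safe = {"@context": {"name": "https://schema.org/name"}, "name": "Alice"}
unsafe = {"@context": [{"@vocab": "https://example.com/terms#"}], "name": "Alice"}

assert_no_vocab(safe)  # returns None: nothing to complain about
try:
    assert_no_vocab(unsafe)
except UnsafeContextError as err:
    print("rejected:", err)
```

Failing closed here means an unexpected term can never be quietly mapped to some vocabulary IRI the signer did not review.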

Note that 3 applies to all VC formats, regardless of the proof / signature
format.

2) is very easy to fix: just pass a document loader that never makes
network requests to any software that should never make network requests,
and make sure the software still passes all its tests...
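A minimal sketch of such a loader: it serves only a pinned set of contexts and raises on anything else, so a test suite run with it fails loudly wherever the code would have reached for the network. The (url, options) call shape and the returned contextUrl / documentUrl / document keys are modeled on the RemoteDocument structure that JSON-LD loaders such as jsonld.js and pyld work with, but treat that as an assumption and check your library's documentation for its exact contract. PINNED, make_offline_loader, NetworkAccessForbidden and the example.com URLs are hypothetical:

```python
import copy
from typing import Any, Callable, Dict, Optional

# Hypothetical pinned context; in practice this would be a reviewed, vendored copy.
PINNED: Dict[str, Dict[str, Any]] = {
    "https://example.com/contexts/demo/v1": {
        "@context": {"name": "https://schema.org/name"}
    },
}


class NetworkAccessForbidden(RuntimeError):
    """Raised instead of ever touching the network."""


def make_offline_loader(pinned: Dict[str, Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
    def load(url: str, options: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        if url not in pinned:
            raise NetworkAccessForbidden(f"refusing to fetch {url!r}: not in the pinned set")
        # Deep copy so callers cannot mutate the pinned original.
        return {"contextUrl": None, "documentUrl": url, "document": copy.deepcopy(pinned[url])}

    return load


loader = make_offline_loader(PINNED)
print(loader("https://example.com/contexts/demo/v1")["document"])
try:
    loader("https://attacker.example/evil-context")
except NetworkAccessForbidden as err:
    print(err)
```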

1) is the most critical, IMO; different implementations handle this issue
differently.

IMO, the correct behavior is to throw when ANY undefined term is detected
and halt immediately. Implementations that silently dropped properties have
created a massive security issue for us on this front... and it's related to
canonicalization: essentially, if your canonicalization algorithm silently
drops any information, that's a security vulnerability... the default
behavior of any such algorithm should be to throw.
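To make the failure mode concrete, here is a deliberately simplified toy model. It is not real JSON-LD expansion and not URDNA2015: a flat, hypothetical context maps terms to IRIs, and the "canonical form" is sorted-key JSON of the expanded document. In lossy mode an attacker-injected property never reaches the hash, so the signature still verifies over JSON the application may read differently; in strict mode processing halts. CONTEXT, expand and signed_digest are illustrative names only:

```python
import hashlib
import json
from typing import Any, Dict

# Hypothetical toy context: term -> IRI. Real JSON-LD contexts are far richer.
CONTEXT: Dict[str, str] = {
    "name": "https://schema.org/name",
    "email": "https://schema.org/email",
}


class UndefinedTermError(ValueError):
    pass


def expand(doc: Dict[str, Any], strict: bool) -> Dict[str, Any]:
    """Toy expansion of a flat document: replace each term with its IRI."""
    expanded: Dict[str, Any] = {}
    for term, value in doc.items():
        iri = CONTEXT.get(term)
        if iri is None:
            if strict:
                raise UndefinedTermError(f"undefined term: {term!r}")
            continue  # lossy behavior: the term silently vanishes
        expanded[iri] = value
    return expanded


def signed_digest(doc: Dict[str, Any], strict: bool) -> str:
    canonical = json.dumps(expand(doc, strict), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


original = {"name": "Alice", "email": "alice@example.com"}
tampered = {**original, "isAdmin": True}  # attacker-injected, undefined term

# Lossy mode: the digest (and therefore the signature) does not change.
print(signed_digest(original, strict=False) == signed_digest(tampered, strict=False))

# Strict mode: processing halts on the undefined term.
try:
    signed_digest(tampered, strict=True)
except UndefinedTermError as err:
    print(err)
```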

There is a kind of pseudo-canonicalization that every digital signature
system relies on... and it's called a hash function. There are a number of
reasons that hash functions are used with digital signatures, and a number
of attacks that have resulted from poor choices of hash function:

- https://blog.torproject.org/md5-certificate-collision-attack-and-what-it-means-tor
- https://www.zdnet.com/article/sha-1-collision-attacks-are-now-actually-practical-and-a-looming-danger/

Yes, there are problems with complexity in the data that is hashed before a
signature is applied, but none as deadly as picking a poor hash function.

In JOSE, what is signed is "base64url(json(header)).base64url(json(payload))"

In LD Proofs, what is signed is
"sha256(canonicalize(header)) || sha256(canonicalize(document))", i.e. the
two digests concatenated.

See https://docs.joinmastodon.org/spec/security for another explanation...

In both cases, the signature algorithm likely hashes this message before
signing with EdDSA or ECDSA, etc...
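A side-by-side sketch of the two signing inputs, using only the standard library. Assumptions: canonicalize is a stand-in (sorted-key compact JSON) for the real canonicalization step, which for LD Proofs would typically be an RDF canonicalization such as URDNA2015; the proof-options object and its "ExampleProof" type are hypothetical; no actual signature is computed here:

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    """base64url without padding, as used by JOSE (RFC 7515)."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jose_signing_input(header: dict, payload: dict) -> bytes:
    # Whatever bytes json.dumps happens to produce are exactly what gets signed.
    encoded_header = b64url(json.dumps(header).encode("utf-8"))
    encoded_payload = b64url(json.dumps(payload).encode("utf-8"))
    return f"{encoded_header}.{encoded_payload}".encode("ascii")


def canonicalize(obj: dict) -> bytes:
    # STAND-IN: sorted-key compact JSON, not URDNA2015.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def ld_proof_signing_input(proof_options: dict, document: dict) -> bytes:
    # Two SHA-256 digests concatenated: 32 + 32 = 64 bytes.
    return (hashlib.sha256(canonicalize(proof_options)).digest()
            + hashlib.sha256(canonicalize(document)).digest())


header = {"alg": "EdDSA"}
doc = {"name": "Alice"}
print(jose_signing_input(header, doc))
print(ld_proof_signing_input({"type": "ExampleProof"}, doc).hex())
```

Either byte string would then be handed to the signature algorithm, which may hash it again internally.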

A couple of observations...

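base64url in JOSE is a form of canonicalization... the header and payload
objects might be serialized with different key orderings, but because the
signature covers the exact encoded string, base64url encoding makes those
orderings opaque to the verifier... at the cost of inflating the data by
about 33%.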
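A short illustration of both points with Python's standard base64 module (the payload values are made up): deeply equal payloads with different key orders encode to different strings, and every 3 input bytes become 4 output characters:

```python
import base64
import json


def b64url(data: bytes) -> str:
    """base64url without padding, as used by JOSE."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


# Deeply equal payloads, serialized with different key orders.
payload_a = json.dumps({"iss": "did:example:123", "sub": "did:example:456"}).encode("utf-8")
payload_b = json.dumps({"sub": "did:example:456", "iss": "did:example:123"}).encode("utf-8")

# The encoded strings differ. A JOSE verifier never re-serializes: it checks the
# signature over the exact encoded string it received, so the signer's ordering
# is frozen into the token.
print(b64url(payload_a) != b64url(payload_b))

# Size cost: 3000 raw bytes become 4000 base64url characters.
raw = bytes(3000)
print(len(raw), len(b64url(raw)))
```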

canonicalize in the LD Proof could be JCS, simple sorting of JSON keys, or
RDF Dataset Normalization... each would yield a different signature for the
same document...

Mechanically, the fact that JCS exists hints at the problem with JOSE... if
you want to sign things, you want stable hashes, and therefore you need SOME
form of canonicalization for complex data structures.
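A small sketch of both halves of that argument: deeply equal objects only hash identically once canonicalized, and two reasonable canonicalizations disagree with each other, so the algorithm has to be pinned. sorted_compact only approximates JCS (RFC 8785), which additionally fixes number serialization and sorts keys by UTF-16 code units; sorted_spaced is a hypothetical alternative included purely for contrast:

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def sorted_compact(obj) -> bytes:
    # Rough approximation of JCS: sorted keys, no insignificant whitespace.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def sorted_spaced(obj) -> bytes:
    # Another plausible "canonical" form: sorted keys, default separators.
    return json.dumps(obj, sort_keys=True, ensure_ascii=False).encode("utf-8")


a = {"name": "Alice", "age": 30, "tags": ["x", "y"]}
b = {"tags": ["x", "y"], "age": 30, "name": "Alice"}
assert a == b  # deeply equal

naive_same = sha256_hex(json.dumps(a).encode("utf-8")) == sha256_hex(json.dumps(b).encode("utf-8"))
canonical_same = sha256_hex(sorted_compact(a)) == sha256_hex(sorted_compact(b))
algorithms_agree = sha256_hex(sorted_compact(a)) == sha256_hex(sorted_spaced(a))

print(naive_same, canonical_same, algorithms_agree)  # False True False
```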

JOSE works very well for small ID tokens, like the ones used in
OIDC / OAuth... but JOSE totally doesn't scale to signatures over large data
sets without another tool.

"Detached JWS with Unencoded Payload":

https://tools.ietf.org/html/rfc7515#appendix-F
https://tools.ietf.org/html/rfc7797

This is how the JWS values for LD Proofs are generated, and the "unencoded
payload" part is the result of the canonicalization algorithm...
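A stdlib-only sketch of the detached, unencoded-payload JWS mechanics from RFC 7797 and RFC 7515 Appendix F. Assumptions, loudly: HMAC-SHA256 (HS256) is swapped in so the example needs no third-party crypto library, whereas LD Proof suites use asymmetric algorithms such as EdDSA; canonicalize is a sorted-key JSON stand-in for the real canonicalization step; sign_detached and verify_detached are hypothetical names:

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def canonicalize(obj) -> bytes:
    # STAND-IN canonicalization; an LD Proof would use e.g. URDNA2015 output here.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_detached(payload: bytes, key: bytes) -> str:
    header = {"alg": "HS256", "b64": False, "crit": ["b64"]}
    protected = b64url(json.dumps(header, separators=(",", ":")).encode("utf-8"))
    # RFC 7797: with "b64": false the payload bytes enter the signing input unencoded.
    signing_input = protected.encode("ascii") + b"." + payload
    signature = hmac.new(key, signing_input, hashlib.sha256).digest()
    # Detached form: the payload segment is empty, giving "header..signature".
    return f"{protected}..{b64url(signature)}"


def verify_detached(jws: str, payload: bytes, key: bytes) -> bool:
    parts = jws.split(".")
    if len(parts) != 3 or parts[1] != "":
        return False
    protected, _, sig = parts
    header = json.loads(b64url_decode(protected))
    if header.get("alg") != "HS256" or header.get("b64") is not False:
        return False
    if "b64" not in header.get("crit", []):
        return False
    expected = hmac.new(key, protected.encode("ascii") + b"." + payload, hashlib.sha256).digest()
    # Constant-time comparison of the encoded signatures.
    return hmac.compare_digest(b64url(expected).encode("ascii"), sig.encode("utf-8"))


key = b"demo-key-not-for-production"
payload = canonicalize({"name": "Alice", "email": "alice@example.com"})
jws = sign_detached(payload, key)
print(jws)
print(verify_detached(jws, payload, key))
```

The verifier must recompute the same canonical payload bytes itself; the JWS alone does not carry them.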

What would happen if we just decided to use "Unencoded Payload" without
canonicalization?... maybe we just use JSON.stringify?

It still works!... sort of... but now I can generate a new message and
signature for every ordering of data in the payload... for a really complex
and very large payload, that's going to be a LOT of deeply equal objects that
each yield a different signature... this can lead to storing a massive
amount of redundant but semantically indistinguishable data... which can
lead to resource exhaustion attacks.

In fact, the Sidetree protocol uses JCS for this exact reason...
https://identity.foundation/sidetree/spec/#default-parameters
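The ordering blow-up is easy to quantify: a flat object with n keys has n! insertion orders, and nested objects multiply that further. This sketch counts distinct SHA-256 digests over every ordering of a hypothetical 5-key object, first with insertion-order serialization (Python's json.dumps, like JSON.stringify, emits ordinary string keys in insertion order), then with a sorted-key canonical form that stands in for JCS but is not a full JCS implementation:

```python
import hashlib
import itertools
import json

claims = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}


def canonical(obj) -> bytes:
    # Stand-in for JCS: sorted keys, compact separators.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


naive_digests = set()
canonical_digests = set()
for order in itertools.permutations(claims):
    reordered = {key: claims[key] for key in order}  # deeply equal to claims
    naive_digests.add(hashlib.sha256(json.dumps(reordered).encode("utf-8")).hexdigest())
    canonical_digests.add(hashlib.sha256(canonical(reordered)).hexdigest())

print(len(naive_digests), len(canonical_digests))  # 120 1
```

Each of the 120 naive digests could carry its own valid signature over the same logical data; with canonicalization there is exactly one thing to sign and store.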

So, in summary: in any JOSE library you can replace JSON with JCS and get
better signatures, and developers will thank you because they won't be
tracking down bugs related to duplicate content... but canonicalization can
also lead to security issues if not handled properly... regardless of how
you canonicalize things.

Regards,

OS



On Tue, Mar 30, 2021 at 1:47 AM David Waite <dwaite@pingidentity.com> wrote:

> On 3/27/21 11:12 AM, David Chadwick wrote:
> > This is a major benefit of using JWS/JWT, as canonicalisation has been
> > fraught with difficulties (as anybody who has worked with XML signatures
> > will know, and discussions in the IETF PKIX group have highlighted).
>
> On Mar 27, 2021, 9:26 AM, Manu Sporny wrote:
>
>> Anyone who believes that RDF Dataset Canonicalization is the same problem
>> as XML Canonicalization does not understand the problem space. These are
>> two very different problem spaces with very different solutions.
>
> There have been interoperability issues with XML canonicalization, but the
> impact of those _pale_ in comparison to the security issues. JOSE was
> adopted as a next step for signed data for many use cases both for
> interoperability and for security reasons.
>
> It is crucially important to remember that for current LD proofs:
> - the canonicalization algorithm determines which details are critical and
> which are ignorable
> - the proof algorithms specify a canonicalization algorithm; there is no
> guarantee that URDNA2015 will always be the one chosen
> - JSON-LD is not just for serialization of RDF, but for the interpretation
> of JSON as RDF.
>
> You need security considerations for processing a JSON-encoded document
> following a successful LD Proof. This is because you did not prove the JSON
> was integrity-protected, but that the RDF interpretation of the JSON by
> some canonicalization algorithm (itself an interpretation based on some
> JSON-LD context) was protected.
>
> And these were the problems with XML Signatures and XML Canonicalization.
> Developers want clean abstractions, and _need_ clean abstractions for
> security boundaries. Canonicalization and document transformations mean a
> developer must process the data in the same way as the security layer, lest
> you have potential security vulnerabilities.
>
> I imagine there will eventually be a desire to separately
> sign different subsets of the RDF dataset for large datasets (like graph
> databases), or to support the proof being external to the dataset rather
> than being represented as part of the dataset, and so on. These
> complexities in XML canonicalization and signatures introduced security
> vulnerabilities. Even with correct signature library implementations, the
> application code interpreting the data did not necessarily rise to the same
> level of sophistication.
>
> JOSE for this reason chose a 'sealed envelope' approach to signing and
> encryption, where the data is opaque to the security layer and vice-versa.
> The abstraction isn't in some canonical interpretation of the application
> data, but that the data is byte-for-byte identical to what was signed.
>
> This is why JSON Clear Signatures had so little interest from the JOSE
> community at large. The problem wasn't that we couldn't imagine a
> canonicalization of JSON, it was that so many had been burned by all the
> edge cases that grew out of that flexibility in the past. For those who
> cared about saving the 25%+ data cost of wrapping (potentially) binary
> data in a text-safe format, CBOR/COSE became available.
>
> -DW
>
> P.S. This completely ignores the issue of DNS-style 'poisoning', which
> arises if you accept data from non-authoritative sources based purely on it
> being signed, then treat that data as part of a cache or as an update to
> your own persistent data set. This was an uncommon problem in XML since most
> XML-based formats did not support embedding external resources.



-- 
*ORIE STEELE*
Chief Technical Officer
www.transmute.industries

Received on Tuesday, 30 March 2021 15:43:58 UTC