Re: VC-JWT perma-thread from Jeremy Townson on 2021-03-31 (public-credentials@w3.org from March 2021)

From: Jeremy Townson <jeremy.townson@gmail.com>
Date: Wed, 31 Mar 2021 19:58:33 +0100
To: Nikos Fotiou <fotiou@aueb.gr>
Cc: Orie Steele <orie@transmute.industries>, Credentials Community Group <public-credentials@w3.org>
Message-ID: <CAAic94GFuMzcUGNcb=Y0Dd16eOQMLC=tzj6bDXVM4czdjeWFRw@mail.gmail.com>
Agree with most of that. I would love to see:
* a notion of verifiability which allows the various signature options to
co-exist, whatever their pros and cons.
* a third option which combines binary-level sig formula + non-obfuscated
original message. E.g., a two-line message with a format such as
line 1: <newline stripped VC>
line 2: <proof object for line 1>
or a format where the delimiter is a generated token, to enable retention
of whitespace in the VC.
(not concrete suggestions and perhaps other schemes serve better).
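
A minimal sketch of the two-line idea, to make it concrete. Everything
named here is an assumption for illustration: the function names, the
"DemoHmacSha256" proof type, and above all the use of HMAC-SHA256 with a
shared demo key as a stand-in for a real signature scheme such as Ed25519
(the Python standard library has no asymmetric signatures).

```python
import hashlib
import hmac
import json

# Assumption: HMAC-SHA256 with a shared demo key stands in for a real
# signature algorithm; swap in a proper signing library in practice.
DEMO_KEY = b"demo-key-not-for-production"


def to_two_line_message(vc: dict, key: bytes = DEMO_KEY) -> str:
    """Return line 1 (the VC without newlines) and line 2 (a proof over line 1)."""
    # json.dumps without indent never emits raw newlines: newlines inside
    # string values are escaped, so the VC always fits on one line.
    line1 = json.dumps(vc, separators=(",", ":"), ensure_ascii=False)
    tag = hmac.new(key, line1.encode("utf-8"), hashlib.sha256).hexdigest()
    proof = {"type": "DemoHmacSha256", "proofValue": tag}  # hypothetical proof shape
    line2 = json.dumps(proof, separators=(",", ":"))
    return line1 + "\n" + line2


def verify_two_line_message(message: str, key: bytes = DEMO_KEY) -> dict:
    """Check the proof over the exact bytes of line 1, then parse the VC."""
    line1, line2 = message.split("\n", 1)
    proof = json.loads(line2)
    expected = hmac.new(key, line1.encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, proof["proofValue"]):
        raise ValueError("proof does not match line 1")
    return json.loads(line1)
```

Line 1 stays readable in a console or text editor, and the proof covers
its bytes exactly as written, with no canonicalization step.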

My point is, with regard to the 'what is the ideal future' question raised
by Orie, the requirement seems to be a low-level, secure and fast signature
formula that still keeps the message readable with standard tools (i.e.
console & text editor rather than copy/paste into a JWT debugger).

On Wed, 31 Mar 2021 at 16:46, Nikos Fotiou <fotiou@aueb.gr> wrote:

> “What would happen if we just decided to use "Unencoded Payload" without
> canonicalization?... maybe we just use JSON.stringify?
> it still works!... sorta... now I can generate a new message and signature
> for every ordering of data in the payload... for a really complex and very
> large payload, that's going to be a LOT of deeply equal objects... that
> each yield a different signature... this can lead to storing a massive
> amount of redundant but indistinguishable data... which can lead to
> resource exhaustion attacks.”
>
> What baffles me is why this is a requirement in the first place? Why do
> you want to move things around in a VC or a DID document and still have
> valid signatures? IMHO if you have a VC (or a DID document) covered by a
> digital signature then you must not be able to change anything.
>
> As a side note, I am not convinced by any argument related to storage
> space savings, especially when we are talking about so small objects. It is
> 2021 and storage is super cheap, even for small, constrained devices. On
> the other hand, RAM and CPU are not.
>
> Best,
>
> Nikos
>
>
>
> On 2021-03-30 18:43, Orie Steele wrote:
>
> Overall I agree with a lot of David's comments.
>
> In particular, I have seen the following issues with LD Proofs:
>
> 1. silently dropping terms, instead of throwing an error. (allows an
> attacker to inject terms that are then silently dropped).
> 2. poor implementations loading contexts over the network (DNS poisoning,
> latency attacks)
> 3. @vocab and other language "features" making it hard to tell what you
> are actually signing
> 4. documentation / controllership issues with vocab (same problem as
> JOSE, things need to be registered and documented somewhere)
>
> 3) is easy to fix, @vocab should result in an error being thrown in any
> security context. https://github.com/w3c/vc-data-model/issues/753
>
> Note that 3 applies to all VC formats, regardless of the proof / signature
> format.
>
> 2) is very easy to fix, just pass a document loader that never makes
> network requests to any software you want to never make network requests
> and make sure the software still passes all its tests...
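
A minimal sketch of such a loader, assuming a document loader can be
modeled as a plain callable from URL to parsed JSON. Real JSON-LD
libraries define their own loader interfaces, so the function name and
the PINNED_CONTEXTS table are hypothetical, and the context body shown is
a placeholder rather than the real document.

```python
# Assumption: contexts are vendored into the application ahead of time.
# The body below is a placeholder, not the actual content of this context.
PINNED_CONTEXTS = {
    "https://www.w3.org/2018/credentials/v1": {"@context": {"@version": 1.1}},
}


def offline_document_loader(url: str) -> dict:
    """Resolve a context from the pinned table; never touch the network."""
    try:
        return PINNED_CONTEXTS[url]
    except KeyError:
        # Failing loudly is the point: an unknown URL is an error, not a fetch.
        raise LookupError(f"refusing to fetch {url!r}: not a pinned context") from None
```

Running the existing test suite with a loader like this in place shows
whether any code path still depends on a network fetch.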
>
> 1.) is the most critical imo, different implementations handle this issue
> differently.
>
> IMO the correct behavior is to throw when ANY undefined term is detected,
> and halt immediately. Implementations that silently dropped properties have
> created a massive security issue for us on this front... and it's related to
> canonicalization, essentially if your canonicalization alg silently drops
> any information it's a security vulnerability... the default behavior of any
> such algorithm should be to throw.
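
A minimal sketch of "throw on any undefined term", assuming the defined
terms are available as a flat set. That is a simplification: real JSON-LD
processing resolves terms through scoped contexts. The names
UndefinedTermError and assert_all_terms_defined are hypothetical.

```python
class UndefinedTermError(ValueError):
    """Raised as soon as a property has no definition in the active terms."""


def assert_all_terms_defined(node, defined_terms, path="$"):
    """Walk a JSON value and halt on the first key that is not a defined term."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "@context":
                continue  # the context defines terms; it is not checked as data
            # Assumption: keys starting with '@' are treated as JSON-LD keywords.
            if not key.startswith("@") and key not in defined_terms:
                raise UndefinedTermError(f"undefined term {key!r} at {path}")
            assert_all_terms_defined(value, defined_terms, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            assert_all_terms_defined(item, defined_terms, f"{path}[{index}]")
```

Calling this before canonicalization means an unknown property stops
processing instead of vanishing from what gets signed or verified.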
>
> There is a kind of pseudo canonicalization that every digital signature
> system relies on... and it's called a hash function. There are a number of
> reasons that hash functions are used with digital signatures, and a number
> of attacks that have resulted from poor choices of hash function:
>
> -
> https://blog.torproject.org/md5-certificate-collision-attack-and-what-it-means-tor
> -
> https://www.zdnet.com/article/sha-1-collision-attacks-are-now-actually-practical-and-a-looming-danger/
>
> Yes, there are problems with complexity in the data that is hashed before
> a signature is applied, but none as deadly as picking a poor hash function.
>
> in JOSE, what is signed is "base64(json(header)).base64(json(payload))"
>
> in LD Proofs, what is signed is
> "sha256(canonicalize(header))sha256(canonicalize(document)) "
>
> See https://docs.joinmastodon.org/spec/security for another explanation...
>
> In both cases, the signature algorithm likely hashes this message before
> signing with EdDSA or ECDSA, etc...
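
The two signing inputs described above can be sketched as follows. The
function names are hypothetical, and canonicalize() here is an assumed
stand-in (compact JSON with sorted keys), not URDNA2015 and not a full
JCS implementation.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    """base64url without '=' padding, as used by JOSE."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jose_signing_input(header: dict, payload: dict) -> bytes:
    """JWS-style input: base64url(json(header)) + '.' + base64url(json(payload))."""
    h = b64url(json.dumps(header, separators=(",", ":")).encode("utf-8"))
    p = b64url(json.dumps(payload, separators=(",", ":")).encode("utf-8"))
    return f"{h}.{p}".encode("ascii")


def canonicalize(obj: dict) -> bytes:
    # Assumption: sorted-key compact JSON stands in for the real algorithm.
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def ld_proof_signing_input(header: dict, document: dict) -> bytes:
    """LD-Proof-style input: sha256(canon(header)) followed by sha256(canon(document))."""
    return (hashlib.sha256(canonicalize(header)).digest()
            + hashlib.sha256(canonicalize(document)).digest())
```

With these definitions, reordering the keys of the payload changes the
JOSE-style input, while the LD-Proof-style input stays the same because
the ordering is normalized before hashing.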
>
> A couple observations....
>
> base64 in jose is a form of canonicalizing... because header and payload
> objects might have different orderings, but base64url encoding makes those
> orderings opaque... by inflating them 33%.
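
The 33% figure can be checked with a few lines of arithmetic. The
300-byte input is an arbitrary choice for illustration.

```python
import base64

raw = bytes(300)  # 300 arbitrary bytes standing in for a serialized payload
encoded = base64.urlsafe_b64encode(raw).rstrip(b"=")  # JOSE strips '=' padding

inflation = (len(encoded) - len(raw)) / len(raw)   # growth relative to the raw size
saving = (len(encoded) - len(raw)) / len(encoded)  # saving relative to the encoded size

print(len(raw), len(encoded), f"{inflation:.0%}", f"{saving:.0%}")
# prints: 300 400 33% 25%
```

Every 3 input bytes become 4 output characters, so the encoded form is
one third larger, and dropping the encoding saves one quarter.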
>
> canonicalize in the LD Proof could be JCS, or simple sorting of JSON
> Keys... or RDF Data Set Normalization... each would yield a different
> signature...
>
> mechanically, the fact that JCS exists hints at the problem with JOSE...
> if you want to sign things, you want stable hashes, and therefore need SOME
> form of canonicalization for complex data structures.
>
> JOSE works very well for small id tokens, like the ones that are used in
> OIDC / OAuth... JOSE totally doesn't scale to signatures over large data
> sets without another tool.
>
> "Detached JWS with Unencoded Payload":
>
> https://tools.ietf.org/html/rfc7515#appendix-F
> https://tools.ietf.org/html/rfc7797
>
> This is how the JWS for LD Proofs are generated, and the "Unencoded
> payload part" is the result of the canonicalization algorithm....
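
A minimal sketch of a detached JWS with an unencoded payload in the style
of RFC 7797, using HS256 (HMAC-SHA256) only because it needs nothing
beyond the standard library. The function names are hypothetical, and a
real verifier would also parse the protected header and check "alg" and
"crit", which is omitted here.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """base64url without '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_detached_unencoded(payload: bytes, key: bytes) -> str:
    """Return 'header..signature'; the payload travels separately, as-is."""
    header = {"alg": "HS256", "b64": False, "crit": ["b64"]}
    protected = b64url(json.dumps(header, separators=(",", ":")).encode("utf-8"))
    # With b64=false the payload is NOT base64url-encoded in the signing input.
    signing_input = protected.encode("ascii") + b"." + payload
    signature = hmac.new(key, signing_input, hashlib.sha256).digest()
    return f"{protected}..{b64url(signature)}"


def verify_detached_unencoded(jws: str, payload: bytes, key: bytes) -> bool:
    """Recompute the signature over header + '.' + raw payload and compare."""
    protected, middle, signature = jws.split(".")
    if middle != "":
        return False  # a detached JWS carries an empty payload segment
    signing_input = protected.encode("ascii") + b"." + payload
    expected = hmac.new(key, signing_input, hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected), signature)
```

In the LD Proof case described above, `payload` would be the output of
the canonicalization step rather than the original JSON bytes.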
>
> What would happen if we just decided to use "Unencoded Payload" without
> canonicalization?... maybe we just use JSON.stringify?
>
> it still works!... sorta... now I can generate a new message and signature
> for every ordering of data in the payload... for a really complex and very
> large payload, that's going to be a LOT of deeply equal objects... that
> each yield a different signature... this can lead to storing a massive
> amount of redundant but indistinguishable data... which can lead to
> resource exhaustion attacks.
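
A small demonstration of the ordering problem, using Python's json.dumps
as the analogue of JSON.stringify. The claim names are made up, and
sorting keys is only an approximation of JCS (RFC 8785), which also
pins down number and string serialization.

```python
import hashlib
import itertools
import json


def digest_as_serialized(obj: dict) -> str:
    """Hash the object in whatever key order it happens to have."""
    return hashlib.sha256(json.dumps(obj).encode("utf-8")).hexdigest()


def digest_sorted(obj: dict) -> str:
    """Hash a sorted-key compact serialization (rough stand-in for JCS)."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


claims = {"id": "did:example:123", "name": "Alice", "role": "admin"}

# Every permutation of the keys is "deeply equal" to the original object.
orderings = [dict(p) for p in itertools.permutations(claims.items())]

naive_digests = {digest_as_serialized(o) for o in orderings}
stable_digests = {digest_sorted(o) for o in orderings}

print(len(orderings), len(naive_digests), len(stable_digests))
# prints: 6 6 1
```

Three keys already give six distinct digests for one logical object; the
count grows factorially with the number of keys at each nesting level.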
>
> In fact, the sidetree protocol uses JCS for this exact reason...
> https://identity.foundation/sidetree/spec/#default-parameters
>
> So in summary, in any JOSE library you can replace JSON with JCS and get
> better signatures, and developers will thank you because they won't be
> tracking down bugs related to duplicate content... and canonicalization can
> also lead to security issues if not handled properly... regardless of how
> you canonicalize things.
>
> Regards,
>
> OS
>
>
>
> On Tue, Mar 30, 2021 at 1:47 AM David Waite <dwaite@pingidentity.com>
> wrote:
>
> On 3/27/21 11:12 AM, David Chadwick wrote:
> > This is a major benefit of using JWS/JWT, as canonicalisation has been
> > fraught with difficulties (as anybody who has worked with XML signatures
> > will know, and discussions in the IETF PKIX group have highlighted).
>
> On Mar 27, 2021, 9:26 AM, Manu Sporny wrote:
>
> Anyone who believes that RDF Dataset Canonicalization is the same problem
> as XML Canonicalization does not understand the problem space. These are
> two very different problem spaces with very different solutions.
>
>
> There have been interoperability issues with XML canonicalization, but the
> impact of those _pales_ in comparison to the security issues. JOSE was
> adopted as a next step for signed data for many use cases both for
> interoperability and for security reasons.
>
> It is crucially important to remember that for current LD proofs:
> - the canonicalization algorithm determines which details are critical and
> which are ignorable
> - the proof algorithms specify a canonicalization algorithm, there is no
> guarantee that URDNA2015 will always be the one chosen
> - JSON-LD is not just for serialization of RDF, but for the interpretation
> of JSON as RDF.
>
> You need security considerations for processing a JSON-encoded document
> following a successful LD Proof. This is because you did not prove the JSON
> was integrity-protected, but that the RDF interpretation of the JSON by
> some canonicalization algorithm (itself an interpretation based on some
> JSON-LD context) was protected.
>
> And these were the problems with XML Signatures and XML Canonicalization.
> Developers want clean abstractions, and _need_ clean abstractions for
> security boundaries. Canonicalization and document transformations mean a
> developer must process the data in the same way as the security layer, lest
> you have potential security vulnerabilities.
>
> I imagine that there will eventually be a desire to separately
> sign different subsets of the RDF dataset for large datasets (like graph
> databases), or to support the proof being external to the dataset rather
> than being represented as part of the dataset, and so on. These
> complexities in XML canonicalization and signatures introduced security
> vulnerabilities. Even with correct signature library implementations, the
> application code interpreting the data did not necessarily rise to the same
> level of sophistication.
>
> JOSE for this reason chose a 'sealed envelope' approach to signing and
> encryption, where the data is opaque to the security layer and vice-versa.
> The abstraction isn't in some canonical interpretation of the application
> data, but that the data is byte-for-byte identical to what was signed.
>
> This is why JSON Clear Signatures had so little interest from the JOSE
> community at large. The problem wasn't that we couldn't imagine a
> canonicalization of JSON, it was that so many had been burned by all the
> edge cases that grew out of that flexibility in the past. For those who
> cared about saving 25%+ of their data cost by wrapping (potentially) binary
> data in a text-safe format, CBOR/COSE became available.
>
> -DW
>
> P.S. this is completely ignoring the issues of DNS-style 'poisoning' if
> you accept data from non-authoritative sources based purely on it being
> signed, then treat that data as part of a cache or as an update to your own
> persistent data set. This was an uncommon problem in XML since most
> XML-based formats did not support embedding external resources.
>
>
>
>
> --
> *ORIE STEELE*
> Chief Technical Officer
> www.transmute.industries
>
>
Received on Wednesday, 31 March 2021 18:59:02 UTC