Re: VC-JWT perma-thread (was: Re: RDF Dataset Canonicalization - Formal Proof) from David Waite on 2021-03-31 (public-credentials@w3.org from March 2021)

From: David Waite <dwaite@pingidentity.com>
Date: Tue, 30 Mar 2021 22:49:22 -0600
To: Orie Steele <orie@transmute.industries>
Cc: Manu Sporny <msporny@digitalbazaar.com>, Credentials Community Group <public-credentials@w3.org>
Message-ID: <CA+3kW=ZPyYCOphJmpdfZett0QPgcF27TVXALBOzDYW-D06AQ7w@mail.gmail.com>
David Waite
Principal Technical Architect, CTO Office
dwaite@pingidentity.com
w: 303 468 2855



On Tue, Mar 30, 2021 at 9:43 AM Orie Steele <orie@transmute.industries>
wrote:

> Overall I agree with a lot of David's comments.
>
<snip>

> A couple observations....
>
> base64 in jose is a form of canonicalizing... because header and payload
> objects might have different orderings, but base64url encoding makes those
> orderings opaque... by inflating them 33%.
>

Canonicalization means to convert multiple potential representations of
equivalent data into a single representation. I would define what JOSE does
as straight-up processing transforms. The url-safe base64 encoding protects
the data from modification in transport.
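
A minimal sketch of that distinction, using made-up claim names and values:
base64url-encoding two serializations of the same JSON object yields two
different strings, so the encoding carries through whatever bytes it is given
and does not collapse equivalent forms into one.

```python
import base64
import json


def b64url(data: bytes) -> str:
    # base64url without padding, the form JOSE uses for its segments
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    # Restore the padding that was stripped, then decode
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


# Two serializations of the same JSON object, differing only in member order
# (hypothetical claims, for illustration only).
a = b'{"iss":"example","sub":"alice"}'
b = b'{"sub":"alice","iss":"example"}'

same_data = json.loads(a) == json.loads(b)      # equal as JSON objects
same_encoding = b64url(a) == b64url(b)          # but the encodings differ
round_trips = b64url_decode(b64url(a)) == a     # bytes come back unchanged
```

The encoding is a reversible transform over bytes; member order is preserved,
not hidden or normalized.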

You can even turn the b64 encoding step off (RFC 7797) if your payload is
already URL safe, or if you are doing detached signatures.
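
As a rough sketch of what that looks like under RFC 7797, assuming HS256 and
a made-up shared key (the function names are hypothetical, not from any JOSE
library): the protected header carries "b64": false and lists "b64" in
"crit", the signing input is the encoded header, a period, and the raw
payload bytes, and the detached compact form leaves the payload segment empty.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_detached_unencoded(payload: bytes, key: bytes) -> str:
    # RFC 7797: "b64": false must be integrity protected and listed in "crit".
    header = {"alg": "HS256", "b64": False, "crit": ["b64"]}
    protected = b64url(json.dumps(header, separators=(",", ":")).encode("utf-8"))
    # With b64=false the payload bytes enter the signing input as-is.
    signing_input = protected.encode("ascii") + b"." + payload
    signature = hmac.new(key, signing_input, hashlib.sha256).digest()
    # Detached compact serialization: the payload segment is empty.
    return protected + ".." + b64url(signature)


def verify_detached_unencoded(jws: str, payload: bytes, key: bytes) -> bool:
    # Sketch only: a real verifier would also parse the header and check
    # "alg", "b64", and "crit" before trusting anything.
    parts = jws.split(".")
    if len(parts) != 3 or parts[1] != "":
        return False
    protected, _, signature = parts
    signing_input = protected.encode("ascii") + b"." + payload
    expected = b64url(hmac.new(key, signing_input, hashlib.sha256).digest())
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


key = b"hypothetical-shared-secret"
payload = b'{"hello":"world"}'
jws = sign_detached_unencoded(payload, key)
```

The payload travels separately, and any byte-level change to it makes
verification fail.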

> canonicalize in the LD Proof could be JCS, or simple sorting of JSON
> Keys... or RDF Data Set Normalization... each would yield a different
> signature...
>

Not just that - each would cover a different interpretation of data. Your
signature does not prevent abuse from equivalent forms.

If you are using LD-Proofs, you either need to process the resulting data
_as RDF_ or have additional rules for processing to further lock down any
abuses that might come from misinterpreting the RDF because you are looking
at it through a manipulated set of JSON-LD lenses.


> mechanically, the fact that JCS exists hints at the problem with JOSE...
> if you want to sign things, you want stable hashes, and therefore need SOME
> form of canonicalization for complex data structures.
>
> JOSE works very well for small id tokens, like the ones that are used in
> OIDC / OAuth... JOSE totally doesn't scale to signatures over large data
> sets without another tool.
>

Sure, you are talking about reducing arbitrary subsets of a potentially
modified document back to some chosen canonical form and then seeing if
there was a pertinent modification. This is what XML DSig was made for :-)

Turns out in a lot of use-cases, that subset is usually "a well defined
block of data" and pertinent modifications are usually "any modification
whatsoever". Crypto in that case is being used for send-and-receive, or
archive-and-restore, and not for doing a verification as part of a larger
dataset.

When that isn't the case, you have a significantly harder task, such as
what is currently in progress as HTTP Message Signatures.

>
> "Detached JWS with Unencoded Payload":
>
> https://tools.ietf.org/html/rfc7515#appendix-F
> https://tools.ietf.org/html/rfc7797
>
> This is how the JWS for LD Proofs are generated, and the "Unencoded
> payload part" is the result of the canonicalization algorithm....
>
> What would happen if we just decided to use "Unencoded Payload" without
> canonicalization?... maybe we just use JSON.stringify?
>

Intermediaries may do things like convert from LF to CRLF and back, so you
would want to keep people treating the data as binary, and make the data
behave as binary in transit. Exchange, IIRC, used to change the line encoding
of *.txt files _inside ZIP archives_. CRLF is also now considered a single
grapheme cluster, and will canonicalize down in some Unicode tools as well.
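
A small sketch of the failure mode, with a made-up pretty-printed payload: an
LF-to-CRLF rewrite leaves the JSON value unchanged but changes the bytes, so
anything computed over the raw bytes (here a SHA-256 digest standing in for
the signature input) no longer matches.

```python
import hashlib
import json

# Hypothetical payload as the signer serialized it, with LF line endings.
original = b'{\n  "name": "example"\n}\n'

# What an intermediary that rewrites LF to CRLF would deliver.
rewritten = original.replace(b"\n", b"\r\n")

digest_sent = hashlib.sha256(original).hexdigest()
digest_received = hashlib.sha256(rewritten).hexdigest()

# The parsed JSON is identical; only the byte representation changed.
still_same_json = json.loads(original) == json.loads(rewritten)
digests_match = digest_sent == digest_received
```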

> it still works!... sorta... now I can generate a new message and signature
> for every ordering of data in the payload... for a really complex and very
> large payload, that's going to be a LOT of deeply equal objects... that
> each yield a different signature... this can lead to storing a massive
> amount of redundant but indistinguishable data... which can lead to
> resource exhaustion attacks.
>

> In fact, the sidetree protocol uses JCS for this exact reason...
> https://identity.foundation/sidetree/spec/#default-parameters
>

The attacker still has to send all of that redundant data - and they could
always make it non-redundant by making any change that survives
canonicalization (including changing the string "José" from precomposed
"Jos\u00e9" to decomposed "Jose\u0301", which render identically.)

So I would consider this more a cache optimization (still important) than
an attack solution.
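
To make that concrete, a sketch that uses sorted-key compact JSON as a rough
stand-in for JCS (an assumption: this is not a full RFC 8785 implementation,
which also fixes number formatting and orders keys by UTF-16 code units). RFC
8785 does not apply Unicode normalization to strings, so the precomposed and
decomposed spellings remain distinct after canonicalization.

```python
import json
import unicodedata


def sorted_compact(obj) -> bytes:
    # Approximation of JCS for this simple case only: sorted keys, no
    # insignificant whitespace, UTF-8 output. Not a conforming implementation.
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


precomposed = "Jos\u00e9"   # e-acute as one code point (NFC)
decomposed = "Jose\u0301"   # "e" followed by a combining acute accent (NFD)

canonical_a = sorted_compact({"name": precomposed})
canonical_b = sorted_compact({"name": decomposed})

looks_equal_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
canonical_forms_equal = canonical_a == canonical_b
```

Both values display the same name, yet they hash and sign differently, so
canonicalization alone does not bound the number of distinct messages.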

> So in summary, in any JOSE library you can replace JSON with JCS and get
> better signatures, and developers will thank you because they won't be
> tracking down bugs related to duplicate content... and canonicalization can
> also lead to security issues if not handled properly... regardless of how
> you canonicalize things.
>

I'm not quite sure I follow the scenario of "bugs related to duplicate content" - if
you are allowing repeated changes of data, filtering out non-canonical
changes is an optimization. Your policy is still apparently to allow a ton
of changes to data.

Since you would be using detached signatures, you would necessarily break
the semantics of existing deployments and tools. You would have to define
the semantics for how to transfer that new data since there are no JWS+JCS
formats or best practices. And this would save no data over another
JWS+detached JSON transmission format.

I particularly think developers in languages such as Rust, Go, and C would
be less than excited about the opportunity to be the first to contribute a
JCS implementation to their respective platforms. Even less so if they find
out they need to build new JSON tooling for strict ECMAScript and I-JSON
serialization and conformance.

Received on Wednesday, 31 March 2021 04:49:47 UTC