Re: VC-JWT perma-thread (was: Re: RDF Dataset Canonicalization - Formal Proof) from Orie Steele on 2021-03-31 (public-credentials@w3.org from March 2021)

From: Orie Steele <orie@transmute.industries>
Date: Wed, 31 Mar 2021 10:02:36 -0500
To: David Waite <dwaite@pingidentity.com>, "W3C Credentials CG (Public List)" <public-credentials@w3.org>
Message-ID: <CAN8C-_KtZmyrXfNe2fSkBK_AXb+dB2_FByx3PvkrTWw_Yb8-_A@mail.gmail.com>
Sorry I meant for this reply to David to go to the whole list.

In case I come across as some JOSE hater, I am not... I actually worked
very hard with Mike Jones and others to add JWK support to the DID Core
Spec, and have also worked to make it easy to produce JWS/JWT and LD Proofs
from the same verification material:

https://w3c-ccg.github.io/lds-jws2020/

OS



> On Tue, Mar 30, 2021 at 11:49 PM David Waite <dwaite@pingidentity.com>
> wrote:
>
>>
>>
>> On Tue, Mar 30, 2021 at 9:43 AM Orie Steele <orie@transmute.industries>
>> wrote:
>>
>>> Overall I agree with a lot of David's comments.
>>>
>> <snip>
>>
>>> A couple observations....
>>>
>>> base64 in jose is a form of canonicalizing... because header and payload
>>> objects might have different orderings, but base64url encoding makes those
>>> orderings opaque... by inflating them 33%.
>>>
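A minimal sketch of the two points above, assuming a made-up two-claim payload (the claim values are illustrative only): deeply equal JSON objects with different key orderings yield different base64url strings, and the encoding adds roughly a third to the size.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, the way JOSE serializations do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


# Two payloads that are deeply equal but serialized with different key order.
# The claim values are hypothetical.
payload_a = json.dumps(
    {"iss": "did:example:123", "sub": "alice"}, separators=(",", ":")
).encode("utf-8")
payload_b = json.dumps(
    {"sub": "alice", "iss": "did:example:123"}, separators=(",", ":")
).encode("utf-8")

encoded_a = b64url(payload_a)
encoded_b = b64url(payload_b)

# Same data once parsed, different bytes once encoded.
same_data = json.loads(payload_a) == json.loads(payload_b)
same_encoding = encoded_a == encoded_b

# Ratio of encoded size to raw size (about 4/3).
inflation = len(encoded_a) / len(payload_a)

print(same_data, same_encoding, round(inflation, 2))
```

The encoding hides the ordering from transport, but it does nothing to make the two orderings produce the same bytes.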
>>
>> Canonicalization means to convert multiple potential representations of
>> equivalent data into a single representation. I would define what JOSE does
>> as straight-up processing transforms. The url-safe base64 encoding protects
>> the data from modification in transport.
>>
>
> agreed, inflating data 33% is clearly not canonicalization.
>
>>
>> You can even turn the b64 encoding step off (RFC 7797) if your payload is
>> already URL safe, or if you are doing detached signatures.
>>
>>> canonicalize in the LD Proof could be JCS, or simple sorting of JSON
>>> Keys... or RDF Data Set Normalization... each would yield a different
>>> signature...
>>>
>>
>> Not just that - each would cover a different interpretation of data. Your
>> signature does not prevent abuse from equivalent forms.
>>
>
> I suppose the same problem exists with JOSE, just no standard for how to
> interpret fields that are not registered.
>
>>
>> If you are using LD-Proofs, you either need to process the resulting data
>> _as RDF_ or have additional rules for processing to further lock down any
>> abuses that might come from misinterpreting the RDF because you are looking
>> at it through a manipulated set of JSON-LD lenses.
>>
>
> Here you are asserting that somehow canonicalization destroys information,
> if that were true it would be a problem. If you can't tell if some JSON is
> equivalent to some canonical form, that would also be a problem.
> Luckily both are achievable, with both JCS and RDF DataSet Canonicalization.
>
> I do agree that it's more work to think about canonical information
> representations than it is to inflate a payload 33% and make it url safe...
> it's also more useful for very large datasets.
>
>>
>>
>>> mechanically, the fact that JCS exists hints at the problem with JOSE...
>>> if you want to sign things, you want stable hashes, and therefore need SOME
>>> form of canonicalization for complex data structures.
>>>
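A rough illustration of the "stable hashes" point. The `canonical_bytes` helper is a simplified stand-in (sorted keys, no insignificant whitespace) and is NOT a conforming JCS (RFC 8785) implementation, which also fixes number serialization and the key sort order; the documents are invented for the example.

```python
import hashlib
import json


def canonical_bytes(obj) -> bytes:
    """Simplified stand-in for JCS: sorted keys, compact separators.

    Assumption: good enough for this demo only; real JCS has stricter rules
    for numbers, string escaping, and key ordering.
    """
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Deeply equal documents built with different key orderings.
doc_a = {"name": "Alice", "claims": {"age": 30, "role": "admin"}}
doc_b = {"claims": {"role": "admin", "age": 30}, "name": "Alice"}

# Plain serialization follows insertion order, so the hashes differ.
raw_equal = digest(json.dumps(doc_a).encode()) == digest(json.dumps(doc_b).encode())

# After canonicalization both documents hash to the same value.
canonical_equal = digest(canonical_bytes(doc_a)) == digest(canonical_bytes(doc_b))

print(raw_equal, canonical_equal)
```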
>>> JOSE works very well for small id tokens, like the ones that are used in
>>> OIDC / OAuth... JOSE totally doesn't scale to signatures over large data
>>> sets without another tool.
>>>
>>
>> Sure, you are talking about reducing arbitrary subsets of a potentially
>> modified document back to some chosen canonical form and then seeing if
>> there was a pertinent modification. This is what XML DSig was made for :-)
>>
>
> I don't think I am old enough to know what XML DSig is... sounds like it
> was traumatizing : )
>
> If your general point is that schema based languages or types are bad, I
> would say that they increase friction and burden, and that pays off when
> the code base or problem space gets very large... again consider a generic
> solution to strongly typed data in an open world model.
>
>
>>
>> Turns out in a lot of use-cases, that subset is usually "a well defined
>> block of data" and pertinent modifications are usually "any modification
>> whatsoever". Crypto in that case is being used for send-and-receive, or
>> archive-and-restore, and not for doing a verification as part of a larger
>> dataset.
>>
>> When that isn't the case, you have a significantly harder task, such as
>> what is currently in progress as HTTP Message Signatures.
>>
>
> Agreed, HTTP Signatures require canonicalization of the HTTP Request Data
> Structures... because they are complex, and you want to make sure everyone
> is signing things the same way.
>
>
>>
>>> "Detached JWS with Unencoded Payload":
>>>
>>> https://tools.ietf.org/html/rfc7515#appendix-F
>>> https://tools.ietf.org/html/rfc7797
>>>
>>> This is how the JWS for LD Proofs are generated, and the "Unencoded
>>> payload part" is the result of the canonicalization algorithm....
>>>
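A sketch of the detached, unencoded-payload construction described in RFC 7797. Assumptions: HS256 with a shared secret is used only so the example needs nothing beyond the standard library (LD Proof suites use asymmetric algorithms), the function names are hypothetical, and `payload` stands for whatever bytes the canonicalization step produced.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_detached_unencoded(payload: bytes, key: bytes) -> str:
    # "b64": false plus "crit": ["b64"] signals an unencoded payload (RFC 7797).
    header = {"alg": "HS256", "b64": False, "crit": ["b64"]}
    protected = b64url(json.dumps(header, separators=(",", ":")).encode("utf-8"))
    # Signing input: BASE64URL(header) || "." || raw payload bytes.
    signing_input = protected.encode("ascii") + b"." + payload
    signature = hmac.new(key, signing_input, hashlib.sha256).digest()
    # Detached form: the payload segment is left empty.
    return f"{protected}..{b64url(signature)}"


def verify_detached_unencoded(jws: str, payload: bytes, key: bytes) -> bool:
    parts = jws.split(".")
    if len(parts) != 3 or parts[1] != "":
        return False
    protected, _, signature = parts
    signing_input = protected.encode("ascii") + b"." + payload
    expected = hmac.new(key, signing_input, hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected), signature)


key = b"demo-secret"  # hypothetical key, for illustration only
jws = sign_detached_unencoded(b'{"a":1,"b":2}', key)

print(verify_detached_unencoded(jws, b'{"a":1,"b":2}', key))
print(verify_detached_unencoded(jws, b'{"b":2,"a":1}', key))
```

The second check fails even though the two payloads are deeply equal as JSON, which is why the bytes fed to the signature need to come from an agreed canonical form.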
>>> What would happen if we just decided to use "Unencoded Payload" without
>>> canonicalization?... maybe we just use JSON.stringify?
>>>
>>
>> Intermediaries may do things like convert from LF to CRLF and back, so
>> you would want to keep people treating the data as binary, and make the
>> data behave as binary in transit. Exchange, IIRC, used to change the line
>> encoding of *.txt files _inside ZIP archives_. CRLF is also now considered
>> a grapheme, and will canonicalize down in some unicode tools as well.
>>
>
> I'm not sure I follow fully, but if you are suggesting a binary format
> would be better, I agree, however having worked with COSE a little, I can
> say that binary formats require a significant amount of up front tooling to
> offer the same level of developer experience that JOSE has... despite its
> limitations, JOSE is fairly trivial to implement and to debug.
>
>
>>
>>> it still works!... sorta... now I can generate a new message and
>>> signature for every ordering of data in the payload... for a really complex
>>> and very large payload, that's going to be a LOT of deeply equal objects...
>>> that each yield a different signature... this can lead to storing a massive
>>> amount of redundant but indistinguishable data... which can lead to
>>> resource exhaustion attacks.
>>>
>>
>>> In fact, the sidetree protocol uses JCS for this exact reason...
>>> https://identity.foundation/sidetree/spec/#default-parameters
>>>
>>
>> The attacker still has to send all of that redundant data - and they
>> could always make it non-redundant by making any canonical change
>> (including changing the string "José" to "José".)
>>
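One reading of the "José" example, offered as an assumption since the archive shows both strings identically: the two forms differ only in Unicode normalization (precomposed versus combining accent). Key-sorting canonicalization leaves string contents untouched, so the two forms still hash differently. The `sorted_key_bytes` helper is a simplified stand-in, not a JCS implementation.

```python
import hashlib
import json
import unicodedata

# Same visible string in two Unicode normalization forms.
name_nfc = unicodedata.normalize("NFC", "Jos\u00e9")  # é as a single code point
name_nfd = unicodedata.normalize("NFD", "Jos\u00e9")  # e + combining acute accent


def sorted_key_bytes(obj) -> bytes:
    """Simplified stand-in for JSON canonicalization (sorted keys, compact)."""
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


hash_nfc = hashlib.sha256(sorted_key_bytes({"name": name_nfc})).hexdigest()
hash_nfd = hashlib.sha256(sorted_key_bytes({"name": name_nfd})).hexdigest()

# The strings match once normalized, yet the canonical bytes do not.
looks_same = unicodedata.normalize("NFC", name_nfd) == name_nfc
hashes_differ = hash_nfc != hash_nfd

print(looks_same, hashes_differ)
```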
> Yes, defense in depth requires validating untrusted user input... IMO
> part of that is asking for canonical representations from users... here is
> another thread on the subject:
>
> https://github.com/matrix-org/matrix-doc/issues/1013
>
>
>> So I would consider this more a cache optimization (still important) than
>> an attack solution.
>>
>> So in summary, in any JOSE library you can replace JSON with JCS and get
>>> better signatures, and developers will thank you because they won't be
>>> tracking down bugs related to duplicate content... and canonicalization can
>>> also lead to security issues if not handled properly... regardless of how
>>> you canonicalize things.
>>>
>>
>> I'm not quite sure about the scenario of "bugs related to duplicate content" -
>> if you are allowing repeated changes of data, filtering out non-canonical
>> changes is an optimization. Your policy is still apparently to allow a ton
>> of changes to data.
>>
>
> canonicalization helps detect content that can lead to bugs... similar to
> how types and schemas help with that... obviously use case matters here,
> but from a tooling perspective you can use schemas and canonicalization or
> you can decide not to... for some use cases, that decision will yield a
> lot of cost for your engineering team, for others it won't.
>
>>
>> Since you would be using detached signatures, you would necessarily break
>> the semantics of existing deployments and tools. You would have to define
>> the semantics for how to transfer that new data since there are no JWS+JCS
>> formats or best practices. And this would save no data over another
>> JWS+detached JSON transmission format.
>>
>
> https://tools.ietf.org/html/draft-jordan-jws-ct-02
>
> Regarding JSON over the wire, I agree: the only thing that would make JSON
> over the wire worse would be base64url encoding it.... assuming it was
> large JSON.
>
>
>>
>> I particularly think developers in languages such as Rust, Go, and C
>> would be less than excited about the opportunity to be the first to
>> contribute a JCS implementation to their respective platforms. Even less so
>> if they find out they need to build new JSON tooling for strict Ecmascript
>> and I-JSON serialization and conformance.
>>
>
> https://github.com/search?p=2&q=JSON+Canonicalization
>
> Looks like there is support in those languages and more... I suppose those
> languages are already used to being forced to support JSON in order to use
> JOSE.
>
>
>>
>
>
>


-- 
*ORIE STEELE*
Chief Technical Officer
www.transmute.industries

<https://www.transmute.industries>
Received on Wednesday, 31 March 2021 15:04:05 UTC