Re: VC-JWT perma-thread (was: Re: RDF Dataset Canonicalization - Formal Proof) from David Waite on 2021-04-01 (public-credentials@w3.org from April 2021)

From: David Waite <dwaite@pingidentity.com>
Date: Wed, 31 Mar 2021 20:47:22 -0600
To: Orie Steele <orie@transmute.industries>
Cc: "W3C Credentials CG (Public List)" <public-credentials@w3.org>
Message-ID: <CA+3kW=YtSTGe-v1bFe=Zcymjs8gC+Z8cwoBAEtFOXaV1QsnhTw@mail.gmail.com>
*Nod*.

And I'm not a JSON-LD/RDF hater either. I do worry about the level of
complexity being put at a security boundary.

-DW

On Wed, Mar 31, 2021 at 9:02 AM Orie Steele <orie@transmute.industries>
wrote:

> Sorry I meant for this reply to David to go to the whole list.
>
> In case I come across as some JOSE hater, I am not... I actually worked
> very hard with Mike Jones and others to add JWK support to the DID Core
> Spec, and have also worked to make it easy to produce JWS/JWT and LD Proofs
> from the same verification material:
>
> https://w3c-ccg.github.io/lds-jws2020/
>
> OS
>
>
>
>> On Tue, Mar 30, 2021 at 11:49 PM David Waite <dwaite@pingidentity.com>
>> wrote:
>>
>>>
>>> On Tue, Mar 30, 2021 at 9:43 AM Orie Steele <orie@transmute.industries>
>>> wrote:
>>>
>>>> Overall I agree with a lot of David's comments.
>>>>
>>> <snip>
>>>
>>>> A couple observations....
>>>>
>>>> base64 in jose is a form of canonicalizing... because header and
>>>> payload objects might have different orderings, but base64url encoding
>>>> makes those orderings opaque... by inflating them 33%.
>>>>
>>>
>>> Canonicalization means to convert multiple potential representations of
>>> equivalent data into a single representation. I would define what JOSE does
>>> as straight-up processing transforms. The url-safe base64 encoding protects
>>> the data from modification in transport.
>>>
>>
>> agreed, inflating data 33% is clearly not canonicalization.
>>
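A minimal sketch of the distinction drawn above, assuming a made-up two-member payload: base64url encodes whatever bytes it is given, so two member orderings of the same JSON object still encode differently. The `sorted_form` helper is a hypothetical stand-in for a canonicalizer (sorted keys, no whitespace) and is not a full JCS implementation.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Base64url without padding, the encoding JOSE applies to header and payload."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sorted_form(raw: bytes) -> bytes:
    """Hypothetical stand-in for a canonicalizer: sorted keys, no whitespace."""
    parsed = json.loads(raw)
    return json.dumps(parsed, sort_keys=True, separators=(",", ":")).encode("utf-8")


# Two serializations of the same object, differing only in member order.
a = b'{"iss":"alice","sub":"bob"}'
b = b'{"sub":"bob","iss":"alice"}'

same_data = json.loads(a) == json.loads(b)          # equal once parsed
same_encoding = b64url(a) == b64url(b)              # encoding keeps the difference
same_canonical = sorted_form(a) == sorted_form(b)   # one representation after sorting

print(same_data, same_encoding, same_canonical)
print(len(a), "bytes ->", len(b64url(a)), "base64url characters")
```

The last line shows the size cost discussed in the thread: every 3 input bytes become 4 output characters.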
>>>
>>> You can even turn the b64 encoding step off (RFC 7797) if your payload
>>> is already URL safe, or if you are doing detached signatures.
>>>
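As a rough illustration of the RFC 7797 option, the sketch below builds a detached JWS over an unencoded payload. HS256, the key, and the function names are assumptions chosen so the example runs on the standard library alone; they are not taken from the thread, and a production verifier would also parse the protected header and check "alg" and "crit".

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url without padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_detached_unencoded(payload: bytes, key: bytes) -> str:
    """Return 'header..signature' with the payload signed as raw bytes."""
    # "b64": false must also be listed in "crit" (RFC 7797).
    header = {"alg": "HS256", "b64": False, "crit": ["b64"]}
    protected = b64url(json.dumps(header, separators=(",", ":")).encode("utf-8"))
    # With b64=false the signing input is the encoded header, a dot,
    # and the payload bytes exactly as they are.
    signing_input = protected.encode("ascii") + b"." + payload
    signature = hmac.new(key, signing_input, hashlib.sha256).digest()
    return protected + ".." + b64url(signature)


def verify_detached_unencoded(jws: str, payload: bytes, key: bytes) -> bool:
    """Recompute the MAC over the header and the separately delivered payload."""
    parts = jws.split(".")
    if len(parts) != 3 or parts[1] != "":
        return False
    protected, _, signature = parts
    signing_input = protected.encode("ascii") + b"." + payload
    expected = b64url(hmac.new(key, signing_input, hashlib.sha256).digest())
    return hmac.compare_digest(expected.encode("ascii"), signature.encode("utf-8"))


key = b"demo-key-not-for-real-use"          # hypothetical key
payload = b'{"hello":"world"}'
jws = sign_detached_unencoded(payload, key)
print(verify_detached_unencoded(jws, payload, key))
print(verify_detached_unencoded(jws, b'{"hello": "world"}', key))
```

The second check fails because the signature covers the exact payload bytes; a single added space is a different message.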
>>>> canonicalize in the LD Proof could be JCS, or simple sorting of JSON
>>>> Keys... or RDF Data Set Normalization... each would yield a different
>>>> signature...
>>>>
>>>
>>> Not just that - each would cover a different interpretation of data.
>>> Your signature does not prevent abuse from equivalent forms.
>>>
>>
>> I suppose the same problem exists with JOSE, just no standard for how to
>> interpret fields that are not registered.
>>
>>>
>>> If you are using LD-Proofs, you either need to process the resulting
>>> data _as RDF_ or have additional rules for processing to further lock down
>>> any abuses that might come from misinterpreting the RDF because you are
>>> looking at it through a manipulated set of JSON-LD lenses.
>>>
>>
>> Here you are asserting that somehow canonicalization destroys
>> information, if that were true it would be a problem. If you can't tell if
>> some JSON is equivalent to some canonical form, that would also be a
>> problem. Luckily both are achievable, with both JCS and RDF DataSet
>> Canonicalization.
>>
>> I do agree that it's more work to think about canonical information
>> representations than it is to inflate a payload 33% and make it url safe...
>> it's also more useful for very large datasets.
>>
>>>
>>>
>>>> mechanically, the fact that JCS exists hints at the problem with
>>>> JOSE... if you want to sign things, you want stable hashes, and therefore
>>>> need SOME form of canonicalization for complex data structures.
>>>>
>>>> JOSE works very well for small id tokens, like the ones that are used
>>>> in OIDC / OAuth... JOSE totally doesn't scale to signatures over large data
>>>> sets without another tool.
>>>>
>>>
>>> Sure, you are talking about reducing arbitrary subsets of a potentially
>>> modified document back to some chosen canonical form and then seeing if
>>> there was a pertinent modification. This is what XML DSig was made for :-)
>>>
>>
>> I don't think I am old enough to know what XML DSig is... sounds like it
>> was traumatizing : )
>>
>> If your general point is that schema based languages or types are bad, I
>> would say that they increase friction and burden, and that pays off when
>> the code base or problem space gets very large... again consider a generic
>> solution to strongly typed data in an open world model.
>>
>>
>>>
>>> Turns out in a lot of use-cases, that subset is usually "a well defined
>>> block of data" and pertinent modifications are usually "any modification
>>> whatsoever". Crypto in that case is being used for send-and-receive, or
>>> archive-and-restore, and not for doing a verification as part of a larger
>>> dataset.
>>>
>>> When that isn't the case, you have a significantly harder task, such as
>>> what is currently in progress as HTTP Message Signatures.
>>>
>>
>> Agreed, HTTP Signatures require canonicalization of the HTTP Request Data
>> Structures... because they are complex, and you want to make sure everyone
>> is signing things the same way.
>>
>>
>>>
>>>> "Detached JWS with Unencoded Payload":
>>>>
>>>> https://tools.ietf.org/html/rfc7515#appendix-F
>>>> https://tools.ietf.org/html/rfc7797
>>>>
>>>> This is how the JWS for LD Proofs are generated, and the "Unencoded
>>>> payload part" is the result of the canonicalization algorithm....
>>>>
>>>> What would happen if we just decided to use "Unencoded Payload" without
>>>> canonicalization?... maybe we just use JSON.stringify?
>>>>
>>>
>>> Intermediaries may do things like convert from LF to CRLF and back, so
>>> you would want to keep people treating the data as binary, and make the
>>> data behave as binary in transit. Exchange, IIRC, used to change the line
>>> encoding of *.txt files _inside ZIP archives_. CRLF is also now considered
>>> a grapheme, and will canonicalize down in some Unicode tools as well.
>>>
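A small sketch of the line-ending concern, using a made-up document: an intermediary that rewrites LF to CRLF leaves the parsed JSON unchanged but alters the bytes, so any digest or signature computed over the raw text no longer matches.

```python
import hashlib
import json

# Hypothetical payload as the signer produced it, with LF line endings.
original = b'{"a":1,\n"b":2}\n'

# What a text-mode intermediary might deliver instead.
rewritten = original.replace(b"\n", b"\r\n")

same_json = json.loads(original) == json.loads(rewritten)
same_digest = (
    hashlib.sha256(original).hexdigest() == hashlib.sha256(rewritten).hexdigest()
)

print(same_json, same_digest)
```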
>>
>> I'm not sure I follow fully, but if you are suggesting a binary format
>> would be better, I agree, however having worked with COSE a little, I can
>> say that binary formats require a significant amount of up front tooling to
>> offer the same level of developer experience that JOSE has... despite its
>> limitations, JOSE is fairly trivial to implement and to debug.
>>
>>
>>>
>>>> it still works!... sorta... now I can generate a new message and
>>>> signature for every ordering of data in the payload... for a really complex
>>>> and very large payload, that's going to be a LOT of deeply equal objects...
>>>> that each yield a different signature... this can lead to storing a massive
>>>> amount of redundant but indistinguishable data... which can lead to
>>>> resource exhaustion attacks.
>>>>
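To make the duplication concrete, here is a sketch that counts digests over every member ordering of one small, invented claims object. Sorted-key serialization again stands in for a real canonicalizer such as JCS.

```python
import hashlib
import itertools
import json

# Hypothetical claims; any four distinct members would do.
claims = {"iss": "alice", "sub": "bob", "aud": "carol", "iat": 1617158842}

raw_digests = set()
canonical_digests = set()

for order in itertools.permutations(claims):
    # Serialize the same data with a different member order each time.
    doc = json.dumps({k: claims[k] for k in order}, separators=(",", ":"))
    raw_digests.add(hashlib.sha256(doc.encode("utf-8")).hexdigest())

    # Re-serialize with sorted keys before hashing.
    canonical = json.dumps(json.loads(doc), sort_keys=True, separators=(",", ":"))
    canonical_digests.add(hashlib.sha256(canonical.encode("utf-8")).hexdigest())

print(len(raw_digests), len(canonical_digests))
```

Four members give 4! = 24 orderings, so a store keyed on the raw digest holds 24 entries for what is one object after canonicalization.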
>>>
>>>> In fact, the sidetree protocol uses JCS for this exact reason...
>>>> https://identity.foundation/sidetree/spec/#default-parameters
>>>>
>>>
>>> The attacker still has to send all of that redundant data - and they
>>> could always make it non-redundant by making any canonical change
>>> (including changing the string "José" to "José".)
>>>
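The two spellings of "José" above look identical on the page. Assuming they were meant to differ in Unicode normalization form (precomposed versus decomposed "é"), the sketch below shows that sorted-key serialization, which like JCS leaves string contents untouched, still yields two different digests.

```python
import hashlib
import json
import unicodedata

nfc = unicodedata.normalize("NFC", "Jos\u00e9")   # 'é' as one code point
nfd = unicodedata.normalize("NFD", nfc)           # 'e' plus a combining accent


def digest(name: str) -> str:
    """Hash a sorted-key, whitespace-free serialization of a one-member object."""
    doc = json.dumps({"name": name}, sort_keys=True,
                     separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(doc.encode("utf-8")).hexdigest()


print(len(nfc), len(nfd))
print(nfc == nfd, digest(nfc) == digest(nfd))
print(unicodedata.normalize("NFC", nfd) == nfc)
```

Only an explicit normalization step, as in the last line, folds the two forms together.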
>> Yes, defense in depth requires validating untrusted user input... IMO
>> part of that is asking for canonical representations from users... here is
>> another thread on the subject:
>>
>> https://github.com/matrix-org/matrix-doc/issues/1013
>>
>>
>>> So I would consider this more a cache optimization (still important) than
>>> an attack solution.
>>>
>>> So in summary, in any JOSE library you can replace JSON with JCS and get
>>>> better signatures, and developers will thank you because they won't be
>>>> tracking down bugs related to duplicate content... and canonicalization can
>>>> also lead to security issues if not handled properly... regardless of how
>>>> you canonicalize things.
>>>>
>>>
>>> I'm not quite sure about the scenario of "bugs related to duplicate content" -
>>> if you are allowing repeated changes of data, filtering out non-canonical
>>> changes is an optimization. Your policy is still apparently to allow a ton
>>> of changes to data.
>>>
>>
>> canonicalization helps detect content that can lead to bugs... similar to
>> how types and schemas help with that... obviously use case matters here,
>> but from a tooling perspective you can use schemas and canonicalization or
>> you can decide not to... for some use cases, that decision will yield a
>> lot of cost for your engineering team, for others it won't.
>>
>>>
>>> Since you would be using detached signatures, you would necessarily
>>> break the semantics of existing deployments and tools. You would have to
>>> define the semantics for how to transfer that new data since there are no
>>> JWS+JCS formats or best practices. And this would save no data over another
>>> JWS+detached JSON transmission format.
>>>
>>
>> https://tools.ietf.org/html/draft-jordan-jws-ct-02
>>
>> Regarding JSON over the wire, I agree that the only thing that would make
>> JSON over the wire worse would be base64url encoding it.... assuming it was
>> large JSON.
>>
>>
>>>
>>> I particularly think developers in languages such as Rust, Go, and C
>>> would be less than excited about the opportunity to be the first to
>>> contribute a JCS implementation to their respective platforms. Even less so
>>> if they find out they need to build new JSON tooling for strict Ecmascript
>>> and I-JSON serialization and conformance.
>>>
>>
>> https://github.com/search?p=2&q=JSON+Canonicalization
>>
>> Looks like there is support in those languages and more... I suppose
>> those languages are already used to being forced to support JSON in order
>> to use JOSE.
>>
>>
>>>
>>
>>
>>
>> --
>> *ORIE STEELE*
>> Chief Technical Officer
>> www.transmute.industries
>>
>> <https://www.transmute.industries>
>>
>
>

Received on Thursday, 1 April 2021 02:48:48 UTC