Re: JSON-LD vs JWT for VC from Dave Longley on 2018-11-02 (public-credentials@w3.org from November 2018)

From: Dave Longley <dlongley@digitalbazaar.com>
Date: Fri, 2 Nov 2018 15:35:08 -0400
To: Chris Boscolo <chris@boscolo.net>, Manu Sporny <msporny@digitalbazaar.com>
Cc: "W3C Credentials CG (Public List)" <public-credentials@w3.org>
Message-ID: <94408c56-9335-4e75-c1e4-def1cad89636@digitalbazaar.com>
On 11/02/2018 02:39 PM, Chris Boscolo wrote:
> On Fri, Nov 2, 2018 at 9:49 AM Manu Sporny <msporny@digitalbazaar.com 
> <mailto:msporny@digitalbazaar.com>> wrote:
> 
>     On 11/2/18 12:15 PM, Anders Rundgren wrote:
>      > I believe we who work with canonicalization schemes do not follow
>      > here.
> 
>     To be clear, it sounds like the point that you and Chris are making is
>     an argument against COSE, which is the direction the industry is
>     going in.
> 
> 
> To be clear, I have made no comments at all about COSE.
> 
> My view is as follows:
> 
> 1) If we are going to stick with JSON as the data format for VCs, then 
> we should not head down the path that requires the whole world to update 
> to a new JSON parser that supports canonicalize.  Instead, just use 
> JWT/JOSE that is widely used today.
> 
> 2) If we want to use a binary format, then my vote right now would be 
> for CBOR/COSE, but I reserve the right to change this view if presented 
> with compelling information/arguments.
> 
> I would rather go down the CBOR/COSE path than JSON with canonicalization.
> 
>     I'm pretty sure I know what you are saying, but rather than try to
>     restate it, I'd like you and Chris to be more specific about the exact
>     attack you're concerned with (rather than general security principles,
>     of which many of us are aware of).
> 
> 
> Alice:
> Generate VC data()         -> [json_data]
> Canonicalize([jason_data]) -> [packed_json_data]
> Sign([packed_json_data])   -> [sig]
> SendToBob()                -> [packed_json_data + sig]
> 
> Mallory:
> AddExploit()               -> [buffer_madness + packed_json_data + sig]
> 
> Bob:
> Canonicalize([buffer_madness + packed_json_data + sig]) -> 
> [packed_json_data]
>      *BOOM* (modifies just the interal representation)
> VerifySig([packed_json_data])   -> SUCCESS
> Parse([packed_json_data]        -> [corrupt internal representation]

Hmm, I think that the way this is written would not be an attack that
would be different from how JWTs work. If Bob is only parsing the data
that was cryptographically determined to be unmodified (as shown here),
then it doesn't really matter if `canonicalize` was used or not. Or,
rather, you'd have to be highlighting a different attack.

I think that the attack you're talking about here refers to a corrupt
internal representation that does not match Alice's intent. If that is
your intended attack, I think Bob would need to parse the data *prior
to* canonicalizing it, then he'd run that parsed data through a flawed
canonicalization + verifier process that passes the signature check.
Finally, Bob, trusting that the tools weren't broken, would use the
previously parsed data to do whatever processing he wanted and hijinks
would ensue. In other words:

Bob:
json_data = Parse([buffer_madness + packed_json_data])
      * json_data/internal representation is somehow NOT the same as 
what Alice sent but Bob's canonicalizer won't detect it *
Broken_Canonicalize([json_data]) -> [packed_json_data]
      * packed_json_data is now the same as what Alice sent *
VerifySig([packed_json_data])   -> SUCCESS
Use([json_data])        -> [bad news]

This requires that the canonicalizer be misimplemented.

> 
> Instead, the first thing Bob should do is verify the signature of the 
> exact data that Alice produced. (For JWT, this is the Base64 
> representation of the JSON.)  This greatly resduces the surface of 
> attack that Mallory can do.
> 
> I want to state again, this isn't a RIGHT/WRONG argument.  I just prefer 
> a more defenseve style that has a hardended first step of processing 
> that checks the inputs and ensuring the the raw buffer is 
> cryptographically unmodified before starting to parse it. My view is 
> probably biased because I spent several years building an IPSEC stack...
> 
> If the community doesn't agree, then let's just move on.

The fewer the features, the smaller the attack surface area. Of course
there would be even less attack surface if we removed trusting third
party claims altogether!

So, each new feature requires proper implementation of the mechanisms
that enable it or else those mechanisms can be exploited. The real crux
of the matter is whether the features are worth it. I think
understanding the trade offs is what this thread and any related
documentation should be about.

Canonicalization enables some features and we need to do a better job of
documenting them. For example, it pushes certain complexities away from
application developers into tooling layers. This is intended to enable
more independent storage choice, improve debugging experiences, make
multisignatures and chain signatures simpler, and so on. There's a
variety of advantages that one gets -- but to provide them securely
means that the additional tooling (i.e., canonicalization libs) must be
properly implemented.


-- 
Dave Longley
CTO
Digital Bazaar, Inc.
http://digitalbazaar.com
Received on Friday, 2 November 2018 19:35:38 UTC