Re: On JSON-LD with DIDs and VCs from Joe Andrieu on 2020-01-08 (public-credentials@w3.org from January 2020)

From: Joe Andrieu <joe@legreq.com>
Date: Wed, 08 Jan 2020 15:17:25 -0800
To: "Credentials Community Group" <public-credentials@w3.org>
Message-Id: <e92b42e7-d6bb-4aca-8e17-563c8be4b5c3@www.fastmail.com>
My apologies for bringing in AI. My point was that it would take magical superpowers for resolving the context at verification time to make any sense. Because there aren't magical superpowers, resolving the context at verification time makes no sense.

ALSO, we seem to be conflating concerns about VCs with concerns about DID Documents. In particular, VCs are a formal recommendation. They use JSON-LD. That debate was resolved long ago.

However, an issue under consideration is whether JSON-LD is an appropriate mechanism for DID Documents or whether JSON is sufficient. This is absolutely in scope for the current working group (and that part of this discussion should probably move to the DID WG).

To that end, Oliver, is there a version of this concern that applies to how a DID Document's @context is an attack vector? I would be better able to respond to that concern if you could describe how that could happen.

If someone wants to propose that VCs move away from JSON-LD, we can have a conversation about that. However, I expect that would be considered beyond "maintenance" and require a full (re)chartering of a working group.

To respond to the VC question in the context of the community group, I believe the attack vector as described is almost certainly a non-issue, for at least two reasons.

First, if the verifier has not seen a specific context before--as a lexical match on the entire URL string or JSON compare of an inline context--it SHOULD treat it as an unknown context and simply stop processing. Only developers are going to actually resolve any context URLs. The verifier should just be confirming that the contexts are ones it is already programmed for, which means the context documents should also already be cached if you need them for your parser. (As others have mentioned, hashes can confirm that the intended context is the same as the cached version.) So the attack vector described only applies if the verifier is intentionally treating distinct URLs as identical because it is ignoring query parameters or the like. IMO, this treatment of distinct URLs as identical should be explicitly out of conformance. We may need to add language to clarify that (which would be a suitable erratum).
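To make that first point concrete, here is a minimal Python sketch of a verifier that never resolves a context. It accepts a URL context only on an exact string match and only if the locally cached copy matches a pinned hash, and it accepts an inline context only if it equals one it was programmed for. Everything named here (`SHIPPED`, `PINNED`, `KNOWN_INLINE`, `contexts_known`, the example.org vocabulary, and the stand-in context bytes) is an assumption for illustration, not anything from the VC spec or an existing library.

```python
import hashlib
import json

VC_V1 = "https://www.w3.org/2018/credentials/v1"

# Stand-in bytes for the context document shipped with the verifier.
# Placeholder only: these are NOT the real contents of the VC v1 context.
SHIPPED = {VC_V1: b'{"@context": {"@version": 1.1}}'}

# Hashes pinned at build time (hypothetical), keyed by the exact URL string.
PINNED = {url: hashlib.sha256(doc).hexdigest() for url, doc in SHIPPED.items()}


def canonical(obj):
    """Serialize JSON with sorted keys so equal objects compare equal."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


# Inline contexts the verifier was programmed for (hypothetical example).
KNOWN_INLINE = {canonical({"ex": "https://example.org/vocab#"})}


def contexts_known(credential, cache):
    """Return True only if every @context entry is already known.

    `cache` maps URL -> bytes of the local copy. Nothing is fetched.
    """
    contexts = credential.get("@context")
    if not isinstance(contexts, list):
        contexts = [contexts]
    for entry in contexts:
        if isinstance(entry, str):
            # Lexical match on the whole URL: query strings are not ignored.
            if entry not in PINNED:
                return False
            cached = cache.get(entry)
            if cached is None:
                return False
            if hashlib.sha256(cached).hexdigest() != PINNED[entry]:
                return False
        elif isinstance(entry, dict):
            if canonical(entry) not in KNOWN_INLINE:
                return False
        else:
            # Missing or malformed @context entry.
            return False
    return True
```

With this shape, a context URL such as `VC_V1 + "?id=1234"` is simply an unknown context, so a tracking suffix buys the issuer nothing unless the verifier deliberately normalizes URLs.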

Second, if the issuer and verifier both want correlatable identifiers, we can't stop it. There are at least three ways I can think of off the top of my head:

1. The issuer could add a trailing component to the @context URL *and* the verifier could accept such contexts despite the proposed conformance requirement in the previous paragraph. This, IMO, should be treated as a non-conformant implementation (I don't believe the spec currently addresses this).

2. If we have any extensibility at all, the issuer can just add a property with a UUID or even a "phoneHome URL" that embeds an ID. Colluding verifiers can resolve this URL at verification time just as they would the context URL. If the verifier WANTS to phone home, we can't stop that.

3. The issuer can ALREADY do this attack through the credentialStatus property without extending anything. That property is designed to use non-correlatable means for checking a status, but there's no way to prevent an issuer from setting up a status mechanism that is a correlatable request back to the issuer. In fact, we can expect that this WILL happen. It would be good to win the narrative that this is a privacy anti-pattern, but like option #1, the VC spec is silent on this. We *could* require that the URL be non-correlatable and to my mind that would be a useful erratum.

So... this does highlight two potential errata for updating the VC spec to move #1 and #3 to violations of normative requirements. That would be a nice outcome from this thread.
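As a rough illustration of what "correlatable" means for #3, here is a small Python sketch that audits a sample of credentials from one issuer and reports status URLs used by exactly one credential. Dereferencing such a URL tells the issuer which credential is being checked. The function names and the issuer.example URLs are hypothetical, and this is only a heuristic under the assumption that the status URL lives in `credentialStatus.id`; a shared URL can still leak information in other ways.

```python
from collections import Counter


def status_url(credential):
    """Return the credentialStatus id of a credential, or None if absent."""
    status = credential.get("credentialStatus")
    if isinstance(status, dict):
        return status.get("id")
    return None


def correlatable_status_urls(credentials):
    """List status URLs that appear on exactly one credential in the sample.

    A URL unique to one credential identifies that credential to whoever
    serves the URL, so each status check is a phone-home event.
    """
    urls = [u for u in map(status_url, credentials) if u]
    counts = Counter(urls)
    return sorted(u for u, n in counts.items() if n == 1)


sample = [
    {"id": "urn:example:a", "credentialStatus": {"id": "https://issuer.example/status/list-1"}},
    {"id": "urn:example:b", "credentialStatus": {"id": "https://issuer.example/status/list-1"}},
    {"id": "urn:example:c", "credentialStatus": {"id": "https://issuer.example/status?cred=42"}},
]
flagged = correlatable_status_urls(sample)
```

Here `flagged` contains only the per-credential URL; the two credentials that share a list URL are not singled out by a lookup.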

However, #2 is ONLY resolvable if you don't allow ANY extensibility, which I don't believe anyone is arguing for. You have this problem with both JSON and JSON-LD. In either one, if there is any extensibility, issuers can ALWAYS add properties that include correlatable identifiers and phone home service endpoints. Full stop. It has nothing to do with whether or not a context property allows that.

In short, if you want to avoid these kinds of attacks, we can't have extensibility. Or more rigorously, **I** don't know of an extensibility approach that would let issuers extend a VC without also allowing properties **we** don't like. That's the purpose of extensibility, to enable customizations that have not gone through a standardization process. That's a reasonable argument against extensibility, but it has nothing to do with the @context property and JSON-LD.
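For what "no extensibility" would look like on the verifier side, here is a minimal Python sketch that reports every top-level property outside a fixed set. The check is the same whether the credential is treated as JSON or JSON-LD. `BASE_PROPERTIES` is my reading of the top-level properties in the VC Data Model 1.0 and should be checked against the spec; `extension_properties` and the `phoneHome` property are hypothetical. It only inspects the top level, so extensions nested inside credentialSubject would need the same treatment.

```python
# Top-level properties assumed to be defined by the VC Data Model 1.0.
BASE_PROPERTIES = frozenset({
    "@context", "id", "type", "issuer", "issuanceDate", "expirationDate",
    "credentialSubject", "credentialStatus", "credentialSchema",
    "refreshService", "termsOfUse", "evidence", "proof",
})


def extension_properties(credential, allowed_extensions=frozenset()):
    """Return top-level property names the verifier was not programmed for."""
    return sorted(set(credential) - BASE_PROPERTIES - set(allowed_extensions))


credential = {
    "@context": ["https://www.w3.org/2018/credentials/v1"],
    "type": ["VerifiableCredential"],
    "issuer": "https://issuer.example/",
    "credentialSubject": {"id": "did:example:123"},
    # Hypothetical extension that embeds a correlatable identifier.
    "phoneHome": "https://issuer.example/ping?id=1234",
}
unknown = extension_properties(credential)
```

A strict verifier would refuse to process when `unknown` is non-empty; a more permissive one would pass a set of extensions it understands as `allowed_extensions`, which is exactly where the tradeoff described above comes in.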

-j

On Wed, Jan 8, 2020, at 8:56 AM, Oliver Terbu wrote:
> I guess I have to clarify a few things because apparently the whole AI thing was misunderstood. Find my comments below.
> 
> On Wed, Jan 8, 2020 at 5:08 PM Manu Sporny <msporny@digitalbazaar.com> wrote:
>> On 1/8/20 6:05 AM, Oliver Terbu wrote:
>>  > On the other hand, I now understand that to solve the namespace 
>>  > problem people are happy to sacrifice security and privacy for 
>>  > extensibility.
>> 
>>  No, that's not what's being said at all. I don't think you understand
>>  what people are saying in this thread.
>> 
>>  People are saying: We don't have to sacrifice anything -- you can get
>>  security, privacy, AND extensibility with Verifiable Credentials as
>>  designed.
> 
> I do think that having JSON-LD enabled for verifiers has security and privacy implications as described in my previous emails. These security and privacy considerations could have been completely mitigated by not using JSON-LD. Note, I am NOT saying the VC spec is flawed. The spec allows verifiers to decide whether to make use of JSON-LD:
> 
> "Though this specification requires that a @context property be present, it is not required that the value of the @context property be processed using JSON-LD. This is to support processing using plain JSON libraries, such as those that might be used when the verifiable credential is encoded as a JWT."
> 
> However, my question then was, why should a verifier use a JSON-LD library at all?
> 
>> 
>> > I am very glad that Joe pointed out that there is no AI that
>>  > would allow applications to process any variation of credentials just
>>  > because JSON-LD is used. This is exactly what I always said.
>> 
>>  Yes, but no one that knows what they're talking about has ever said
>>  that JSON-LD is a magic bullet that will solve that problem. You seem to
>>  be presenting a strawman argument.
>> 
>>  I also reject the notion that AI can solve this problem... let's not
>>  even talk about that as an option. Every time someone alludes to "AI"
>>  solving anything, I just replace "AI" with "magic". Please stop, AI is
>>  magic. Generative Adversarial Neural Networks specifically tuned to a
>>  particular problem space are not magic, and again, are not a silver
>>  bullet. :)
>> 
>>  We don't need any of that magic for Verifiable Credentials to operate as
>>  designed.
>> 
>>  You seem to be asserting that someone has stated that by using JSON-LD
>>  that they'll be able to *safely* process Verifiable Credentials where
>>  they don't understand the semantics of the credential. If someone has
>>  stated that, they are completely and absolutely wrong. That is magic.
>> 
> 
> Absolutely agree! Please, let's not talk about AI. I never said anything else, or it was just misinterpreted. :)
> 
>> JSON-LD isn't magic that enables you to understand semantics that you
>>  had previously not understood. Software always needs to be written to
>>  understand the semantics -- for the next decade or more, by a human
>>  being that understands the semantics. What JSON-LD gives us is the
>>  ability to precisely identify semantic concepts in a decentralized
>>  manner such that every market vertical on the planet isn't forced
>>  through some slow W3C/IETF/OASIS standards setting process just so that
>>  they can have the Verifiable Credential that their market vertical needs.
> 
> Yes, I do understand that. This is what I referred to as "extensibility" which I do see as a benefit but which I don't see as a legitimate reason to accept any tradeoffs on security and privacy. That is why I'm arguing for JSON-only verifiers.
> 
>> 
>> That is, JSON-LD gives us the ability for people to innovate at the
>>  edges with the types of Verifiable Credentials that are produced and
>>  consumed. JSON-LD *does not* give a computer the ability to magically
>>  understand semantics that it isn't programmed to understand.
> 
> Fully agree and I have never said anything else.
> 
>> 
>> If you think the latter, you fundamentally misunderstand JSON-LD. If you
>>  think the JSON-LD community is espousing the latter, you fundamentally
>>  misunderstand the community's mental model.
>> 
>>  > However, the problem that I described is not about an arbitrary 
>>  > context, it is about the same context under a different URL, or 
>>  > having just an additional meaningless context that serves as a 
>>  > tracking cookie. The JSON-LD spec still allows the retrieval of 
>>  > references to a remote context. Note, the validation checks in the VC
>>  > spec are non-normative, so technically malicious issuers are able to
>>  > abuse that behaviour without producing invalid VCs.
>> 
>>  The Verifiable Credentials spec uses a restricted form of JSON-LD. The
>>  discussion in this thread is about best practices and implementations.
>>  If we find that we all agree on the best practice, then we can update
>>  the spec to contain the limitations we're discussing right now. To put
>>  it another way:
> 
> I would support that and introduce some normative requirements for JSON-LD verifiers for validation checks. 
> 
>> 
>> C, Rust, Java, Python, Javascript, TLS, and JSON parsers all give you a
>>  thousand ways to blow your foot off. That doesn't mean that those specs
>>  are wrong -- they're flexible by design. What separates good programs
>>  that use those technologies from bad ones is that the bad ones blow your
>>  foot off when you don't expect them to, and the good ones protect all
>>  your toes.
> 
> JSON does not have these issues; it is a data interchange format and nothing else. JSON-LD, on the other hand, defines a lot of characteristics that are not needed. Retrieving a remote context is one of them, and that is the reason why we are having this discussion.
> 
>> 
>> We have text that talks about this in the spec, namely:
>> 
>> https://w3c.github.io/vc-data-model/#extensibility
>> 
>>  """
>>  Though this specification requires that a @context property be present,
>>  it is not required that the value of the @context property be processed
>>  using JSON-LD. This is to support processing using plain JSON libraries,
>>  such as those that might be used when the verifiable credential is
>>  encoded as a JWT. All libraries or processors MUST ensure that the order
>>  of the values in the @context property is what is expected for the
>>  specific application. Libraries or processors that support JSON-LD can
>>  process the @context property using full JSON-LD processing as expected.
> 
> I'm aware of this language in the spec and I'm quite ok with that. My point is: why should anyone do anything else as a verifier? Ignoring JSON-LD as a verifier would result in a more interoperable, more secure and more efficient solution.
> 
>> 
>> ...
>> 
>>  A dynamic extensibility model such as this does increase the
>>  implementation burden. Software written for such a system has to
>>  determine whether verifiable credentials with extensions are acceptable
>>  based on the risk profile of the application. Some applications might
>>  accept only certain extensions while highly secure environments might
>>  not accept any extensions. These decisions are up to the developers of
>>  these applications and are specifically not the domain of this
>>  specification.
>> 
>>  Developers are urged to ensure that extension JSON-LD contexts are
>>  highly available. Implementations that cannot fetch a context will
>>  produce an error. Strategies for ensuring that extension JSON-LD
>>  contexts are always available include using content-addressed URLs for
>>  contexts, bundling context documents with implementations, or enabling
>>  aggressive caching of contexts.
>>  """
>> 
>>  If that's not good enough, we can improve that text in the future once
>>  the 2020 VCWG spins back up. We can add text to elaborate on this in the
>>  implementation guidance.
> 
> Yes, I would support that.
> 
>> 
>> > If the only thing you need is to identify that a response is a
>>  > certain object, then there are of course simpler solutions even based
>>  > on the current W3C VC spec.
>> 
>>  Simpler solutions, like?
> 
> I'm not against the context in general. For example, you could still use the information provided in the context for that. Note, I was concerned about JSON-LD verifiers and I was not worried about providing a valid context in the VC.
> 
>> 
>> -- manu
>> 
>>  -- 
>>  Manu Sporny (skype: msporny, twitter: manusporny)
>>  Founder/CEO - Digital Bazaar, Inc.
>>  blog: Veres One Decentralized Identifier Blockchain Launches
>> https://tinyurl.com/veres-one-launches
>> 

--
Joe Andrieu, PMP joe@legreq.com
LEGENDARY REQUIREMENTS +1(805)705-8651
Do what matters. http://legreq.com <http://www.legendaryrequirements.com/>
Received on Wednesday, 8 January 2020 23:17:53 UTC