Re: On JSON-LD with DIDs and VCs from Oliver Terbu on 2020-01-09 (public-credentials@w3.org from January 2020)

From: Oliver Terbu <oliver.terbu@consensys.net>
Date: Thu, 9 Jan 2020 13:58:18 +0100
To: Daniel Hardman <daniel.hardman@evernym.com>
Cc: Joe Andrieu <joe@legreq.com>, Credentials Community Group <public-credentials@w3.org>
Message-ID: <CALu3yZLgLhazWD1=TkT-u3qQT7=gdLqCkq6399bUEm2OH91Prg@mail.gmail.com>
To all, thank you very much for providing all the information in this
thread. Thanks also to Melvin, and again to Orie, who wrote the article.

Joe, I'm not asking to recharter the VC WG. And I agree with
your points, so +1.

Though I want to highlight that these categories are also within scope of
different work items / concerns. For this reason, we should find out where
to address these concerns.

1. "phone home" via context, e.g., remote context retrieval
-> related to the general processing of data (JSON-LD vs plain JSON) which
is in scope of the "VC spec + related documents" such as implementation
guide. I saw that the implementation guide already provides explicit
language that it is not recommended to retrieve a remote context:
"it is recommended that all Verifiable Credential consumer software hard
code the @context values it understands and not reach out to the Web to
fetch them." My apologies, I missed that sentence. So, I believe that
addresses my concern although it does not describe the concern. We could
add more language to the implementation guide that describes why that could
have privacy implications. Furthermore, I would support normative language
in a future erratum that treats implementations which retrieve remote
contexts as non-conformant.

2. "phone home" via information provided by a custom property, e.g., remote
URL for a profile picture
-> application or context specific and that cannot be prevented. I fully
agree that extensibility is necessary. It is up to the risk profile of the
verifier if such features are used.

3. "phone home" via status registry
-> related to status method registry and specification; we can come up with
rubrics for status registries; we could agree on some minimum requirements
that prevent a 1:1 mapping of status requests onto VCs and include that in
an erratum; we could add this to the VC extension registry document where
status registries are referenced. In general, I don't believe these
measures would be too strict and I don't think that would exclude any
community. I don't have a strong preference where we should add this.
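
As a rough illustration of the minimum requirement in item 3, here is a sketch of a status check in which many credentials share one status list, so a request reveals only which list was fetched and not which VC is being verified. The field name `statusListUrl`, the example URL, and both function names are hypothetical and not taken from any specification.

```python
from typing import Dict, Set

# Hypothetical allow-list of shared status lists; each list covers many
# credentials, so fetching it does not identify a single credential.
KNOWN_STATUS_LISTS: Set[str] = {"https://issuer.example/status/1"}


def status_request_is_shared(credential_status: Dict[str, str]) -> bool:
    """Accept only status URLs that match a known shared list exactly.

    A per-credential URL (for example one carrying a unique query string)
    fails the exact match and is therefore never requested.
    """
    return credential_status.get("statusListUrl") in KNOWN_STATUS_LISTS


def is_revoked(status_list: bytes, index: int) -> bool:
    """Read one bit (most significant bit first) from a locally cached list."""
    byte_pos, bit_pos = divmod(index, 8)
    return bool((status_list[byte_pos] >> (7 - bit_pos)) & 1)
```

A verifier following this sketch would cache the list and look up the credential's index locally, which avoids a 1:1 mapping of status requests onto VCs.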

My personal conclusion on the JSON-LD-enabled verifiers question is that
they are ok :) although they have a different risk profile. And as Daniel
pointed out, people have different requirements which lead to different
tradeoffs. I guess, if an application does not make use of JSON-LD, then
that application won't do so to process a VC (except validation of the
normative sections of the VC spec). If an application has already bought
in to the Linked Data mental model, e.g., by leveraging data that is
already there, then they would continue to do so.
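
To make the plain-JSON case concrete, here is a minimal sketch of a verifier that treats `@context` values as opaque strings, compares them against a hard-coded list, and never fetches them. The allow-list contents and the function name `contexts_are_known` are assumptions for illustration; only the exact-match idea comes from this thread and the implementation guide.

```python
from typing import Any, Dict, List

# Hypothetical allow-list: the contexts this verifier was programmed for.
KNOWN_CONTEXTS: List[str] = [
    "https://www.w3.org/2018/credentials/v1",
    "https://www.w3.org/2018/credentials/examples/v1",
]


def contexts_are_known(credential: Dict[str, Any]) -> bool:
    """Check @context by exact string match, with no network access."""
    contexts = credential.get("@context")
    if isinstance(contexts, str):
        contexts = [contexts]
    if not isinstance(contexts, list) or not contexts:
        return False
    # Order matters: the base VC context is expected to come first.
    if contexts[0] != KNOWN_CONTEXTS[0]:
        return False
    # Inline (non-string) contexts and unknown URLs are rejected outright.
    return all(isinstance(c, str) and c in KNOWN_CONTEXTS for c in contexts)
```

Because the comparison is lexical, a context URL that differs only by a query string or trailing component is treated as unknown, so it cannot act as a tracking identifier.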

Joe, I will follow up with an email later that provides an example of this
attack using DIDs.

Thanks,
Oliver



On Thu, Jan 9, 2020 at 1:11 AM Daniel Hardman <daniel.hardman@evernym.com>
wrote:

> +1 to all of Joe's comments.
>
> On Wed, Jan 8, 2020 at 4:19 PM Joe Andrieu <joe@legreq.com> wrote:
>
>> My apologies for bringing in AI. My point was that it would take magical
>> superpowers for resolving the context at verification time to make any
>> sense. Because there aren't magical superpowers, resolving the context at
>> verification time makes no sense.
>>
>> ALSO, we seem to be conflating concerns about VCs with concerns about DID
>> Documents. In particular, VCs are a formal recommendation. They use
>> JSON-LD. That debate was resolved long ago.
>>
>> However, an issue under consideration is whether or not JSON-LD is an
>> appropriate mechanism for DID Documents or if JSON is sufficient. This is
>> absolutely in scope for the current working group (and that part of this
>> discussion should probably move to the DID WG).
>>
>> To that end, Oliver, is there a version of this concern that applies to
>> how a DID Document's @context is an attack vector? I would be better able
>> to respond to that concern if you could describe how that could happen.
>>
>> If someone wants to propose that VCs move away from JSON-LD, we can have
>> a conversation about that. However, I expect that would be considered
>> beyond "maintenance" and require a full (re)chartering of a working group.
>>
>> To respond to the VC question in the context of the community group, I
>> believe the attack vector as described is almost certainly a non-issue, for
>> at least two reasons.
>>
>> First, if the verifier has not seen a specific context before--as a
>> lexical match on the entire URL string or JSON compare of an inline
>> context--it SHOULD treat it as an unknown context and simply stop
>> processing. Only developers are going to actually resolve any context urls.
>> The verifier should just be confirming the contexts are something already
>> programmed for, which means the context documents should also already be
>> cached if you need them for your parser. (As others have mentioned, hashes
>> can confirm that the intended context is the same as the cached version.)
>> So the attack vector described only applies if the verifier is intentionally
>> treating distinct URLs as identical because they are ignoring query
>> parameters or the like. IMO, this treatment of distinct URLs as identical
>> should be explicitly out of conformance. We may need to add language to
>> clarify that (which would be a suitable erratum).
>>
>> Second, if the issuer and verifier both want correlatable identifiers, we
>> can't stop it. There are at least three ways I can think of off the top of
>> my head:
>>
>> 1. They could use a trailing component in the @context URL *and* the
>> verifier accepts such contexts despite the proposed conformance requirement
>> in the previous paragraph. This, IMO, should be treated as a non-conformant
>> implementation (I don't believe the spec currently addresses this).
>>
>> 2. If we have any extensibility at all, the issuer can just add a
>> property with a UUID or even a "phoneHome URL" that embeds an ID. Colluding
>> verifiers can resolve this URL at verification time just as they would the
>> context URL. If the verifier WANTS to phone home, we can't stop that.
>>
>> 3. The issuer can ALREADY do this attack through the credentialStatus
>> property without extending anything. That property is designed to use
>> non-correlatable means for checking a status, but there's no way to prevent
>> an issuer from setting up a status mechanism that is a correlatable request
>> back to the issuer. In fact, we can expect that this WILL happen. It would
>> be good to win the narrative that this is a privacy anti-pattern, but like
>> option #1, the VC spec is silent on this. We *could* require that the URL
>> be non-correlatable and to my mind that would be a useful erratum.
>>
>> So... this does highlight two potential errata for updating the VC spec
>> to move #1 and #3 to violations of normative requirements. That would be a
>> nice outcome from this thread.
>>
>> However, #2, is ONLY resolvable if you don't allow ANY extensibility,
>> which I don't believe anyone is arguing for. You have this problem with
>> both JSON and JSON-LD. In either one, if there is any extensibility,
>> issuers can ALWAYS add properties that include correlatable identifiers and
>> phone home service endpoints. Full stop. It has nothing to do with whether
>> or not a context property allows that.
>>
>> In short, if you want to avoid these kinds of attacks, we can't have
>> extensibility. Or more rigorously, **I** don't know of an extensibility
>> approach that would let issuers extend a VC without also allowing
>> properties **we** don't like. That's the purpose of extensibility, to
>> enable customizations that have not gone through a standardization
>> process. That's a reasonable argument against extensibility, but it has
>> nothing to do with the @context property and JSON-LD.
>>
>> -j
>>
>> On Wed, Jan 8, 2020, at 8:56 AM, Oliver Terbu wrote:
>>
>> I guess I have to clarify a few things because apparently the whole AI
>> thing was misunderstood. Find my comments below.
>>
>> On Wed, Jan 8, 2020 at 5:08 PM Manu Sporny <msporny@digitalbazaar.com>
>> wrote:
>>
>> On 1/8/20 6:05 AM, Oliver Terbu wrote:
>> > On the other hand, I now understand that to solve the namespace
>> > problem people are happy to sacrifice security and privacy for
>> > extensibility.
>>
>> No, that's not what's being said at all. I don't think you understand
>> what people are saying in this thread.
>>
>> People are saying: We don't have to sacrifice anything -- you can get
>> security, privacy, AND extensibility with Verifiable Credentials as
>> designed.
>>
>>
>> I do think that having JSON-LD enabled for verifiers has security and
>> privacy implications as described in my previous emails. These security and
>> privacy considerations could have been completely mitigated by not using
>> JSON-LD. Note, I am NOT saying the VC spec is flawed. The spec allows
>> verifiers to decide whether to make use of JSON-LD:
>>
>> "Though this specification requires that a @context property be present,
>> it is not required that the value of the @context property be processed
>> using JSON-LD. This is to support processing using plain JSON libraries,
>> such as those that might be used when the verifiable credential is encoded
>> as a JWT."
>>
>> However, my question then was, why should a verifier use a JSON-LD library
>> at all?
>>
>>
>>
>> > I am very glad that Joe pointed that out that there is no AI that
>> > would allow applications to process any variation of credentials just
>> > because JSON-LD is used. This is exactly what I always said.
>>
>> Yes, but no one that knows what they're talking about has ever said
>> that JSON-LD is a magic bullet that will solve that problem. You seem to
>> be presenting a strawman argument.
>>
>> I also reject the notion that AI can solve this problem... let's not
>> even talk about that as an option. Every time someone alludes to "AI"
>> solving anything, I just replace "AI" with "magic". Please stop, AI is
>> magic. Generative Adversarial Neural Networks specifically tuned to a
>> particular problem space are not magic, and again, are not a silver
>> bullet. :)
>>
>> We don't need any of that magic for Verifiable Credentials to operate as
>> designed.
>>
>> You seem to be asserting that someone has stated that by using JSON-LD
>> that they'll be able to *safely* process Verifiable Credentials where
>> they don't understand the semantics of the credential. If someone has
>> stated that, they are completely and absolutely wrong. That is magic.
>>
>>
>> Absolutely agree! Please, let's not talk about AI. I never said anything
>> else, or it was just misinterpreted. :)
>>
>>
>> JSON-LD isn't magic that enables you to understand semantics that you
>> had previously not understood. Software always needs to be written to
>> understand the semantics -- for the next decade or more, by a human
>> being that understands the semantics. What JSON-LD gives us is the
>> ability to precisely identify semantic concepts in a decentralized
>> manner such that every market vertical on the planet isn't forced
>> through some slow W3C/IETF/OASIS standards setting process just so that
>> they can have the Verifiable Credential that their market vertical needs.
>>
>>
>> Yes, I do understand that. This is what I referred to as "extensibility"
>> which I do see as a benefit but which I don't see as a legitimate reason to
>> accept any tradeoffs on security and privacy. That is why I'm arguing for
>> JSON-only verifiers.
>>
>>
>>
>> That is, JSON-LD gives us the ability for people to innovate at the
>> edges with the types of Verifiable Credentials that are produced and
>> consumed. JSON-LD *does not* give a computer the ability to magically
>> understand semantics that it isn't programmed to understand.
>>
>>
>> Fully agree and I have never said anything else.
>>
>>
>>
>> If you think the latter, you fundamentally misunderstand JSON-LD. If you
>> think the JSON-LD community is espousing the latter, you fundamentally
>> misunderstand the community's mental model.
>>
>> > However, the problem that I described is not about an arbitrary
>> > context, it is about the same context under a different URL, or
>> > having just an additional meaningless context that serves as a
>> > tracking cookie. The JSON-LD spec still allows the retrieval of
>> > references to a remote context. Note, the validation checks in the VC
>> > spec are non-normative, so technically malicious issuers are able to
>> > abuse that behaviour without producing invalid VCs.
>>
>> The Verifiable Credentials spec uses a restricted form of JSON-LD. The
>> discussion in this thread is about best practices and implementations.
>> If we find that we all agree on the best practice, then we can update
>> the spec to contain the limitations we're discussing right now. To put
>> it another way:
>>
>>
>> I would support that and introduce some normative requirements for
>> JSON-LD verifiers for validation checks.
>>
>>
>>
>> C, Rust, Java, Python, Javascript, TLS, and JSON parsers all give you a
>> thousand ways to blow your foot off. That doesn't mean that those specs
>> are wrong -- they're flexible by design. What separates good programs
>> that use those technologies from bad ones is that the bad ones blow your
>> foot off when you don't expect them to, and the good ones protect all
>> your toes.
>>
>>
>> JSON does not have these issues, it is a data interchange format and
>> nothing else. JSON-LD on the other hand defines a lot of characteristics
>> that are not needed. Retrieving a remote context is one of them, and
>> that is the reason why we are having this discussion.
>>
>>
>>
>> We have text that talks about this in the spec, namely:
>>
>> https://w3c.github.io/vc-data-model/#extensibility
>>
>> """
>> Though this specification requires that a @context property be present,
>> it is not required that the value of the @context property be processed
>> using JSON-LD. This is to support processing using plain JSON libraries,
>> such as those that might be used when the verifiable credential is
>> encoded as a JWT. All libraries or processors MUST ensure that the order
>> of the values in the @context property is what is expected for the
>> specific application. Libraries or processors that support JSON-LD can
>> process the @context property using full JSON-LD processing as expected.
>>
>>
>> I'm aware of this language in the spec and I'm quite ok with that. My
>> question is: why should anyone do anything else as a verifier? Ignoring
>> JSON-LD as a verifier would result in a more interoperable, more
>> secure and more efficient solution.
>>
>>
>>
>> ...
>>
>> A dynamic extensibility model such as this does increase the
>> implementation burden. Software written for such a system has to
>> determine whether verifiable credentials with extensions are acceptable
>> based on the risk profile of the application. Some applications might
>> accept only certain extensions while highly secure environments might
>> not accept any extensions. These decisions are up to the developers of
>> these applications and are specifically not the domain of this
>> specification.
>>
>> Developers are urged to ensure that extension JSON-LD contexts are
>> highly available. Implementations that cannot fetch a context will
>> produce an error. Strategies for ensuring that extension JSON-LD
>> contexts are always available include using content-addressed URLs for
>> contexts, bundling context documents with implementations, or enabling
>> aggressive caching of contexts.
>> """
>>
>> If that's not good enough, we can improve that text in the future once
>> the 2020 VCWG spins back up. We can add text to elaborate on this in the
>> implementation guidance.
>>
>>
>> Yes, I would support that.
>>
>>
>>
>> > If the only thing you need is to identify that a response is a
>> > certain object, then there are of course simpler solutions even based
>> > on the current W3C VC spec.
>>
>> Simpler solutions, like?
>>
>>
>> I'm not against the context in general. For example, you could still use
>> the information provided in the context for that. Note, I was
>> concerned about JSON-LD verifiers and I was not worried about providing a
>> valid context in the VC.
>>
>>
>>
>> -- manu
>>
>> --
>> Manu Sporny (skype: msporny, twitter: manusporny)
>> Founder/CEO - Digital Bazaar, Inc.
>> blog: Veres One Decentralized Identifier Blockchain Launches
>> https://tinyurl.com/veres-one-launches
>>
>>
>> --
>> Joe Andrieu, PMP
>> joe@legreq.com
>> LEGENDARY REQUIREMENTS
>> +1(805)705-8651
>> Do what matters.
>> http://legreq.com <http://www.legendaryrequirements.com>
>>
>>
>>
Received on Thursday, 9 January 2020 12:58:34 UTC