Re: [EXTERNAL] Re: The DID service endpoint privacy challenge from Adrian Gropper on 2020-06-30 (public-did-wg@w3.org from June 2020)

From: Adrian Gropper <agropper@healthurl.com>
Date: Tue, 30 Jun 2020 17:51:42 -0400
To: Daniel Buchner <Daniel.Buchner@microsoft.com>
Cc: "daniel.hardman@evernym.com" <daniel.hardman@evernym.com>, "public-did-wg@w3.org" <public-did-wg@w3.org>
Message-ID: <CANYRo8h48sgkQGiD0Hre4re=_5E2COV2pCPsuEyK_BP7CtBM2A@mail.gmail.com>
If I understand this correctly:

   - The mediator business is like the VPN business:
      - chosen by Alice
      - paid by Alice
      - makes no decisions on behalf of Alice (doesn't know any of Alice's
      policies)
      - frequently erases any logs
   - If Alice chooses to change her mediator, links will fail for some
   Requesting Parties (Bob) and they will need to discover Alice's new
   mediator one way or another
   - Bob's message to Alice is just Bob's DID and might have no associated
   service endpoint
   - The mediator sends Bob's DID to a Service Endpoint in Alice's DID
   document of type "RqP-DID"
   - Alice's RqP-DID endpoint decides, based on policy, whether to send a
   message to Bob, depending on Bob's DID:
      - If Bob's DID has no service endpoint then Alice may need to use a
      discovery service to find another DID for Bob
      - If Bob's DID has a service endpoint, the mediator will see that and
      both Alice and Bob have to hope Alice has chosen an honest mediator
   - DID Core best practice suggests that DIDs have only one service
   endpoint and it points to either a mediator or a policy decision point
      - Alice can choose to offer multiple service endpoints in a DID but
      best practice would say that Alice does that only in a peer DID context
      directly with Bob because Alice trusts Bob not to misuse the unmediated
      endpoints.
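
A minimal sketch of the flow summarized above, under stated assumptions:
the names `DidDocument`, `REGISTRY`, `resolve`, and `handle_request` are
hypothetical and not taken from DID Core or any DID library, and Alice's
policy is reduced to a simple allow-list. The mediator hands Bob's DID to
Alice's "RqP-DID" endpoint, which applies her policy and then checks whether
Bob's DID document offers a service endpoint to reply to.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class DidDocument:
    did: str
    # Maps a service type to a URL; an empty dict means "no service endpoint".
    service_endpoints: Dict[str, str] = field(default_factory=dict)


# Hypothetical in-memory stand-in for DID resolution.
REGISTRY: Dict[str, DidDocument] = {
    "did:example:bob": DidDocument(
        "did:example:bob", {"mediator": "https://mediator.example/bob"}
    ),
    "did:example:carol": DidDocument("did:example:carol"),
}


def resolve(did: str) -> Optional[DidDocument]:
    """Look up a DID document; returns None if the DID is unknown."""
    return REGISTRY.get(did)


def handle_request(requester_did: str, allow_list: Set[str]) -> str:
    """Decide what Alice's RqP-DID endpoint does with a requester's DID."""
    # Policy lives with Alice's endpoint, not with the mediator.
    if requester_did not in allow_list:
        return "ignore"
    doc = resolve(requester_did)
    # No service endpoint: Alice needs a discovery service to find another DID.
    if doc is None or not doc.service_endpoints:
        return "discover"
    # Otherwise reply at the first endpoint the requester advertises.
    endpoint = next(iter(doc.service_endpoints.values()))
    return "reply:" + endpoint
```

For example, `handle_request("did:example:carol", {"did:example:carol"})`
falls through to the discovery branch because Carol's document lists no
service endpoint.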

- Adrian


On Tue, Jun 30, 2020 at 1:35 PM Daniel Buchner <Daniel.Buchner@microsoft.com>
wrote:

> I will keep this short: I agree with basically everything Daniel just
> said, and to the degree I disagree, it’s probably small enough that it may
> very well be details that don’t have a material effect on how we would
> structure the approach to endpoints/Hubs, etc.
>
>
>
> - Other Daniel
>
>
>
> *From:* Daniel Hardman <daniel.hardman@evernym.com>
> *Sent:* Tuesday, June 30, 2020 10:19 AM
> *To:* public-did-wg@w3.org
> *Cc:* public-did-wg@w3.org
> *Subject:* [EXTERNAL] Re: The DID service endpoint privacy challenge
>
>
>
> TL;DR I think there's a solid, satisfying answer to Adrian's question, and
> it involves savvy application of herd privacy. It may require some subtle
> shifts in expectations, but it does NOT require me to disagree with Daniel
> B about public DIDs, and it doesn't require everybody to become a
> privacy extremist, and it doesn't require commitments to any particular
> ledger or VC tech; it just requires some careful nuance. Hopefully that's
> intriguing enough that you'll read on. :-)
>
> In what follows, I know I'm mixing in some institutional perspective with
> individual perspective, even though Adrian's privacy question is more
> individually focused. Hang with me; they're related.
>
>
>
> 1. I think the phrase "public DID" and its supposed opposite, "private
> DID," are entrapping us in a false dichotomy. It helps me to distinguish
> between "public" and "anywise". "Public" is a statement about intended
> visibility (you want something known and discoverable as broadly as
> possible; "private" is also a statement about visibility and means roughly *but
> not exactly* the opposite). "Anywise," on the other hand, is a statement
> about the intended relationship (you intend to treat any party who
> interacts with you via an anywise DID the same way; you are not concerned with
> who the other party might be). Anywise and public often coincide, but not
> always. If you worked for a company with 5,000 employees, and published a
> DID in the company directory, the DID would not be public because it has a
> restricted audience. Yet it would be anywise because you intend for that
> DID to be used the same way by anyone who discovers it (to kickstart a
> relationship). My point is that you can have anywise+public (what we've
> mostly thought of before), but also anywise+private, or something in
> between, like anywise+not-discoverable-but-not-super-private-either. (BTW,
> the opposite of anywise is n-wise or pairwise -- where the meaning imputed
> to the DID is specialized for an enumerated set of others.)
>
> 2. There is a tension between self-sovereignty and discoverability. One of
> the ways you might want to exercise your sovereignty is to make your own
> decisions about discoverability. If we do discoverability the simple way
> (e.g., approximating the listing of a DID in a phone book), you have no
> control over who discovers you. This is FINE for certain use cases. As
> Daniel B points out, I want the world to be able to discover my LinkedIn
> profile. But if it's our only discoverability story, I think we've limited
> our architecture. Joe A made some very astute comments about
> discoverability needing to be separated from the core of the DID problem
> domain a while back, and I remember agreeing with his conclusions. Put a
> bookmark in that for a minute.
>
> 3. I believe we should *publish* with anywise DIDs (e.g., emit a press
> release, issue credentials, say something on Twitter), *be discoverable*
> with them (I'll say more about that in a minute), and *listen
> indiscriminately* with them (like we do when we accept resumes at
> human-resources@acme.com, or when we listen to others publishing at us on
> Twitter). However, I don't believe it's desirable to *specialize our
> interactions* via anywise DIDs; that is contrary to the intent of
> "anywise." Daniel B has argued that our public DIDs are how we'll interact,
> and this is where I diverge from him ever so slightly in my thinking. I
> think public DIDs are how we'll often *start* interacting (and keep
> interacting on Twitter), but not how we'll *keep* interacting.
>
> Today, all of the following patterns are common: A) You open a socket for
> HTTP on port 80 but end up using a redirected socket on a custom port above
> 1024. B) You submit a resume to human-resources@acme.com, and you get
> back a reply from Alice Jones, who's the Acme HR director running a
> particular job search. C) you connect to someone on LinkedIn, and then ask
> them for their contact information so you can carry on a direct
> conversation off the website/app. D) IT departments at enterprises strongly
> steer people to get unique TLS certs for different web servers in the org
> (hr.acme.com, code.acme.com, www.acme.com), instead of installing the root
> certificate for acme.com on everything.
>
> In future DIDlandia, I predict that similar patterns will emerge.
> Taking just my final example, IT departments at BigCorp will be very
> averse to putting all institutional cybersecurity eggs in a single
> anywise+public DID basket; instead, they'll want specialized DIDs that
> quickly cease to be anywise, because that limits risk and distributes the
> admin duties for DID keys. The big, general anywise+public DID held by
> BigCorp will be just a gateway or starting point to more specialized DIDs
> that are sometimes anywise and sometimes pairwise, but that are not
> required to be (maybe not desired to be) public.
>
>
>
> 4. Because of the above, I believe the DID usage pattern that will come to
> dominate mainstream usage is: Create an anywise(+public?) DID for
> discoverability, broadcast, and Twitter (anywhere you *remain* in
> generic public mode) -- but as soon as you move from discovery and
> publication to direct bilateral or multilateral conversations, switch to a
> non-anywise DID. Notice that I said "non-anywise" rather than "non-public"
> or "private." It isn't the visibility that's the defining characteristic
> here, and I'm not claiming these must be peer DIDs; what I'm claiming is
> that *we can give up a need for discoverability after we've been
> discovered*. If you intend to use a DID only for Bob, and you and Bob
> have already discovered each other, then you don't need the world to
> discover the handles you use. Maybe the world *can* discover it, or maybe
> you prefer that the world not discover it -- but either way, you certainly
> don't *need* that feature. This means that you can simply give
> pairwise/n-wise DID values to the parties that need them. You may still use
> a ledger for resolution, or you may do something like did:peer or did:key
> to skip the ledger entirely. So I'm not saying something about how you
> communicate the *DID doc*. I'm just talking about the *DID value*.
>
> 5. If the first time you encounter a VC holder's DID is when they prove
> something to you, then you also don't need to discover their holder DID --
> at least not directly and in advance. You just need to resolve it after you
> see it. In the education space, for example, where Kim and friends are
> exploring learner DIDs, these DIDs may or may not be public (visibility
> could vary) -- but they don't need to be discoverable, just resolvable.
>
>
>
> Okay, so this brings me back to Adrian's question about privacy and
> service endpoints.
>
> What if a thousand or a million DIDs shared the same endpoint?
>
> This *could* mean that just learning the endpoint of Alice doesn't tell
> you anything particularly sovereignty- or privacy-destroying about her. But
> how does the endpoint route to Alice, out of all the millions of targets it
> supports?
>
>
>
> The answer is what I said before, about starting anywise (discoverable)
> and switching to n-wise/pairwise (undiscoverable). Anybody in the world can
> discover your LinkedIn handle, but not just anybody can discover the
> private contact info behind that handle. We want the same in DIDlandia.
>
>
>
> How this works in practice is:
>
>    - Alice has an anywise+public DID: A.did@Any. (This notation means
>    Alice's DID at the "Any" relationship). She also has a pairwise DID for her
>    relationship with Carol: A.did@A:C (Alice's DID in the A-to-C
>    relationship.)
>    - Both of these DIDs have the same endpoint. However, the world cannot
>    discover A.did@A:C. (This DID may or may not be highly private. It may
>    or may not be ledger resolvable. Preferably the world can't discover that
>    it exists at all; at a minimum, the world can't look it up in a phonebook
>    that tells them it belongs to Alice.)
>    - There is a mediator that serves an endpoint for Alice and (hundreds,
>    thousands, millions) of other people or orgs. Everyone in the herd probably
>    has lots of DIDs.
>    - When a message arrives for Alice, it is encrypted for either of her
>    DIDs, and it is *also* encrypted for the mediator. This means no party
>    other than the mediator can decrypt its outer envelope, and only Alice can
>    decrypt its inner envelope. So to the world, the only thing that's
>    observable is that a message was transmitted, possibly from a
>    known/observable source, to this shared endpoint. That's it. The
>    destination is not observable.
>    - When the mediator receives the double-wrapped message, it decrypts
>    the outer envelope. This lets it learn the DID of the intended recipient.
>    It can then forward the uncrackable, encrypted inner envelope to either of
>    Alice's DIDs. The mediator is thus slightly more trusted than the public;
>    it can make an association between source and target DIDs. It doesn't have
>    to know which target DIDs belong to Alice, though. (Protecting Alice from a
>    malicious mediator is a deep subject I won't go into here, but there are
>    moderately good ways...)
>    - If Alice is tweeting or needs her resume to be discoverable, she
>    uses A.did@Any. She can publish this. If Alice is an org, that DID can
>    go in the .well-known folder on a website, etc. So now suppose Alice meets
>    Bob at a conference. She's placed A.did@Any on the last slide of her
>    presentation, and Bob captures the QR code and reaches out to her. This
>    "reaching out" means that Bob looks up the published endpoint for
>    A.did@Any (resolution), encrypts a message, using either an anywise
>    DID that he regularly uses, or a new, one-off DID that he allocates, and
>    sends the message to that endpoint. He does a second encryption before he
>    sends, so only Alice's mediator can decrypt the outer envelope.
>    - Alice's mediator relays the message to Alice's A.did@Any, probably
>    serviced by a mobile app she is using, or maybe by some software running on
>    a server (if Alice is an org).
>    - Alice creates a new pairwise DID, A.did@A:B, and sends it back to
>    Bob at the endpoint associated with the DID Bob used. This new DID probably
>    uses the same endpoint as Alice's @Any DID, though it doesn't have to.
>    - Bob can now send messages to Alice at a DID that is used by a
>    massive herd, and nobody will be able to tell he's talking to Alice. Best
>    practice would be for Bob to also rotate his DID at this point, by sending
>    back to Alice a message that says, "Hey, I contacted you before using DID
>    X. That might have been observed. I'm going to switch to using
>    B.did@A:B now." Since Alice is the only party in the world who could
>    decrypt such a message to see the new DID value, this breaks any possible
>    association between Bob's original request and their ongoing conversation.
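
Inline note: the double-envelope routing in the quoted walkthrough above can
be sketched roughly as follows. This is a toy model and not real
cryptography: `Sealed` only simulates an envelope that a single key holder
can open, and every name here (`Sealed`, `wrap`, `Mediator`, the key
strings) is a hypothetical chosen for illustration, not an API from any DID
or messaging library.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Sealed:
    """Toy stand-in for ciphertext: only the holder of for_key may open it."""
    for_key: str
    payload: Any

    def open(self, key: str) -> Any:
        if key != self.for_key:
            raise PermissionError("this envelope is sealed for a different key")
        return self.payload


def wrap(body: str, sender_did: str, recipient_did: str,
         recipient_key: str, mediator_key: str) -> Sealed:
    """Seal a message for the recipient, then seal that for the mediator."""
    inner = Sealed(recipient_key, {"from": sender_did, "body": body})
    # The outer envelope carries only routing data plus the opaque inner one.
    return Sealed(mediator_key, {"to": recipient_did, "inner": inner})


class Mediator:
    """Shared endpoint serving many DIDs; it sees routing data, not content."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.inboxes: Dict[str, List[Sealed]] = {}

    def register(self, did: str) -> None:
        self.inboxes[did] = []

    def receive(self, outer: Sealed) -> None:
        routing = outer.open(self.key)  # learns the target DID only
        self.inboxes[routing["to"]].append(routing["inner"])
```

In this sketch an outside observer sees only that something was sent to the
shared endpoint, the mediator learns the target DID, and only the holder of
the recipient key can open the inner envelope.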
>
> Now, stepping back from the details, what does this accomplish?
>
>    - Discoverability of DID values, DID endpoints, and possible metadata
>    associated with DIDs is limited crisply to DIDs that are 100% public by
>    intent.
>    - People and orgs can operate publicly, as Daniel has advocated.
>    Nobody has to give up discoverability.
>    - Those same people and orgs can also operate privately or
>    semi-privately. When they do, there is no simple/cheap/trivial way to
>    connect the private part and the public part. (Yes, I know all about
>    correlation engines. Suffice it to say that there's an arms race and we
>    probably won't win it against a state-level actor, but we can create a
>    reasonable firewall against casual correlation, and the strength of the
>    firewall is commensurate with the degree of our investment. More about this
>    here <https://www.evernym.com/blog/well-be-correlated-anyway/>.)
>    - None of this behavior has to corrupt the simplicity of DID core. It
>    can all be layered on top.
>
>
>
> On Mon, Jun 29, 2020 at 8:28 AM Dave Longley <dlongley@digitalbazaar.com>
> wrote:
>
>
> On 6/29/20 9:52 AM, Manu Sporny wrote:
> > On 6/29/20 5:14 AM, Adrian Gropper wrote:
> >> If there were only one service endpoint, what would it be and could it
> >> accommodate authentication, authorization, storage, and notification
> >> uses without undue limitation?
> >
> > I believe that this is where Digital Bazaar currently is... that service
> > endpoints advertised in the DID Document are an anti-pattern... we can
> > already see that developers without a background in privacy engineering
> > are unwittingly abusing the field.
> >
> > In the simplest case, it creates large challenges wrt. GDPR and the
> > organizations creating software for or running verifiable credential
> > registries.
> >
> > In many other use cases, it invites abuse (direct link to your personal
> > Identity Hub, web page, email, being some of them).
> >
> > The solution is probably to place a pointer in a DID Document that
> > effectively states: "You can find more information about the DID Subject
> > over here ---> X"... and then to point to somewhere that a caller can
> > see public information in a way that is GDPR compliant (e.g., a list of
> > Verifiable Credentials), or for more advanced use cases, where the
> > caller can authenticate in order to get information that is intended for
> > only a subset of individuals (again, protecting privacy by default).
> >
> > Would anyone object if we took service endpoints in this direction?
> > Effectively, we'd replace them with a "seeAlso" or "moreInformation"
> > link pointing to a list of Verifiable Credentials that would provide
> > information relating to identity hubs, personal data stores, web pages,
> > contact information, and other privacy-sensitive material.
>
> I think it's also important to remember that if you want to "discover" a
> service endpoint from a DID Document, you first needed to have
> "discovered" the DID. How did that happen? In many cases, you had to ask
> for it from the DID controller; in which case you often would have the
> tools to also ask for this "seeAlso"/"moreInformation" service endpoint
> -- and for authorization to read specific information from it as well.
>
> I think that service endpoints that are directly advertised in DID
> Documents only make sense for "public" or "social" DIDs. Even then,
> particularly for DID methods that use DLTs that do not easily support
> deletion, service endpoint information should be expressed elsewhere.
> This points to a need for other decentralized registry services that
> allow for both discovery and deletion. These services would not need to
> be DID-method specific.
>
>
> --
> Dave Longley
> CTO
> Digital Bazaar, Inc.
>
>
Received on Tuesday, 30 June 2020 21:52:09 UTC