RE: [EXTERNAL] Re: The DID service endpoint privacy challenge from Daniel Buchner on 2020-06-30 (public-did-wg@w3.org from June 2020)

From: Daniel Buchner <Daniel.Buchner@microsoft.com>
Date: Tue, 30 Jun 2020 23:00:03 +0000
To: Adrian Gropper <agropper@healthurl.com>
CC: "daniel.hardman@evernym.com" <daniel.hardman@evernym.com>, "public-did-wg@w3.org" <public-did-wg@w3.org>
Message-ID: <CY1PR00MB0155EC56808D214F616213A9816F0@CY1PR00MB0155.namprd00.prod.outlook.com>
Questions/comments inline:

From: Adrian Gropper <agropper@healthurl.com>
Sent: Tuesday, June 30, 2020 2:52 PM
To: Daniel Buchner <Daniel.Buchner@microsoft.com>
Cc: daniel.hardman@evernym.com; public-did-wg@w3.org
Subject: Re: [EXTERNAL] Re: The DID service endpoint privacy challenge

If I understand this correctly:

>>> When we say mediator, do we mean any party that is given an encrypted payload without the ability to decrypt the secret portion of the payload? Is this term meant to include a remote personal datastore instance host?

  *   The mediator business is like the VPN business:

     *   chosen by Alice
     *   paid by Alice
     *   makes no decisions on behalf of Alice (doesn't know any of Alice's policies)
     *   frequently erases any logs
>>> This all sounds like it could be true, save the encrypted data they would retain if they were acting as the host of the user’s remote personal datastore instance

  *   If Alice chooses to change her mediator, links will fail for some Requesting Parties (Bob) and they will need to discover Alice's new mediator one way or another
>>> Why would links fail if Alice can rotate in new endpoints to her DID Doc? She should be able to add/change service endpoints and have resolving parties auto-detect this.

  *   Bob's message to Alice is just Bob's DID and might have no associated service endpoint
  *   The mediator sends Bob's DID to a service endpoint of type "RqP-DID" in Alice's DID document
  *   Alice's RqP-DID endpoint decides, based on policy, whether to send a message to Bob. How it reaches Bob depends on Bob's DID:

     *   If Bob's DID has no service endpoint then Alice may need to use a discovery service to find another DID for Bob
     *   If Bob's DID has a service endpoint, the mediator will see that and both Alice and Bob have to hope Alice has chosen an honest mediator

  *   DID Core best practice suggests that DIDs have only one service endpoint and it points to either a mediator or a policy decision point

     *   Alice can choose to offer multiple service endpoints in a DID but best practice would say that Alice does that only in a peer DID context directly with Bob because Alice trusts Bob not to misuse the unmediated endpoints.
>>> I agree with the best practice of having only one type of service endpoint, but from a different angle: if we don’t all agree on one universal standard for personal datastores, we won’t be nearly as successful at creating a solid, ubiquitous foundation for decentralized apps and services.
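Adrian's proposed flow can be sketched as a toy, in-memory model. The DIDs, URLs, and the allowlist standing in for Alice's policy are hypothetical; the only name taken from the thread is the "RqP-DID" service type. What it illustrates is the split of duties: the mediator forwards without deciding anything, and the decision, including what to do when Bob's DID has no endpoint, happens at Alice's policy endpoint.

```python
from typing import Dict, List, Optional, Tuple

# Toy resolver table with hypothetical DIDs and URLs: DID -> [(service type, URL)].
DID_DOCS: Dict[str, List[Tuple[str, str]]] = {
    "did:example:alice": [("RqP-DID", "https://policy.alice.example/rqp")],
    "did:example:bob": [],  # Bob's DID advertises no service endpoint
}


def find_endpoint(did: str, service_type: str) -> Optional[str]:
    """Return the first endpoint of the given type in the DID's document, if any."""
    for stype, url in DID_DOCS.get(did, []):
        if stype == service_type:
            return url
    return None


def mediator_forward(target_did: str, requester_did: str) -> Optional[Tuple[str, str]]:
    """The mediator holds none of Alice's policies: it only relays Bob's DID
    to the "RqP-DID" endpoint listed in Alice's DID document."""
    url = find_endpoint(target_did, "RqP-DID")
    return None if url is None else (url, requester_did)


def rqp_decide(requester_did: str, allowlist: set) -> str:
    """Alice's policy decision point (a simple allowlist stands in for real policy)."""
    if requester_did not in allowlist:
        return "declined"
    if not DID_DOCS.get(requester_did):
        # No endpoint for Bob: Alice may need a discovery service to find another DID for him.
        return "use discovery service"
    # Bob's DID has an endpoint, which the mediator can also see.
    return "reply to Bob's endpoint"


print(mediator_forward("did:example:alice", "did:example:bob"))
# ('https://policy.alice.example/rqp', 'did:example:bob')
print(rqp_decide("did:example:bob", {"did:example:bob"}))  # use discovery service
```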

- Adrian


On Tue, Jun 30, 2020 at 1:35 PM Daniel Buchner <Daniel.Buchner@microsoft.com> wrote:
I will keep this short: I agree with basically everything Daniel just said, and to the degree I disagree, it’s probably small enough that it may very well be details that don’t have a material effect on how we would structure the approach to endpoints/Hubs, etc.

- Other Daniel

From: Daniel Hardman <daniel.hardman@evernym.com>
Sent: Tuesday, June 30, 2020 10:19 AM
To: public-did-wg@w3.org
Cc: public-did-wg@w3.org
Subject: [EXTERNAL] Re: The DID service endpoint privacy challenge

TL;DR I think there's a solid, satisfying answer to Adrian's question, and it involves savvy application of herd privacy. It may require some subtle shifts in expectations, but it does NOT require me to disagree with Daniel B about public DIDs, and it doesn't require everybody to become a privacy extremist, and it doesn't require commitments to any particular ledger or VC tech; it just requires some careful nuance. Hopefully that's intriguing enough that you'll read on. :-)

In what follows, I know I'm mixing in some institutional perspective with individual perspective, even though Adrian's privacy question is more individually focused. Hang with me; they're related.

1. I think the phrase "public DID" and its supposed opposite, "private DID," are entrapping us in a false dichotomy. It helps me to distinguish between "public" and "anywise". "Public" is a statement about intended visibility (you want something known and discoverable as broadly as possible; "private" is also a statement about visibility and means roughly but not exactly the opposite). "Anywise," on the other hand, is a statement about the intended relationship (you intend to treat any party who interacts with you via an anywise DID the same way; it is not interested in who the other party might be). Anywise and public often coincide, but not always. If you worked for a company with 5,000 employees, and published a DID in the company directory, the DID would not be public because it has a restricted audience. Yet it would be anywise because you intend for that DID to be used the same way by anyone who discovers it (to kickstart a relationship). My point is that you can have anywise+public (what we've mostly thought of before), but also anywise+private, or something in between, like anywise+not-discoverable-but-not-super-private-either. (BTW, the opposite of anywise is n-wise or pairwise -- where the meaning imputed to the DID is specialized for an enumerated set of others.)
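One way to make the two axes concrete is to model them as independent fields, so that every combination (anywise+public, anywise+restricted, n-wise+private, ...) is representable. This is only a sketch: the class and enum names are hypothetical, and "restricted" is just one point on the visibility spectrum. It encodes the company-directory DID from the example as anywise but not public.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Visibility(Enum):
    PUBLIC = "public"          # known and discoverable as broadly as possible
    RESTRICTED = "restricted"  # limited audience, e.g. a company directory
    PRIVATE = "private"


class Relationship(Enum):
    ANYWISE = "anywise"  # every party using the DID is treated the same way
    NWISE = "n-wise"     # meaning specialized for an enumerated set of parties


@dataclass(frozen=True)
class DidProfile:
    did: str
    visibility: Visibility
    relationship: Relationship
    parties: FrozenSet[str] = frozenset()  # the enumerated others, for n-wise DIDs only

    def __post_init__(self) -> None:
        if self.relationship is Relationship.ANYWISE and self.parties:
            raise ValueError("an anywise DID does not enumerate its counterparties")
        if self.relationship is Relationship.NWISE and not self.parties:
            raise ValueError("an n-wise DID must name the parties it is for")

    @property
    def pairwise(self) -> bool:
        """Pairwise is the n-wise case with exactly one other party."""
        return self.relationship is Relationship.NWISE and len(self.parties) == 1


# The company-directory DID from the example: anywise, yet not public.
directory_did = DidProfile("did:example:emp-4711", Visibility.RESTRICTED, Relationship.ANYWISE)
```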

2. There is a tension between self-sovereignty and discoverability. One of the ways you might want to exercise your sovereignty is to make your own decisions about discoverability. If we do discoverability the simple way (e.g., approximating the listing of a DID in a phone book), you have no control over who discovers you. This is FINE for certain use cases. As Daniel B points out, I want the world to be able to discover my LinkedIn profile. But if it's our only discoverability story, I think we've limited our architecture. Joe A made some very astute comments about discoverability needing to be separated from the core of the DID problem domain a while back, and I remember agreeing with his conclusions. Put a bookmark in that for a minute.

3. I believe we should *publish* with anywise DIDs (e.g., emit a press release, issue credentials, say something on Twitter), *be discoverable* with them (I'll say more about that in a minute), and *listen indiscriminately* with them (like we do when we accept resumes at human-resources@acme.com, or when we listen to others publishing at us on Twitter). However, I don't believe it's desirable to *specialize our interactions* via anywise DIDs; that is contrary to the intent of "anywise." Daniel B has argued that our public DIDs are how we'll interact, and this is where I diverge from him ever so slightly in my thinking. I think public DIDs are how we'll often *start* interacting (and keep interacting on Twitter), but not how we'll *keep* interacting.

Today, all of the following patterns are common: A) You open a socket for HTTP on port 80 but end up using a redirected socket on a custom port above 1024. B) You submit a resume to human-resources@acme.com, and you get back a reply from Alice Jones, who's the Acme HR director running a particular job search. C) You connect to someone on LinkedIn, and then ask them for their contact information so you can carry on a direct conversation off the website/app. D) IT departments at enterprises strongly steer people to get unique TLS certs for different web servers in the org (hr.acme.com, code.acme.com, www.acme.com), instead of installing the root certificate for acme.com on everything.

In future DIDlandia, I predict that similar patterns will emerge. Taking just my final example, IT departments at BigCorp will be very averse to putting all institutional cybersecurity eggs in a single anywise+public DID basket; instead, they'll want specialized DIDs that quickly cease to be anywise, because that limits risk and distributes the admin duties for DID keys. The big, general anywise+public DID held by BigCorp will be just a gateway or starting point to more specialized DIDs that are sometimes anywise and sometimes pairwise, but that are not required to be (maybe not desired to be) public.

4. Because of the above, I believe the DID usage pattern that will come to dominate mainstream usage is: Create an anywise(+public?) DID for discoverability, broadcast, and Twitter (anywhere you remain in generic public mode) -- but as soon as you move from discovery and publication to direct bilateral or multilateral conversations, switch to a non-anywise DID. Notice that I said "non-anywise" rather than "non-public" or "private." It isn't the visibility that's the defining characteristic here, and I'm not claiming these must be peer DIDs; what I'm claiming is that we can give up a need for discoverability after we've been discovered. If you intend to use a DID only for Bob, and you and Bob have already discovered each other, then you don't need the world to discover the handles you use. Maybe the world *can* discover it, or maybe you prefer that the world not discover it -- but either way, you certainly don't *need* that feature. This means that you can simply give pairwise/n-wise DID values to the parties that need them. You may still use a ledger for resolution, or you may do something like did:peer or did:key to skip the ledger entirely. So I'm not saying something about how you communicate the DID doc. I'm just talking about the DID value.
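The "anywise for discovery, non-anywise afterwards" pattern amounts to a holder keeping one published DID plus a separate DID value per relationship, handed directly to that party. A minimal sketch, assuming a hypothetical `did:example:` identifier built from random bytes; a real method (a ledger, did:peer, did:key) would typically derive the value from key material, and how the DID document is communicated is deliberately left out, as in the text.

```python
import secrets
from typing import Dict


class Holder:
    """Toy holder: one anywise DID for discovery, plus a separate DID per counterparty."""

    def __init__(self) -> None:
        self.anywise_did = self._new_did()
        self.pairwise: Dict[str, str] = {}  # counterparty DID -> my DID for that relationship

    @staticmethod
    def _new_did() -> str:
        # Hypothetical identifier format: random bytes instead of key-derived material.
        return "did:example:" + secrets.token_hex(16)

    def did_for(self, counterparty: str) -> str:
        """Return, minting on first use, the DID value handed only to this counterparty."""
        if counterparty not in self.pairwise:
            self.pairwise[counterparty] = self._new_did()
        return self.pairwise[counterparty]


alice = Holder()
for_bob = alice.did_for("did:example:bob")
assert for_bob == alice.did_for("did:example:bob")    # stable within the relationship
assert for_bob != alice.did_for("did:example:carol")  # distinct value per relationship
assert for_bob != alice.anywise_did                    # never the published DID
```

Nobody needs to discover `for_bob`; Bob receives it directly, and only needs to be able to resolve it.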

5. If the first time you encounter a VC holder's DID is when they prove something to you, then you also don't need to discover their holder DID -- at least not directly and in advance. You just need to resolve it after you see it. In the education space, for example, where Kim and friends are exploring learner DIDs, these DIDs may or may not be public (visibility could vary) -- but they don't need to be discoverable, just resolvable.

Okay, so this brings me back to Adrian's question about privacy and service endpoints.

What if a thousand or a million DIDs shared the same endpoint?

This *could* mean that just learning the endpoint of Alice doesn't tell you anything particularly sovereignty- or privacy-destroying about her. But how does the endpoint route to Alice, out of all the millions of targets it supports?

The answer is what I said before, about starting anywise (discoverable) and switching to n-wise/pairwise (undiscoverable). Anybody in the world can discover your LinkedIn handle, but not just anybody can discover the private contact info behind that handle. We want the same in DIDlandia.
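A crude way to quantify the herd: an observer who sees a message arrive at an endpoint can narrow the recipient down only to the set of DIDs behind that endpoint. The sketch below generously assumes the observer knows every DID-to-endpoint mapping (pairwise DIDs that the world cannot discover would not even appear in it); all DIDs and URLs are hypothetical.

```python
from collections import Counter
from typing import Dict

SHARED = "https://mediator.example/inbox"  # hypothetical shared mediator endpoint


def anonymity_sets(did_to_endpoint: Dict[str, str]) -> Dict[str, int]:
    """For each endpoint, count how many DIDs an observer sees behind it.
    Seeing traffic to an endpoint narrows the recipient down only to that many DIDs."""
    return dict(Counter(did_to_endpoint.values()))


# Toy world: 10,000 DIDs share one mediator endpoint; one DID has its own endpoint.
world = {f"did:example:{i}": SHARED for i in range(10_000)}
world["did:example:loner"] = "https://loner.example/inbox"

sizes = anonymity_sets(world)
print(sizes[SHARED])                         # 10000
print(sizes["https://loner.example/inbox"])  # 1
```

The DID with a dedicated endpoint is identified by any message sent to it; the ones behind the mediator are not.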

How this works in practice is:

  *   Alice has an anywise+public DID: A.did@Any. (This notation means Alice's DID at the "Any" relationship.) She also has a pairwise DID for her relationship with Carol: A.did@A:C (Alice's DID in the A-to-C relationship).
  *   Both of these DIDs have the same endpoint. However, the world cannot discover A.did@A:C. (This DID may or may not be highly private. It may or may not be ledger resolvable. Preferably the world can't discover that it exists at all; at a minimum, the world can't look it up in a phonebook that tells them it belongs to Alice.)
  *   There is a mediator that serves an endpoint for Alice and (hundreds, thousands, millions) of other people or orgs. Everyone in the herd probably has lots of DIDs.
  *   When a message arrives for Alice, it is encrypted for one of her DIDs, and it is *also* encrypted for the mediator. This means no party other than the mediator can decrypt its outer envelope, and only Alice can decrypt its inner envelope. So to the world, the only thing that's observable is that a message was transmitted, possibly from a known/observable source, to this shared endpoint. That's it. The destination is not observable.
  *   When the mediator receives the double-wrapped message, it decrypts the outer envelope. This lets it learn the DID of the intended recipient. It can then forward the inner envelope, which it cannot decrypt, to whichever of Alice's DIDs it is addressed to. The mediator is thus slightly more trusted than the public; it can make an association between source and target DIDs. It doesn't have to know which target DIDs belong to Alice, though. (Protecting Alice from a malicious mediator is a deep subject I won't go into here, but there are moderately good ways...)
  *   If Alice is tweeting or needs her resume to be discoverable, she uses A.did@Any. She can publish this. If Alice is an org, that DID can go in the .well-known folder on a website, etc. So now suppose Alice meets Bob at a conference. She's placed A.did@Any on the last slide of her presentation, and Bob captures the QR code and reaches out to her. This "reaching out" means that Bob looks up the published endpoint for A.did@Any (resolution), encrypts a message using either an anywise DID that he regularly uses or a new, one-off DID that he allocates, and sends the message to that endpoint. He does a second encryption before he sends, so only Alice's mediator can decrypt the outer envelope.
  *   Alice's mediator relays the message to Alice's A.did@Any, probably serviced by a mobile app she is using, or maybe by some software running on a server (if Alice is an org).
  *   Alice creates a new pairwise DID, A.did@A:B, and sends it back to Bob at the endpoint associated with the DID Bob used. This new DID probably uses the same endpoint as Alice's @Any DID, though it doesn't have to.
  *   Bob can now send messages to Alice at a DID whose endpoint is shared by a massive herd, and nobody will be able to tell he's talking to Alice. Best practice would be for Bob to also rotate his DID at this point, by sending back to Alice a message that says, "Hey, I contacted you before using DID X. That might have been observed. I'm going to switch to using B.did@A:B now." Since Alice is the only party in the world who could decrypt such a message to see the new DID value, this breaks any possible association between Bob's original request and their ongoing conversation.
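The double-wrapped delivery in this walkthrough can be modeled as below. This is a toy: `Sealed` only simulates encryption with an in-process key check (a real system would use public-key encryption), key identifiers are simply the DID strings, and every DID and name is hypothetical. What it does show is the information flow: the mediator opens the outer envelope and learns the target DID, while the inner envelope stays opaque to it until the recipient opens it.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Sealed:
    """Stand-in for ciphertext: only the holder of `key_id` may open it.
    NOT real cryptography."""
    key_id: str
    payload: Any

    def open(self, key_id: str) -> Any:
        if key_id != self.key_id:
            raise PermissionError("cannot decrypt: wrong key")
        return self.payload


def send(sender_did: str, recipient_did: str, mediator_key: str, body: str) -> Sealed:
    # Inner envelope: only the recipient DID's key opens it (key id == DID, for simplicity).
    inner = Sealed(recipient_did, {"from": sender_did, "body": body})
    # Outer envelope: only the mediator opens it; it reveals the routing target, not the body.
    return Sealed(mediator_key, {"to": recipient_did, "msg": inner})


class Mediator:
    def __init__(self, key_id: str) -> None:
        self.key_id = key_id
        self.inboxes: Dict[str, List[Sealed]] = {}

    def receive(self, outer: Sealed) -> None:
        routing = outer.open(self.key_id)  # learns the target DID...
        # ...and queues the inner envelope, which it cannot open, for that DID.
        self.inboxes.setdefault(routing["to"], []).append(routing["msg"])


# Bob reaches Alice's published anywise DID from a one-off DID.
med = Mediator("mediator-key")
A_ANY, A_AB = "did:example:alice-any", "did:example:alice-ab"
med.receive(send("did:example:bob-oneoff", A_ANY, med.key_id, "Hi, we met at the conference"))
first = med.inboxes[A_ANY][0].open(A_ANY)
print(first["body"])  # Hi, we met at the conference

# Alice has replied with a new pairwise DID (A_AB; that delivery is not modeled).
# Bob now rotates his own DID in a message only Alice can read.
med.receive(send("did:example:bob-oneoff", A_AB, med.key_id, "switching to did:example:bob-ab"))
print(med.inboxes[A_AB][0].open(A_AB)["body"])  # switching to did:example:bob-ab
```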

Now, stepping back from the details, what does this accomplish?

  *   Discoverability of DID values, DID endpoints, and possible metadata associated with DIDs is limited crisply to DIDs that are 100% public by intent.
  *   People and orgs can operate publicly, as Daniel has advocated. Nobody has to give up discoverability.
  *   Those same people and orgs can also operate privately or semi-privately. When they do, there is no simple/cheap/trivial way to connect the private part and the public part. (Yes, I know all about correlation engines. Suffice it to say that there's an arms race and we probably won't win it against a state-level actor, but we can create a reasonable firewall against casual correlation, and the strength of the firewall is commensurate with the degree of our investment. More about this here: https://www.evernym.com/blog/well-be-correlated-anyway/.)
  *   None of this behavior has to corrupt the simplicity of DID core. It can all be layered on top.

On Mon, Jun 29, 2020 at 8:28 AM Dave Longley <dlongley@digitalbazaar.com> wrote:

On 6/29/20 9:52 AM, Manu Sporny wrote:
> On 6/29/20 5:14 AM, Adrian Gropper wrote:
>> If there were only one service endpoint, what would it be and could it
>> accommodate authentication, authorization, storage, and notification
>> uses without undue limitation?
>
> I believe that this is where Digital Bazaar currently is... that service
> endpoints advertised in the DID Document are an anti-pattern... we can
> already see that developers without a background in privacy engineering
> are unwittingly abusing the field.
>
> In the simplest case, it creates large challenges wrt. GDPR and the
> organizations creating software for or running verifiable credential
> registries.
>
> In many other use cases, it invites abuse (direct link to your personal
> Identity Hub, web page, email, being some of them).
>
> The solution is probably to place a pointer in a DID Document that
> effectively states: "You can find more information about the DID Subject
> over here ---> X"... and then to point to somewhere that a caller can
> see public information in a way that is GDPR compliant (e.g., a list of
> Verifiable Credentials), or for more advanced use cases, where the
> caller can authenticate in order to get information that is intended for
> only a subset of individuals (again, protecting privacy by default).
>
> Would anyone object if we took service endpoints in this direction?
> Effectively, we'd replace them with a "seeAlso" or "moreInformation"
> link pointing to a list of Verifiable Credentials that would provide
> information relating to identity hubs, personal data stores, web pages,
> contact information, and other privacy-sensitive material.
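Manu's proposal can be sketched as a DID document with no service entries, only a pointer, and a "more information" location that returns public credentials to anyone and restricted ones only to an authenticated caller in their audience. Everything here is hypothetical: "seeAlso" is the name floated in this thread rather than a ratified DID Core property, the `Credential` shape is a simplification of a Verifiable Credential, and authentication is assumed to have already happened (`caller_did=None` means an anonymous caller).

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Credential:
    kind: str     # illustrative labels, e.g. "WebPage", "PersonalDataStore"
    value: str
    public: bool  # safe to show to anyone?
    audience: FrozenSet[str] = frozenset()  # DIDs allowed to see a non-public entry


def did_document(did: str, more_info_url: str) -> dict:
    # No "service" entries at all: only a pointer to where more information lives.
    return {"@context": "https://www.w3.org/ns/did/v1", "id": did, "seeAlso": more_info_url}


def more_information(creds: List[Credential], caller_did: Optional[str]) -> List[Credential]:
    """What the seeAlso location returns: public entries to anyone, restricted
    entries only to an (already authenticated) caller named in their audience."""
    return [c for c in creds
            if c.public or (caller_did is not None and caller_did in c.audience)]


doc = did_document("did:example:alice", "https://alice.example/credentials")
creds = [
    Credential("WebPage", "https://alice.example", public=True),
    Credential("PersonalDataStore", "https://pds.example/alice", public=False,
               audience=frozenset({"did:example:bob"})),
]
print(len(more_information(creds, None)))               # 1: anonymous caller sees the web page
print(len(more_information(creds, "did:example:bob")))  # 2: Bob also sees the datastore
```

Privacy-sensitive entries never land in the (possibly undeletable) DID document; only the pointer does.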

I think it's also important to remember that if you want to "discover" a
service endpoint from a DID Document, you first need to have
"discovered" the DID. How did that happen? In many cases, you had to ask
the DID controller for it, in which case you would often have the
tools to also ask for this "seeAlso"/"moreInformation" service endpoint
-- and for authorization to read specific information from it as well.

I think that service endpoints that are directly advertised in DID
Documents only make sense for "public" or "social" DIDs. Even then,
particularly for DID methods that use DLTs that do not easily support
deletion, service endpoint information should be expressed elsewhere.
This points to a need for other decentralized registry services that
allow for both discovery and deletion. These services would not need to
be DID-method specific.
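A registry of the kind Dave describes, supporting both discovery and deletion independently of any DID method, might look roughly like the sketch below. It is entirely hypothetical (no such service or API is defined in this thread): storage is an in-memory dict, and authorization is reduced to a string comparison where a real service would verify proof of control of the DID.

```python
from typing import Dict


class EndpointRegistry:
    """Toy, DID-method-independent registry: service endpoint records live here,
    outside the DID document, so they can be discovered and also deleted."""

    def __init__(self) -> None:
        self._records: Dict[str, Dict[str, str]] = {}  # DID -> {service type: URL}

    def publish(self, did: str, caller: str, service_type: str, url: str) -> None:
        self._authorize(did, caller)
        self._records.setdefault(did, {})[service_type] = url

    def discover(self, did: str) -> Dict[str, str]:
        return dict(self._records.get(did, {}))  # a copy, so callers cannot mutate records

    def delete(self, did: str, caller: str) -> None:
        # In this in-memory toy, deletion really removes the record, which
        # append-only ledgers do not easily allow.
        self._authorize(did, caller)
        self._records.pop(did, None)

    @staticmethod
    def _authorize(did: str, caller: str) -> None:
        # Stand-in for proving control of the DID (e.g. a signature check);
        # here the caller simply has to present the same DID.
        if caller != did:
            raise PermissionError("only the DID controller may change its records")


reg = EndpointRegistry()
reg.publish("did:example:acme", "did:example:acme", "Inbox", "https://acme.example/inbox")
print(reg.discover("did:example:acme"))  # {'Inbox': 'https://acme.example/inbox'}
reg.delete("did:example:acme", "did:example:acme")
print(reg.discover("did:example:acme"))  # {}
```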


--
Dave Longley
CTO
Digital Bazaar, Inc.
Received on Tuesday, 30 June 2020 23:00:20 UTC