Re: The DID service endpoint privacy challenge from Daniel Hardman on 2020-06-30 (public-did-wg@w3.org from June 2020)

From: Daniel Hardman <daniel.hardman@evernym.com>
Date: Tue, 30 Jun 2020 11:19:14 -0600
Cc: public-did-wg@w3.org
Message-ID: <CAFBYrUoZzr2Q5UR+qZu2TvGfJQTQZsaJv6Pf2E1FjugU4pDKVA@mail.gmail.com>
TL;DR I think there's a solid, satisfying answer to Adrian's question, and
it involves savvy application of herd privacy. It may require some subtle
shifts in expectations, but it does NOT require me to disagree with Daniel
B about public DIDs, and it doesn't require everybody to become a
privacy extremist, and it doesn't require commitments to any particular
ledger or VC tech; it just requires some careful nuance. Hopefully that's
intriguing enough that you'll read on. :-)

In what follows, I know I'm mixing in some institutional perspective with
individual perspective, even though Adrian's privacy question is more
individually focused. Hang with me; they're related.

1. I think the phrase "public DID" and its supposed opposite, "private
DID," are entrapping us in a false dichotomy. It helps me to distinguish
between "public" and "anywise". "Public" is a statement about intended
visibility (you want something known and discoverable as broadly as
possible; "private" is also a statement about visibility and means roughly *but
not exactly* the opposite). "Anywise," on the other hand, is a statement
about the intended relationship (you intend to treat any party who
interacts with you via an anywise DID the same way; the DID is indifferent to
who the other party might be). Anywise and public often coincide, but not
always. If you worked for a company with 5,000 employees, and published a
DID in the company directory, the DID would not be public because it has a
restricted audience. Yet it would be anywise because you intend for that
DID to be used the same way by anyone who discovers it (to kickstart a
relationship). My point is that you can have anywise+public (what we've
mostly thought of before), but also anywise+private, or something in
between, like anywise+not-discoverable-but-not-super-private-either. (BTW,
the opposite of anywise is n-wise or pairwise -- where the meaning imputed
to the DID is specialized for an enumerated set of others.)
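
As a rough illustration of point 1, visibility and relationship scope can be
modeled as two independent fields. This is only a sketch: the names
Visibility, Scope, and DidProfile are hypothetical and come from no DID
specification.

```python
from dataclasses import dataclass
from enum import Enum


class Visibility(Enum):
    """Who is intended to be able to see the DID."""
    PUBLIC = "public"
    RESTRICTED = "restricted"   # e.g. a 5,000-person company directory
    PRIVATE = "private"


class Scope(Enum):
    """Which relationship the DID is intended for."""
    ANYWISE = "anywise"     # any party is treated the same way
    N_WISE = "n-wise"       # an enumerated group of others
    PAIRWISE = "pairwise"   # exactly one other party


@dataclass(frozen=True)
class DidProfile:
    label: str
    visibility: Visibility
    scope: Scope
    audience: frozenset = frozenset()  # enumerated parties; empty for anywise

    def __post_init__(self):
        # The audience only constrains scope, never visibility.
        if self.scope is Scope.ANYWISE and self.audience:
            raise ValueError("an anywise DID has no enumerated audience")
        if self.scope is Scope.PAIRWISE and len(self.audience) != 1:
            raise ValueError("a pairwise DID names exactly one other party")
        if self.scope is Scope.N_WISE and len(self.audience) < 2:
            raise ValueError("an n-wise DID names at least two other parties")


# anywise+public: the combination discussed most often so far
press_did = DidProfile("press releases", Visibility.PUBLIC, Scope.ANYWISE)
# anywise but not public: the company-directory example
directory_did = DidProfile("company directory", Visibility.RESTRICTED, Scope.ANYWISE)
# pairwise: meaning specialized for one enumerated party
carol_did = DidProfile("A.did@A:C", Visibility.PRIVATE, Scope.PAIRWISE,
                       frozenset({"Carol"}))
```

The validation touches only scope and audience, which is the point: any
visibility can be paired with any scope.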

2. There is a tension between self-sovereignty and discoverability. One of
the ways you might want to exercise your sovereignty is to make your own
decisions about discoverability. If we do discoverability the simple way
(e.g., approximating the listing of a DID in a phone book), you have no
control over who discovers you. This is FINE for certain use cases. As
Daniel B points out, I want the world to be able to discover my LinkedIn
profile. But if it's our only discoverability story, I think we've limited
our architecture. Joe A made some very astute comments about
discoverability needing to be separated from the core of the DID problem
domain a while back, and I remember agreeing with his conclusions. Put a
bookmark in that for a minute.

3. I believe we should *publish* with anywise DIDs (e.g., emit a press
release, issue credentials, say something on Twitter), *be discoverable*
with them (I'll say more about that in a minute), and *listen
indiscriminately* with them (like we do when we accept resumes at
human-resources@acme.com, or when we listen to others publishing at us on
Twitter). However, I don't believe it's desirable to *specialize our
interactions* via anywise DIDs; that is contrary to the intent of
"anywise." Daniel B has argued that our public DIDs are how we'll interact,
and this is where I diverge from him ever so slightly in my thinking. I
think public DIDs are how we'll often *start* interacting (and keep
interacting on Twitter), but not how we'll *keep* interacting.

Today, all of the following patterns are common: A) You open a socket for
HTTP on port 80 but end up using a redirected socket on a custom port above
1024. B) You submit a resume to human-resources@acme.com, and you get back
a reply from Alice Jones, who's the Acme HR director running a particular
job search. C) You connect to someone on LinkedIn, and then ask them for
their contact information so you can carry on a direct conversation off the
website/app. D) IT departments at enterprises strongly steer people to get
unique TLS certs for different web servers in the org (hr.acme.com,
code.acme.com, www.acme.com), instead of installing the root certificate
for acme.com on everything.

In future DIDlandia, I predict that similar patterns will emerge. Comparing
just my final example, IT departments at BigCorp will be very averse to
putting all institutional cybersecurity eggs in a single anywise+public DID
basket; instead, they'll want specialized DIDs that quickly cease to be
anywise, because that limits risk and distributes the admin duties for DID
keys. The big, general anywise+public DID held by BigCorp will be just a
gateway or starting point to more specialized DIDs that are sometimes
anywise and sometimes pairwise, but that are not required to be (maybe not
desired to be) public.

4. Because of the above, I believe the DID usage pattern that will come to
dominate mainstream usage is: Create an anywise(+public?) DID for
discoverability, broadcast, and Twitter (anywhere you *remain* in
generic public mode) -- but as soon as you move from discovery and
publication to direct bilateral or multilateral conversations, switch to a
non-anywise DID. Notice that I said "non-anywise" rather than "non-public"
or "private." It isn't the visibility that's the defining characteristic
here, and I'm not claiming these must be peer DIDs; what I'm claiming is
that *we can give up a need for discoverability after we've been discovered*.
If you intend to use a DID only for Bob, and you and Bob have already
discovered each other, then you don't need the world to discover the
handles you use. Maybe the world *can* discover it, or maybe you prefer
that the world not discover it -- but either way, you certainly don't
*need* that feature. This means that you can simply give pairwise/n-wise
DID values to the parties that need them. You may still use a ledger for
resolution, or you may do something like did:peer or did:key to skip the
ledger entirely. So I'm not saying anything about how you communicate the *DID
doc*. I'm just talking about the *DID value*.
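
The usage pattern in point 4 could be sketched as a small DID selection
policy. This is an illustration under assumptions: the Wallet class, its
method names, and the random "did:example:" values are hypothetical, and
nothing here creates or registers a real DID document.

```python
import secrets


class Wallet:
    """Keeps one anywise DID for broadcast and one pairwise DID per contact."""

    def __init__(self):
        self.anywise_did = self._new_did()   # used for discovery and publishing
        self._pairwise = {}                  # counterparty name -> pairwise DID

    @staticmethod
    def _new_did() -> str:
        # Placeholder identifier; a real wallet would follow a DID method's rules.
        return "did:example:" + secrets.token_hex(16)

    def did_for_broadcast(self) -> str:
        """The DID to publish, tweet from, or list in a directory."""
        return self.anywise_did

    def did_for(self, counterparty: str) -> str:
        """The DID for a direct conversation, created on first use."""
        if counterparty not in self._pairwise:
            self._pairwise[counterparty] = self._new_did()
        return self._pairwise[counterparty]


alice = Wallet()
print(alice.did_for_broadcast())   # handed out on slides, websites, and so on
print(alice.did_for("Bob"))        # given only to Bob, never listed anywhere
```

The pairwise values are handed directly to the party that needs them, so
nothing in the sketch depends on those values being discoverable.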

5. If the first time you encounter a VC holder's DID is when they prove
something to you, then you also don't need to discover their holder DID --
at least not directly and in advance. You just need to resolve it after you
see it. In the education space, for example, where Kim and friends are
exploring learner DIDs, these DIDs may or may not be public (visibility
could vary) -- but they don't need to be discoverable, just resolvable.

Okay, so this brings me back to Adrian's question about privacy and service
endpoints.

What if a thousand or a million DIDs shared the same endpoint?

This *could* mean that just learning the endpoint of Alice doesn't tell you
anything particularly sovereignty- or privacy-destroying about her. But how
does the endpoint route to Alice, out of all the millions of targets it
supports?

The answer is what I said before, about starting anywise (discoverable) and
switching to n-wise/pairwise (undiscoverable). Anybody in the world can
discover your LinkedIn handle, but not just anybody can discover the
private contact info behind that handle. We want the same in DIDlandia.

How this works in practice is:

   - Alice has an anywise+public DID: A.did@Any. (This notation means
   Alice's DID at the "Any" relationship). She also has a pairwise DID for her
   relationship with Carol: A.did@A:C (Alice's DID in the A-to-C
   relationship.)
   - Both of these DIDs have the same endpoint. However, the world cannot
   discover A.did@A:C. (This DID may or may not be highly private. It may
   or may not be ledger resolvable. Preferably the world can't discover that
   it exists at all; at a minimum, the world can't look it up in a phonebook
   that tells them it belongs to Alice.)
   - There is a mediator that serves an endpoint for Alice and hundreds,
   thousands, or millions of other people or orgs. Everyone in the herd
   probably has lots of DIDs.
   - When a message arrives for Alice, it is encrypted for either of her
   DIDs, and it is *also* encrypted for the mediator. This means no party
   other than the mediator can decrypt its outer envelope, and only Alice can
   decrypt its inner envelope. So to the world, the only thing that's
   observable is that a message was transmitted, possibly from a
   known/observable source, to this shared endpoint. That's it. The
   destination is not observable.
   - When the mediator receives the double-wrapped message, it decrypts the
   outer envelope. This lets it learn the DID of the intended recipient. It
   can then forward the uncrackable, encrypted inner envelope to either of
   Alice's DIDs. The mediator is thus slightly more trusted than the public;
   it can make an association between source and target DIDs. It doesn't have
   to know which target DIDs belong to Alice, though. (Protecting Alice from a
   malicious mediator is a deep subject I won't go into here, but there are
   moderately good ways...)
   - If Alice is tweeting or needs her resume to be discoverable, she uses
   A.did@Any. She can publish this. If Alice is an org, that DID can go in
   the .well-known folder on a website, etc. So now suppose Alice meets Bob at
   a conference. She's placed A.did@Any on the last slide of her
   presentation, and Bob captures the QR code and reaches out to her. This
   "reaching out" means that Bob looks up the published endpoint for A.did@Any
   (resolution), encrypts a message, using either an anywise DID that he
   regularly uses, or a new, one-off DID that he allocates, and sends the
   message to that endpoint. He does a second encryption before he sends, so
   only Alice's mediator can decrypt the outer envelope.
   - Alice's mediator relays the message to Alice's A.did@Any, probably
   serviced by a mobile app she is using, or maybe by some software running on
   a server (if Alice is an org).
   - Alice creates a new pairwise DID, A.did@A:B, and sends it back to Bob
   at the endpoint associated with the DID Bob used. This new DID probably
   uses the same endpoint as Alice's @Any DID, though it doesn't have to.
   - Bob can now send messages to Alice at an endpoint that is shared by a
   massive herd, and nobody will be able to tell he's talking to Alice. Best practice
   would be for Bob to also rotate his DID at this point, by sending back to
   Alice a message that says, "Hey, I contacted you before using DID X. That
   might have been observed. I'm going to switch to using B.did@A:B now."
   Since Alice is the only party in the world who could decrypt such a message
   to see the new DID value, this breaks any possible association between
   Bob's original request and their ongoing conversation.
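
The double-wrapped message and the mediator's routing step described above
can be sketched as follows. This is a toy model under loud assumptions: seal
and unseal are symmetric stand-ins for "encrypt to the recipient's public
key" and are NOT secure cryptography, and the Mediator class, the JSON
envelope layout, and the field names "to" and "inner" are all hypothetical.

```python
import hashlib
import hmac
import json
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a toy keystream by hashing key + nonce + counter."""
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Toy stand-in for encrypting to a recipient. Not real cryptography."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    body = bytes(a ^ b for a, b in zip(plaintext, stream))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + tag + body


def unseal(key: bytes, sealed: bytes) -> bytes:
    """Open an envelope; raises ValueError if the key does not match."""
    nonce, tag, body = sealed[:16], sealed[16:48], sealed[48:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered envelope")
    return bytes(a ^ b for a, b in zip(body, _keystream(key, nonce, len(body))))


def double_wrap(mediator_key, recipient_did, recipient_key, message: str) -> bytes:
    """Inner envelope for the recipient DID, outer envelope for the mediator."""
    inner = seal(recipient_key, message.encode())
    outer = json.dumps({"to": recipient_did, "inner": inner.hex()}).encode()
    return seal(mediator_key, outer)


class Mediator:
    """One shared endpoint serving many DIDs; it never sees message contents."""

    def __init__(self):
        self.key = secrets.token_bytes(32)
        self.inboxes = {}                     # DID -> list of inner envelopes

    def register(self, did: str) -> None:
        self.inboxes[did] = []

    def receive(self, outer: bytes) -> None:
        payload = json.loads(unseal(self.key, outer))   # learns only the target DID
        self.inboxes[payload["to"]].append(bytes.fromhex(payload["inner"]))


mediator = Mediator()
alice_any_key = secrets.token_bytes(32)       # key behind A.did@Any
mediator.register("A.did@Any")

# Bob's side: wrap for Alice, wrap again for the mediator, send to the endpoint.
wire = double_wrap(mediator.key, "A.did@Any", alice_any_key, "Hi Alice, it's Bob")
mediator.receive(wire)

# Alice's side: only her key opens the inner envelope.
inner = mediator.inboxes["A.did@Any"][0]
print(unseal(alice_any_key, inner).decode())
```

In this sketch the mediator can read the target DID but cannot open the inner
envelope, and an observer of the wire sees only opaque bytes addressed to the
shared endpoint.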

Now, stepping back from the details, what does this accomplish?

   - Discoverability of DID values, DID endpoints, and possible metadata
   associated with DIDs is limited crisply to DIDs that are 100% public by
   intent.
   - People and orgs can operate publicly, as Daniel has advocated. Nobody
   has to give up discoverability.
   - Those same people and orgs can also operate privately or
   semi-privately. When they do, there is no simple/cheap/trivial way to
   connect the private part and the public part. (Yes, I know all about
   correlation engines. Suffice it to say that there's an arms race and we
   probably won't win it against a state-level actor, but we can create a
   reasonable firewall against casual correlation, and the strength of the
   firewall is commensurate with the degree of our investment. More about this
   here <https://www.evernym.com/blog/well-be-correlated-anyway/>.)
   - None of this behavior has to corrupt the simplicity of DID core. It
   can all be layered on top.


On Mon, Jun 29, 2020 at 8:28 AM Dave Longley <dlongley@digitalbazaar.com>
wrote:

>
> On 6/29/20 9:52 AM, Manu Sporny wrote:
> > On 6/29/20 5:14 AM, Adrian Gropper wrote:
> >> If there were only one service endpoint, what would it be and could it
> >> accommodate authentication, authorization, storage, and notification
> >> uses without undue limitation?
> >
> > I believe that this is where Digital Bazaar currently is... that service
> > endpoints advertised in the DID Document are an anti-pattern... we can
> > already see that developers without a background in privacy engineering
> > are unwittingly abusing the field.
> >
> > In the simplest case, it creates large challenges wrt. GDPR and the
> > organizations creating software for or running verifiable credential
> > registries.
> >
> > In many other use cases, it invites abuse (direct link to your personal
> > Identity Hub, web page, email, being some of them).
> >
> > The solution is probably to place a pointer in a DID Document that
> > effectively states: "You can find more information about the DID Subject
> > over here ---> X"... and then to point to somewhere that a caller can
> > see public information in a way that is GDPR compliant (e.g., a list of
> > Verifiable Credentials), or for more advanced use cases, where the
> > caller can authenticate in order to get information that is intended for
> > only a subset of individuals (again, protecting privacy by default).
> >
> > Would anyone object if we took service endpoints in this direction?
> > Effectively, we'd replace them with a "seeAlso" or "moreInformation"
> > link pointing to a list of Verifiable Credentials that would provide
> > information relating to identity hubs, personal data stores, web pages,
> > contact information, and other privacy-sensitive material.
>
> I think it's also important to remember that if you want to "discover" a
> service endpoint from a DID Document, you first needed to have
> "discovered" the DID. How did that happen? In many cases, you had to ask
> for it from the DID controller; in which case you often would have the
> tools to also ask for this "seeAlso"/"moreInformation" service endpoint
> -- and for authorization to read specific information from it as well.
>
> I think that service endpoints that are directly advertised in DID
> Documents only make sense for "public" or "social" DIDs. Even then,
> particularly for DID methods that use DLTs that do not easily support
> deletion, service endpoint information should be expressed elsewhere.
> This points to a need for other decentralized registry services that
> allow for both discovery and deletion. These services would not need to
> be DID-method specific.
>
>
> --
> Dave Longley
> CTO
> Digital Bazaar, Inc.
>
>
Received on Tuesday, 30 June 2020 17:19:41 UTC