Re: Verifiable Claims privacy/technology issues from Dave Longley on 2017-07-27 (public-vc-wg@w3.org from July 2017)

From: Dave Longley <dlongley@digitalbazaar.com>
Date: Thu, 27 Jul 2017 17:47:08 -0400
To: Tristan Hoy <tristan.hoy@gmail.com>, public-vc-wg@w3.org
Message-ID: <6fa6c0ab-1087-f8c3-635c-9ab576ea72c6@digitalbazaar.com>

On 07/27/2017 02:48 PM, Tristan Hoy wrote:
> The current draft architecture for Verifiable Claims describes a
> single point of privacy failure: the identifier registry.

Perhaps "registry" is a poor name. Perhaps failing to pluralize it is
also an issue.

> 
> Here is a set of cascading requirements, assuming the registry is
> some kind of "server": 1) The registry MUST be resilient to denial of
> service attacks 2) The registry MUST be able to discriminate between
> high-volume inspectors (e.g. Walmart, government agencies) and DDoS
> attackers 3) Therefore, the registry MUST authenticate inspectors
> 
> And this gives the registry access to all of the metadata concerning
> who an identity holder is interacting with. While this is perfect in
> a government or corporate environment where every interaction will be
>  logged regardless, it is not good for privacy.

There does not have to be a single identifier registry. There could be
government or corporate environments where using a server as an
identifier registry is appropriate. For other environments it would not
be, as you've indicated.

> 
> Unless of course, the registry is a blockchain, and each inspector is
>  running their own node. There is some very specific jargon that 
> indicates that this may be the not-so-opaque intention of the
> architecture: "The registry MUST manage identifiers in a
> self-sovereign way"
> 
> However this is speculation.

Using a blockchain as a registry is one possibility, yes.

> 
> What isn't speculation is that this architecture cannot possibly
> support interaction privacy unless the identity registry is a
> decentralized name service or some other flavour of blockchain.

This is not true, but I can see why you'd think so given the name
"registry". Identifiers in the architecture are simply URIs. Having an
identifier registry in the way you've conceptualized isn't actually a
hard requirement, it is just an enabler for certain use cases.

For example, identifiers could be based on public keys. Here there is no
actual "registry" but there are rules or a "namespace" for identifiers.
Proof of possession (when sharing verifiable claims) could depend on a
digital signature from the private key holder. This has a number of
disadvantages (potentially no key rotation, for example) for long-term
verifiable claims, but it would work perfectly well for a number of other use
cases. Consider the case, for example, where issuers are
highly-available and are able to dynamically sign pairwise identifiers
from relying parties (where proof of possession is performed via digital
signature).
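
To make the public-key case concrete, here is a minimal sketch of an
identifier that is derived from a public key, with proof of possession done
by signing a challenge from the inspector. Everything named here is an
assumption for illustration: the "urn:example:pubkey:" prefix is a made-up
namespace rule, and the Lamport one-time signature is only a stand-in that
runs on the Python standard library. A real deployment would more likely use
an ordinary signature scheme, and nothing in the architecture prescribes
either choice.

```python
import hashlib
import secrets

# Lamport one-time signatures: a stdlib-only stand-in for a real signature
# scheme. Each key pair must sign at most ONE challenge.


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _bits(message: bytes):
    """Return the 256 bits of SHA-256(message), most significant bit first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def generate_keypair():
    """Return (private, public), each a list of 256 pairs of 32-byte values."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32))
               for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public


def sign(private, message: bytes):
    """Reveal one secret from each pair, selected by the message digest bits."""
    return [private[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    """Check that each revealed secret hashes to the matching public value."""
    if len(signature) != 256:
        return False
    return all(_h(sig) == public[i][bit]
               for i, (sig, bit) in enumerate(zip(signature, _bits(message))))


def identifier_for(public) -> str:
    """Hypothetical namespace rule: the identifier is a URI built from a
    hash of the public key, so no registry lookup is needed."""
    flat = b"".join(a + b for a, b in public)
    return "urn:example:pubkey:" + _h(flat).hex()


def check_possession(identifier: str, public, challenge: bytes, signature) -> bool:
    """Inspector side: the key must match the identifier and the signature
    must cover the inspector's own challenge."""
    return (identifier_for(public) == identifier
            and verify(public, challenge, signature))
```

The holder would call generate_keypair() once, put identifier_for(public) in
the claim as the subject identifier, and answer an inspector's random
challenge with sign(private, challenge). Because the identifier is fixed by
the key, replacing the key changes the identifier, which is the key rotation
disadvantage mentioned above.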

It's hard to see that use case with the "registry" terminology though.
We should figure out a way to address this in the architecture
documentation. We welcome any specific text that you think would be helpful.

> 
> The working group charter states: "The Working Group will
> not...attempt to lead the creation of a specific style of supporting
> infrastructure"
> 
> But that's exactly what's happening: the registry is a required 
> component, and if you want interaction privacy, the registry has to
> be a blockchain.
> 
> And this comes with potentially fatal drawbacks: decentralized name 
> services critically lack the ability to block or revoke fake, hacked,
>  spam and lost identities in the same way that SSL/DNSSEC/Estonia 
> e-residency do. And because the blockchain is public, any
> implementation flaw (which is probable considering the high
> complexity) will permanently break privacy for all users.

For general purpose use it is true that a blockchain would need to be
public. There may be some communities that use a blockchain that isn't
public. That being said, the architecture makes no assumptions about how
proof of control, revocation, etc. are implemented by the identifier
registry. There is a DID specification related to this space here:

https://opencreds.github.io/did-spec/

The statement "any implementation flaw will permanently break
privacy for all users" seems hyperbolic. Can you provide some more
concrete examples of what you're concerned about?

> 
> If the architecture was agnostic about the issuance and verification
> of authenticating subject identifiers, then you could have privacy
> without a blockchain.

Well, the architecture is agnostic, but we've failed to communicate
this. We should find a way to make it clearer that the architecture
supports identifier registries but that they aren't required -- or
rename "registry" to "namespace" or similar.

> 
> The FAQ states in the answer to Q7: "The proposed data model and
> syntaxes are designed to be storage system and transaction protocol
> agnostic"
> 
> But it's not transaction protocol agnostic. The use of a registry 
> implies a specific transaction protocol that is either technology 
> specific or privacy violating.
> 
> Buried in the details, the data model recommends the use of
> short-lived or single-use bearer tokens (e.g. a public-key signed
> JWT) for high-privacy applications. These bearer tokens would not
> require a central registry, although this is not stated.
> 
> Another alternative is to simply use per-claim public/private
> keypairs, which are self-sovereign, self-authenticating and stateless
> (no central store required). Upon presenting a claim, the claim
> holder can sign a challenge issued by the claim inspector to verify
> ownership (rather than just possession) of the claim.

Yes, this is what I was alluding to in my response above. So it seems
that most of the problem here is the document's failure to communicate
the optional nature of an identifier registry -- or alternatively, the
failure to communicate that an "identifier registry" is an abstract
concept that could include "rules for generating identifiers". It's
poorly named for that purpose.
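
One way to picture the abstract concept is as a single interface with two
very different implementations behind it. This is only a sketch under my own
assumptions: the names IdentifierNamespace, LookupRegistry,
KeyDerivedNamespace and the "urn:example:" identifiers are hypothetical and
do not come from the architecture document.

```python
import hashlib
from typing import Dict, Protocol


class IdentifierNamespace(Protocol):
    """Hypothetical interface: can this key speak for this identifier?"""

    def controls(self, identifier: str, public_key: bytes) -> bool: ...


class LookupRegistry:
    """A registry in the server or ledger sense: state is stored somewhere
    and must be consulted, but the key behind an identifier can change."""

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}

    def register(self, identifier: str, public_key: bytes) -> None:
        # Registering again for the same identifier acts as key rotation.
        self._keys[identifier] = public_key

    def controls(self, identifier: str, public_key: bytes) -> bool:
        return self._keys.get(identifier) == public_key


class KeyDerivedNamespace:
    """No stored state at all, only a rule for generating identifiers."""

    PREFIX = "urn:example:pubkey:"

    def identifier_for(self, public_key: bytes) -> str:
        return self.PREFIX + hashlib.sha256(public_key).hexdigest()

    def controls(self, identifier: str, public_key: bytes) -> bool:
        # Nobody is contacted; the check is a local computation.
        return identifier == self.identifier_for(public_key)
```

An inspector written against controls() would not need to know which of the
two it is talking to. The second one never contacts anyone, so it cannot
observe interactions, but it also cannot rotate keys.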

> 
> But why isn't a high privacy option - e.g. bearer tokens, public keys
> - the default configuration?

Because the requirements of those options, such as highly available
infrastructure, are too demanding. They also don't support a number of
use cases, such as long-term credentials that were issued by
educational institutions that no longer exist. The current architecture
has a broader approach that
supports more use cases with less complexity -- and it allows for high
privacy options that have stronger infrastructure demands to be layered
on top.

> 
> Why does the front-and-centre diagram include an identity registry,
> that is either technology specific or privacy violating?
> 
> Why does it state, nowhere, that the registry is optional?

The "identifier registry" is considered a "namespace". So it isn't
necessarily "optional" in that sense. I agree that we should make a
better effort to explain the concept or make the concept more concretely
a "registry" and indicate it is optional. So, the answer is: "We're just
not communicating effectively on that point, so thank you for the
feedback and we'll try to address it." If you have specific text that
would be helpful, please send it to the list or use GitHub.

> 
> Why does it seem like the spec places the needs of specific
> stakeholder groups above the absolute need for privacy?

"Above" meaning what specifically? The positioning of the text in the
spec? Are you asking this based on an explicit statement from the spec
or implications that you've derived from your reading of it or the
architecture design?

Different stakeholders reading the spec may have different opinions on
this matter.

The goal of the spec was to capture the various stakeholders in light of
use cases, not to put them into a particular order. Also, not every use
case involves privacy -- many of the use cases involve modeling
credentials like those placed publicly on websites or LinkedIn. The
space is much larger than, for example, the highly pseudonymous transfer
of credentials that assert atomic claims like "X is 18+ years old".

> 
> Recommendations for the working group: 1) The "brochure" version of
> the spec is the most important - and should place zero-registry,
> high-privacy options first and foremost to encourage privacy-first
> adoption

There are nuanced opinions on this matter in the group. I'll let others
speak for themselves.

> 2) The high-level architecture draft and interaction diagrams use 
> singular language when referring to identifiers, indicating that a 
> claims holder has only one identifier - this should be pluralized and
>  indicate multiple identifiers by default to encourage privacy-first
> adoption

I believe the group agrees we should do this and we've been trying to do
so in other specifications moving forward. We also have a lot of
additional privacy and anti-correlation work to do in the data model spec:

https://w3c.github.io/vc-data-model
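
As a rough illustration of what "multiple identifiers by default" could look
like for a holder, here is a sketch that derives a different identifier for
each relying party from one holder secret. The HMAC derivation and the
"urn:example:pairwise:" prefix are my own assumptions for the example; the
data model spec does not define this mechanism.

```python
import hashlib
import hmac


def pairwise_identifier(holder_secret: bytes, relying_party: str) -> str:
    """Derive a stable identifier for one relying party.

    Hypothetical rule: HMAC-SHA256 keyed with a secret that only the holder
    knows, applied to the relying party's name. Without that secret, two
    relying parties cannot link their identifiers to the same holder.
    """
    tag = hmac.new(holder_secret, relying_party.encode("utf-8"),
                   hashlib.sha256).hexdigest()
    return "urn:example:pairwise:" + tag
```

The same secret and relying party always give the same identifier, so the
holder does not need to store one identifier per relationship.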

> 3) Fix the link on your proposal to point to the current home of the
>  data model

I'll talk to someone about making that happen.

Thanks for your feedback! I hope my response has been helpful.


-- 
Dave Longley
CTO
Digital Bazaar, Inc.
http://digitalbazaar.com
Received on Thursday, 27 July 2017 21:47:32 UTC