Re: Comments on VCTF Report from Jeffrey Burdges on 2016-03-14 (public-credentials@w3.org from March 2016)

From: Jeffrey Burdges <jeffrey.burdges@inria.fr>
Date: Mon, 14 Mar 2016 22:42:33 +0100
To: wseltzer@w3.org, Ian Jacobs <ij@w3.org>, Manu Sporny <msporny@digitalbazaar.com>, Dave Longley <dlongley@digitalbazaar.com>, Shane McCarron <shane@halindrome.com>, Adrian Hope-Bailie <adrian@hopebailie.com>, wseltzer@w3.org, Daniel Kahn Gillmor <dkg@fifthhorseman.net>, Peter Eckersley <pde@eff.org>, Joseph Bonneau <jcb@eff.org>
Cc: Web Payments IG <public-webpayments-ig@w3.org>, Credentials Community Group <public-credentials@w3.org>, public-privacy@w3.org, kate@torproject.org, Mike Perry <mikeperry@torproject.org>, RogerDingledine <arma@mit.edu>, Christian Grothoff <grothoff@gnunet.org>, Bruno Haible <bruno@clisp.org>
Message-ID: <1457991753.18386.246.camel@inria.fr>
Hello,

There are some censorship considerations that appear inherent in
verifiable claims, which I shall briefly discuss towards the end of 
this mail.  These should not be ignored, but I shall focus primarily
on more technical privacy issues here.


I believe the Verifiable Claims Task Force should adopt the view that :

 Verifiable Claims should not leak information about private
 individuals beyond what the individual unambiguously recognizes
 that the claim communicates.

As an example, a verifiable claim for "proof of age" should not leak
either the user's nationality, or their mother tongue.  These properties
have legal protections under some national laws, so revealing them could
create legal liabilities not merely for web site operators, but even
browser vendors.

There are two particularly problematic technical scenarios that arise 
from this consideration. 


1.  Unacceptable non-determinism in many signature algorithms

Verifiable Claims should not use cryptographic signature algorithms
that contain extra non-deterministic entropy, such as nonces or
padding, that creates a tracking identifier of the user.

For example, if a user proves their age to a website on two separate
occasions, then the signatures themselves should not let the website,
or a third party verifier, correlate the two occasions.


1.1.  Specific problematic algorithms : 

Schnorr or ElGamal based signature algorithms employ a nonce k that
usually uniquely identifies that particular signature and thus that
user :
https://en.wikipedia.org/wiki/Schnorr_signature
https://en.wikipedia.org/wiki/ElGamal_signature_scheme

In particular, ECDSA should be prohibited : 
https://en.wikipedia.org/wiki/Elliptic_Curve_Digital_Signature_Algorithm#Signature_generation_algorithm

EdDSA is an example of a Schnorr based signature algorithm that does not
necessarily violate this principle because its nonce is deterministically
constructed from the private key and the message : 
http://ed25519.cr.yp.to/papers.html
EdDSA would be okay so long as the signing authority employs the same
signing key every time they sign a verifiable claim.  It should probably
be excluded if this condition cannot be met. (1)
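
To make the nonce issue concrete, here is a minimal Python sketch of a
toy Schnorr-style signature.  Everything in it is an assumption made for
illustration: the group (integers modulo the Mersenne prime 2**127 - 1),
the base G, the helper names, and the claim string are hypothetical, and
the parameters are far too weak for real use.  It only shows that a
random nonce makes every issued signature a distinct value, while a
nonce derived from the signing key and the message makes repeated
signatures on the same claim identical.

```python
import hashlib
import secrets

# Toy group for illustration only: arithmetic modulo the Mersenne prime
# 2**127 - 1, with exponents reduced modulo P - 1. Not secure.
P = 2**127 - 1
ORDER = P - 1
G = 3


def _hash_to_exponent(*parts: bytes) -> int:
    """Hash the given byte strings to an integer in [0, ORDER)."""
    digest = hashlib.sha256(b"|".join(parts)).digest()
    return int.from_bytes(digest, "big") % ORDER


def _encode(n: int) -> bytes:
    """Fixed-width encoding of a value below 2**128."""
    return n.to_bytes(16, "big")


def sign(secret: int, message: bytes, deterministic: bool) -> tuple:
    """Schnorr-style signature (r, s) with a random or a derived nonce k."""
    if deterministic:
        # EdDSA-like idea: derive k from the signing key and the message.
        k = _hash_to_exponent(b"nonce", _encode(secret), message)
    else:
        # Classic Schnorr/ElGamal/ECDSA idea: fresh random k per signature.
        k = secrets.randbelow(ORDER)
    r = pow(G, k, P)
    e = _hash_to_exponent(_encode(r), message)
    s = (k + secret * e) % ORDER
    return r, s


def verify(public: int, message: bytes, signature: tuple) -> bool:
    """Check G**s == r * public**e (mod P)."""
    r, s = signature
    e = _hash_to_exponent(_encode(r), message)
    return pow(G, s, P) == (r * pow(public, e, P)) % P


issuer_secret = secrets.randbelow(ORDER)
issuer_public = pow(G, issuer_secret, P)
claim = b"age>=18"  # hypothetical claim encoding

random_a = sign(issuer_secret, claim, deterministic=False)
random_b = sign(issuer_secret, claim, deterministic=False)
fixed_a = sign(issuer_secret, claim, deterministic=True)
fixed_b = sign(issuer_secret, claim, deterministic=True)

# Random nonces: each issued signature is a distinct, trackable value.
print(random_a != random_b)
# Derived nonce: the same key and claim always give the same signature.
print(fixed_a == fixed_b)
```

Both kinds of signature verify against the same public key, so a
verifier cannot tell from verification alone which kind it received;
the difference only shows up when signatures are compared.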

In principle, RSA itself should be okay as one simply exponentiates the
signed message :  
https://en.wikipedia.org/wiki/RSA_%28cryptosystem%29#Signing_messages
The strongest RSA signature algorithms like RSA-PSS add entropy
with their padding however.  We need the padding to be completely
deterministic, which means older padding protocols like PKCS or FDH.
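
As a rough illustration of why deterministic RSA signing avoids the
problem, the following Python sketch signs a hash of the message by
plain exponentiation, in the spirit of a full-domain hash.  The key
size, the two Mersenne primes, the simplified hash-to-integer step and
all names are assumptions chosen so the example runs with the standard
library alone (Python 3.8 or later); none of it is a usable RSA
implementation.

```python
import hashlib
import math

# Toy RSA key built from two known Mersenne primes; insecure by design.
P_PRIME = 2**127 - 1
Q_PRIME = 2**89 - 1
N = P_PRIME * Q_PRIME
E = 65537
PHI = (P_PRIME - 1) * (Q_PRIME - 1)
assert math.gcd(E, PHI) == 1
D = pow(E, -1, PHI)  # private exponent


def hash_to_int(message: bytes) -> int:
    """Simplified stand-in for a full-domain hash: SHA-256 reduced mod N."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Deterministic signature: no nonce and no random padding."""
    return pow(hash_to_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    return pow(signature, E, N) == hash_to_int(message)


claim = b"age>=18"  # hypothetical claim encoding
first = sign(claim)
second = sign(claim)

# The same key and claim always yield the same signature value.
print(first == second)
print(verify(claim, first))
```

With randomized padding such as RSA-PSS, the two signatures above would
differ, which is exactly the property the text objects to.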

Macaroons have properties that might be interesting for a user wishing
to delegate access to a resource, but they're entirely based upon a
nonce that the user cannot transform.  


1.2.  Specific acceptable algorithms : 

Rerandomizable signatures appear to address this issue.
https://eprint.iacr.org/2015/525

Single-use RSA blind signatures work.  They present an engineering
challenge in that the signing authority must issue a cache of
single-use tokens to the user, but that also helps limit damage
from claims/credential theft.  A naive approach cannot sign complex
data however.
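
The blinding step can be sketched as follows.  This is a minimal toy
version of an RSA blind signature, under assumptions: the tiny key, the
simplified hashing, the function names and the random token serial are
all hypothetical, and it needs Python 3.8 or later for the modular
inverse.  The point is only that the signer never sees the token it
signs, yet the unblinded result verifies as an ordinary signature.

```python
import hashlib
import math
import secrets

# Toy RSA key built from two known Mersenne primes; insecure by design.
P_PRIME = 2**127 - 1
Q_PRIME = 2**89 - 1
N = P_PRIME * Q_PRIME
E = 65537
D = pow(E, -1, (P_PRIME - 1) * (Q_PRIME - 1))


def hash_to_int(message: bytes) -> int:
    """Simplified hash of the token into the RSA domain."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def blind(message: bytes) -> tuple:
    """User side: hide the hashed token behind a random factor r."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            break
    blinded = (hash_to_int(message) * pow(r, E, N)) % N
    return blinded, r


def sign_blinded(blinded: int) -> int:
    """Signer side: exponentiate without learning the underlying token."""
    return pow(blinded, D, N)


def unblind(blind_signature: int, r: int) -> int:
    """User side: strip the blinding factor to get a normal signature."""
    return (blind_signature * pow(r, -1, N)) % N


def verify(message: bytes, signature: int) -> bool:
    return pow(signature, E, N) == hash_to_int(message)


token = secrets.token_bytes(16)  # single-use token serial chosen by the user
blinded, r = blind(token)
signature = unblind(sign_blinded(blinded), r)

print(verify(token, signature))
```

Because the signer only ever sees the blinded value, it cannot later
link the token presented to a verifier with the issuing session.  As
noted above, each token is single-use, so the user needs a cache of
them.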

Group signature schemes in which the signing authority delegates the
ability to sign information to the user work fine, as they anonymize
members of the group.  Again a naive approach cannot sign complex
data though.

Zero-knowledge proofs would likely be a vector for doing this. 

None of these options are everyday vanilla cryptography.


2. Dangers inherent in certificate chains

Verifiable Claims should not employ chains of trust like X.509
without explicitly telling the user what they reveal to the verifier.  
In other words, the user agent should expose to the user the chain
of signing authorities who issue that signature, when their
certificates were signed, and clarify that this information gets
communicated.

For example, there was a discussion of users proving their age using
government issued credential but restricting it to only revealing the
year of their birth.  In this case, the credential still shows the
state or locality where they reside, so using it should make this
clear to the user. 
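
A user agent could derive such a notice directly from the chain it is
about to send.  The Python sketch below is purely hypothetical: the
Certificate record, its fields, the issuer names and the wording of the
notice are invented for illustration and do not correspond to X.509
structures or to any proposed API.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Certificate:
    """Hypothetical, simplified link in a chain of trust."""
    issuer: str
    subject: str
    signed_on: date


def disclosure_summary(requested_claim: str, chain: List[Certificate]) -> List[str]:
    """List what the verifier learns beyond the claim the user asked to share."""
    lines = [f"Requested claim: {requested_claim}"]
    for cert in chain:
        lines.append(
            f"Also revealed: {cert.subject} was certified by "
            f"{cert.issuer} on {cert.signed_on.isoformat()}"
        )
    return lines


# Invented example chain for a government-issued proof of birth year.
chain = [
    Certificate("Example State Licensing Office", "holder's credential", date(2015, 6, 1)),
    Certificate("Example National Root Authority", "Example State Licensing Office", date(2012, 1, 15)),
]

summary = disclosure_summary("year of birth", chain)
for line in summary:
    print(line)
```

Even though the user asked to share only a year of birth, the summary
makes visible that the issuing state and the signing dates travel with
the claim.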

There are two natural alternatives to chains of trust : 

First, group signature schemes provide exactly a one hop alternative
with the appropriate privacy properties, but expanding that to multiple
hops sounds like a research problem.  A major drawback is that these
employ pairing.

Second, zero-knowledge proofs might provide a solution here too.  Again
the known stuff handles one hop and expanding that sounds like a research
problem. 


3. Solutions

In summary, there are a number of interesting research problems around
doing verifiable claims in a way that communicates nothing inappropriate
about the user's real identity.  At present, these "research problems"
sound like anathema to standardization though.

I'd therefore suggest the Verifiable Claims Task Force look into simply
backing a cryptographic competition to address the issue with privacy
preserving certificate chains.  I'd expect the nonce problem would not
be omitted from that sort of analysis.

Of course, there is always a "crypto lite" approach in which claims
adhere strictly to the browser's same origin policy and the standard
utilizes only existing allowable cross site interactions, like top-level
GET requests.  I think this does not resolve the above concerns quite
as succinctly as one might imagine.  

Also, there was previously a discussion about the correct venue for
doing verifiable claims.  I think the seriousness of these privacy
concerns indicates that the optimal venue might be the privacy working
group :  https://www.w3.org/Privacy/


4.  Ethical considerations

We could potentially address the above technical privacy concerns;
however, that alone does not necessarily make verifiable claims a
good idea.  There are profound ethical considerations as well.  

In particular, we should ensure that new standards cannot be used to
deploy any sort of general purpose "internet passport" as this would
tend to enable censorship, including harm to interoperability and online
commerce.

Imagine if for example a site like say Youtube or Twitter were convinced
to use Verifiable Claims for age verification.  At present, these sites
are actively censored in China for both political reasons and economic
protectionism.  It seems clear the Chinese government could ensure
that any Chinese deployments of verifiable claims would fail to work
correctly with foreign media that represented either differing political
viewpoints, or competition for protected domestic providers.


Apologies for tacking this ethics bit on at the end.  These are not
academic considerations though, and the threat of censorship deserves
to be a more prominent consideration than I have made it.

As an example,  the U.S. State Department spends millions funding tools 
like CGIProxy and Tor Browser specifically to disrupt web censorship.  
We should avoid creating standards that ultimately make such efforts
more difficult.

Aside from discussions with the W3C privacy working group, I would
suggest the Verifiable Claims Task Force reach out to human rights
organizations to obtain better comments on ethical considerations. 
Of course, the EFF and EPIC are obvious groups to contact.  Another is
the Tor Project, maybe kate@torproject.org for example.


Apologies for this mail growing so long.  
Best wishes,
Jeff Burdges


p.s.  In addition to legal risks, I suspect verifiable claims create
longer-term "legislative risks" for browser vendors :

We should naturally expect verifiable claims to be regulated under the
E.U. Data Protection Directive, just like cookies, flash storage, etc.  
At present, these storage mechanisms are provided by the browser, but
the notification requirements fall upon websites.  

One could imagine those notification requirements being applied to
browser vendors directly however.  This sounds less far fetched if the
browser vendors have themselves already defined a user interface for some
privacy sensitive information, such as verifiable claims.


-------- Forwarded Message --------
From: Ian Jacobs <ij@w3.org>
To: Manu Sporny <msporny@digitalbazaar.com>, Dave Longley
<dlongley@digitalbazaar.com>
Cc: Web Payments IG <public-webpayments-ig@w3.org>
Subject: Comments on VCTF Report
Date: Tue, 16 Feb 2016 20:59:32 -0600

Dear Members of the VCTF [0],

Thank you for preparing a report [1] on your activities for discussion
at the upcoming face-to-face meeting. I read the report and the
minutes of all the interviews. I have not read the use cases [2].

I have several observations and questions that I'd like to share
in advance of the face-to-face meeting. I look forward to the
discussion in San Francisco. I will continue to think about
topics like "questions for the FTF meeting" and "ideas for next
steps."

Ian

[0] http://w3c.github.io/vctf/
[1] https://lists.w3.org/Archives/Public/public-webpayments-ig/2016Feb/0029.html
[2] http://opencreds.org/specs/source/use-cases/

==================

* First, thank you for conducting the interviews. I appreciate the
time that went into them, and you managed to elicit comments from an
interesting group of people.

* In my view, the ideal outcome from the task force's interviews would
have been this: By focusing on a problem statement in conversations
with skeptics, areas of shared interest would emerge and suggest
promising avenues for standardization with buy-in from a larger
community than those who have been participating in the Credentials
Community Group.

* With that in mind, I think the results are mixed:

     - The interviews included valuable feedback that I believe can be
     useful to focusing discussion of next steps. For example,
     compiling a list of concerns about the project is very useful.

     - I believe the report does not do justice to this useful
     information.

* Here is why I believe the report does not do justice to the
  interviews: it includes information that I don't believe was part of
  the task force's work, which clouds what the report could most
  usefully communicate. Specifically:

    - The survey in 5.1 was not part of the task force's work [0].

    - While documenting use cases [2] is valuable, I did not read
      in the interviewees' comments that they had considered the
      use cases. It would have been interesting, for example, for
      the interviewees to have considered the use cases, and to
      determine whether there was a small number of them where
      there was clear consensus that it was important to address
      them. But without connecting the interview comments to the
      use cases, I believe they only cloud this report.

      Thus, I find confusing the assertion in 6.4 that
      a "point of consensus" is that there are use cases. That
      may be the consensus of the Credentials CG that produced
      them, but it is not clear to me from reading the minutes
      that there is consensus among the interviewees on the
      use cases. Similarly, section 3 (Summary of Research Findings)
      goes beyond the work of this task force to include the use
      cases.

* While there were a lot of valuable comments in the interviews, it would
  not be cost-effective to paste them all here. Here are a few synopses:

  - It sounded like people acknowledged the problem statement
    and also that this is a hard problem to solve.

  - Many people emphasized the opportunity to improve security and privacy.
    One opportunity that was mentioned had to do with user-friendly key
    management (which made me think of SCAI).

  - There is a high cost to setting up an ecosystem, and so the
    business incentives must be carefully considered and
    documented. (This is covered in 7.3 of the report.)

  - I found Brad Hill's comments particularly helpful:
    https://docs.google.com/document/d/1aFAPObWUKEiSvPVqh9w1e6_L3iH4T08FQbJIOOlCvzU/

  - A number of comments seemed to me to suggest a strategy for
    starting work:

    * Start small.
    * Start by addressing the requirements of one industry and build from
      there. I heard two suggestions for "Education" and explicit advice
      against starting with health care or financial services.
    * Be pragmatic.
    * Reuse existing standards (a point you mention in section 3 of the report).


* I don't understand the role of section 4 ("Requirements Identified
  by Research Findings"). This is not listed as a deliverable of the
  task force [0] and it does not seem to me to be derived from the
  interviews. The bullets don't really say "Here is the problem
  that needs to be solved." I think the use cases come closer, and
  we need more information about business stories as mentioned above.
  Talking about things like software agents helping people store
  claims feels like a different level of discussion.

* In section 6 "Areas of Consensus":

  - "Current technologies are not readily solving the problem."

    I don't think that's the consensus point. I think that formulation
    suggests too strongly "and thus new technologies are needed."

    I think the following headline phrase is more accurate: "Reuse
    widely deployed technology to the extent possible." You do say
    something close to that in the paragraph that follows, and
    again in 7.8.

  - "Minimum First Step is to Establish a Way to Express Verifiable
    Claims"

    (Also covered in a bullet in section 4.)

    First of all, I did not reach that result from reading the
    interviews.  Second, the very sentences in the paragraphs that
    follow suggest there is no consensus. Namely:

    * "Many of the interviewers suggested that having a data model and
      syntax for the expression of verifiable claims AS ONLY PART OF
      THE SOLUTION." (This suggests they may not agree that "expression"
      is a minimal first step and that MORE is required in a first step.)

    * "Some of the interviewers asserted that the technology already
      exists to do this and that W3C should focus on vocabulary
      development." (So this is a recommendation to do vocabulary work.)

    * "Others asserted that vocabulary development is already
      happening in focused communities (such as the Badge Alliance,
      the Credentials Transparency Initiative)." (This doesn't say
      anything about what W3C should do; perhaps this sentence could
      be attached to the previous one instead.)

    * "Many of the interviewers suggested that the desirable outcome
      of standardization work is not only a data model and syntax for
      the expression of verifiable claims, but a protocol for the
      issuing, storage, and retrieval of those claims, but
      acknowledged that it may be difficult to convince W3C member
      companies to undertake all of that work in a single Working
      Group charter. " (This sounds like a repeat of the first bullet.)

    * "In the end, consensus around the question what kind of W3C
      charter would garner the most support seemed to settle on the
      creation of a data model and one or more expression syntaxes for
      verifiable claims."

    Basically, I do not think there is a consensus to do that among
    the interviewees. In detail, here’s what I read:

        - Brad Hill: "I don't know"
        - Christopher Allen: (I don't see any comment)
        - Drummond Reed: "user-side control of key management"
        - John Tibbetts: "document what a credential looks like
                         (perhaps either a data model or ontology)
                         plus a graphical diagram"
        - Bob Sheets: "I have a hard time addressing that question,
                       whatever it takes to get your group started and
                       on the map and doing work the better."
        - David Chadwick: (I don't see any comment)
        - Mike Schwartz:  (I don't see any comment)
        - Dick Hardt:  (I don't see any comment)
        - Jeff Hodges: (I don't see any comment)
        - Harry Halpin: "Another option is to scope down and aim at a
                        particular problem domain, for example a
                        uniform vocabulary for educational
                        credentials. "
        - David Singer: (I don't see any comment)

* I found interesting the section on "areas of concern" (along with
  Brad Hill's comments). It might be possible to categorize the
  concerns like this:

  a) Social issues
     7.2 scalability of trust
     7.3 business models and economics
     7.4 business model for infrastructure
     7.7 liability; fraud and abuse

  b) Design issues
     7.5 slow evolution of agent-centric designs
     7.6 risks associated with identifiers, keys, revocation
     7.8 reusing existing work

  c) Communication
     7.1 communicate vision / big picture
       (BTW, I agree, but this does not imply it belongs in a charter).

  - Scalability of trust is very interesting. I think I agree it's
    good to have an architecture that supports diverse business
    models, trust models, etc.

  - On business models and economics: "it is yet unknown if
    kickstarting the market will be enough to build a strong economic
    incentive feedback loop." It might be easier to find an answer
    by adopting the above strategy points about starting small and
    picking one market.

* Please list the editors of the report. Also, if possible, please list in an
  acknowledgments section of the report the participants in the task force.

--
Ian Jacobs <ij@w3.org>      http://www.w3.org/People/Jacobs
Tel:                       +1 718 260 9447
Received on Monday, 14 March 2016 21:41:46 UTC