User Control was: SOP - was: Agenda: <keygen> being destroyed when we need it from Henry Story on 2015-09-13 (www-tag@w3.org from September 2015)

From: Henry Story <henry.story@co-operating.systems>
Date: Sun, 13 Sep 2015 12:35:23 +0100
To: Alex Russell <slightlyoff@google.com>
Cc: Wendy Selzer <wseltzer@w3.org>, "www-tag@w3.org List" <www-tag@w3.org>, Carvalho Melvin <melvincarvalho@gmail.com>, Tim Berners-Lee <timbl@w3.org>
Message-Id: <036637FE-030B-4D7A-8303-371CEF8CD284@co-operating.systems>
> On 12 Sep 2015, at 18:54, Alex Russell <slightlyoff@google.com> wrote:
> 
> On 12 Sep 2015 3:19 am, "Henry Story"  wrote:
> >
> > Thanks Alex for your very good summary.  I don't want to go over the
> > good points made by Melvin and Anders, but to highlight this part
> > of your mail:
> >
> >> On 12 Sep 2015, at 02:20, Alex Russell <slightlyoff@google.com> wrote:
> >> [snip]
> >> Developers who want to persist keys to the local system should
> >> acknowledge that this is an operation that lives outside the
> >> Same Origin Model. The  inability to scope the use of keys added
> >> via <keygen> (via addition to the effective keychain) creates a major
> >> hole in our one workable security primitive. It's true that this
> >> isn't part of the <keygen> spec, but compatibility requirements
> >> have caused this  to be true in practice. From an architectural
> >> perspective, this alone should be enough to cause the TAG to
> >> recommend removal of <keygen> and replacement with a better,
> >> origin-scoped alternative.
> >> [snip]
> >
> > I think this is the core of the discussion. It is this blind application
> > of SOP here that I and others wish to question.
> 
> I'm glad we're discussing this.
> 
> You might know that the TAG recently wrote a Finding on
> Unsanctioned Tracking: 
> http://www.w3.org/2001/tag/doc/unsanctioned-tracking/

Thanks for the pointer to this finding. I wish I had referred to it before.
It may be a place to summarise the findings from this discussion too.

> It's relevant here because storage and identification mechanisms that
> let sites persist data outside of effective user control -- and in
> particular those that work across origins -- are ripe for use as
> "supercookies".
> 
> Consistent application of the SOP is what enables positive user control.
> User action at key installation time can similarly be thought of as
> consent. The status quo is suspect on both points.

I note that the point of this paper is the distinction between sanctioned
and unsanctioned tracking. So the important legal and ethical concept is
not the technical one of "Same Origin" but the notion of
sanctioned/unsanctioned tracking: that is, tracking sanctioned by the
user, who is meant to be in control. This is clear from the title of part 2

      "Unsanctioned Tracking: Tracking without User Control"

The Same Origin Policy (SOP) is a technical means of avoiding information
leakage further than the parties involved: the browser, the user agent
and the web agent. But it is not the same as user control, as the following
two examples should make clear:

SOP without User Control:

• The EU laws on cookie setting mentioned by the paper would not
make that much sense if SOP were identical to User Control.
The user does not usually know that cookies are being set, and it
is usually not that easy to unset them. ( Note that I am not
defending those laws. )

• CORS ( http://www.w3.org/TR/cors/ ), another core application
of SOP, is not about user control but about server control. Here the
server is in control of what information can be shared with scripts
from given origins running in a web page, by setting some headers
on the published content.
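
To make the point about server control concrete, here is a minimal sketch of the decision a CORS-aware server makes. The allowlist, the origin names and the function name are assumptions made up for illustration. Only the Access-Control-Allow-Origin and Vary header names come from the CORS and HTTP specifications.

```python
from typing import Dict, Optional, Set

# Hypothetical allowlist, chosen by the server operator and not by the user.
ALLOWED_ORIGINS: Set[str] = {"https://app.example", "https://partner.example"}


def cors_headers(request_origin: Optional[str]) -> Dict[str, str]:
    """Return the CORS response headers for a cross-origin request."""
    if request_origin in ALLOWED_ORIGINS:
        return {
            "Access-Control-Allow-Origin": request_origin,
            # Tell caches that the response varies with the Origin header.
            "Vary": "Origin",
        }
    # No CORS header: the browser withholds the response from the calling script.
    return {}
```

Nothing in this decision consults the user: the server alone decides which origins may read the response.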

User Control without SOP:

• In the case of client certificates, which we are discussing, the user
is in control of which certificate to select (if any) when visiting a web
site asking for authentication. A certificate selection box appears in
the browser asking the user which one to select.
( We collected a small sample of these and put them up at
https://www.w3.org/wiki/Foaf%2Bssl/Clients/CertSelection ).
  The same is true of accepting certificates for installation in the keychain.
  This is not to say that current browsers could not improve the UI putting
the user in control, but they clearly have been applying this principle
of user control when dealing with certificates that are then applied cross
origin.
  So this is a case of a non-SOP feature enabling user control.

• Hyperlinks in a web page give the user control of which links to
  follow, even links on the same origin.

So given that one has a keychain which never signs anything without
first asking the user, making it evident what this will be used for,
one cannot speak of a "supercookie", which the unsanctioned tracking
finding referred to above defines as

> So-called SuperCookies use implementation bugs, browser fingerprinting 
> and other techniques to continue to identify you and correlate your 
> activity even after you clear your cookies

If one switches Persona in Google Chrome to anonymous mode, that browser window
no longer sends out the certificate, and should open a new TLS connection if
needed. How would having used a client side certificate to log on be any different
from having used OpenID or OAuth, or just a username and password, or even Basic
Auth?


> > It is quite clear to me that if this principle were not thought to be
> > untouchable then people would long ago have found an answer to all the
> > other problems you and others have mentioned. The Browser vendor Engineers   
> > would have
> >  - found a way to improve or replace spkac,
> >  - a debate about how to extend the <keygen> tag so that it could enable
> > a better UI, and many other features required would have led to fruitful
> > results
> >  - taken the opportunity to work with the IETF on better certificate formats
> > such as JOSE or supported work on non syntactic bound certificate formats by
> > reading up on research done on the semantic web side of things
> >  - people would have even found ways of taking a leaf from FIDO and find a
> > language to limit certificate usage to certain range of applications
> >  - there would have been enthusiastic support for improving the user interface
> > of browsers to integrate WebID and make the experience extremely user friendly
> > and social network aware
> >  - ....
> >
> > But of course if you believe that a certificate should only be used for
> > the origin from which you got it, then it makes no sense to continue with
> > <keygen> which allows you to generate a certificate (X509 at present) whose
> > whole purpose is to safely allow you to use it cross origin ( which is why
> > it is used for server authentication in TLS ).
> >    So client-certificates-usable-across-origin is really what people think
> > SOP argues against. And so from that perspective the anti SOP, anti-linkability
> > commitments of FIDO [1] makes perfect sense.
> >
> > But SOP is not a foundational principle of the web, which is primarily about
> > linkability historically and conceptually. SOP is essentially a JavaScript (JS)
> > limitation introduced because JS introduced agenthood into a declarative web.
> > In addition to the agenthood of the User Agent and the User, JS introduced the
> > agenthood of JS fetched from the web. This follows from JS being a
> > procedural/functional language that can act in the browser environment
> > by clicking links, downloading information, POSTing forms, ... SOP is one 
> > way of identifying JS agency, and then limiting it.
> 
> This is less than half the story. All manner of buggery is possible via purely declarative forms. The browser enforces SOP in those cases as well. It may have seemed easier to reason about interactions between actors with only declarative systems in play, but we continue to be astonished at some of the things forms + images + CSS + iframes can do.
> 
> JS made it clearer, faster, but browsers would be still separating actors with SOP regardless.
> 
> But that's all indulgent thinking. JavaScript is a core part of the web stack today. We live in a world where it exists. We cannot pretend it doesn't.

Of course. I program in JS ( well actually my Scala code compiles to JS,
http://www.scala-js.org/ ) and am a heavy  user of many of its advanced 
features. 

What I am trying to do is show that SOP is not blindly applicable to 
this problem. The underlying applicability criteria for using SOP 
have to be made clear.
SOP is a way of identifying the parties in the communication, which
consists of
 
  - Web Server(s) ie, origins
  - User Agent
  - User
  - JS Agents coming from different Web Servers

SOP is a way of limiting the leakage of information coming from
web servers, more than it is about putting the user in control. Rather,
the user ends up being in control only of what origins she communicates
with, and so of information leakage, knowing that the browser will
limit information leakage between those origins ( as far as that can go,
since one can hide identifying information in links ).
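
As a reminder of how narrow the technical notion is, an origin is just the (scheme, host, port) triple of a URL. The following is a minimal sketch of that comparison; the function names and example URLs are assumptions for illustration, and real browsers handle more cases (opaque origins, non-HTTP schemes) than this does.

```python
from urllib.parse import urlsplit

# Default ports, so that https://example.org and https://example.org:443 match.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return the (scheme, host, port) triple that identifies a web origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(url_a: str, url_b: str) -> bool:
    """Two URLs share an origin only if scheme, host and port all match."""
    return origin(url_a) == origin(url_b)
```

Note that the path and query string play no part in the comparison, which is why identifying information hidden in links passes straight through it.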

On a particular connection to a particular origin the user is in control
of a number of things:
 - what links he clicks and when
 - how to authenticate if at all, be it
     + by selecting a particular username/password 
     + by selecting a particular OpenID 
     + by selecting a particular client certificate
 
> 
> > Note: It is not a particularly good way of identifying JS agency, as it is way too broad. Signed JS attributing it to an author or organisation would be a lot better. ( requiring therefore a certificate ).
> 
> That misreads the JS capability model in a very concerning way. Signing might let you know who sent you the code, but once you execute code from multiple actors inside the same heap, and in such a dynamic language, they have unfettered access to any capability the container will grant any participant. This is why SOP is important at runtime: unlike signing, it gives us a workable actor partition (via Workers and iframes) that maps cleanly onto the web's process & privilege model.

We need more than just where the code came from. If we want to avoid pointing 
to code on other servers, and so leaking information about which applications 
we use, we need to be able to copy code to our servers without risking the 
code written by the least ethical company undermining that of the most ethical 
one. Currently there is no way to distinguish code written by two different agents
that I place on my server, other than creating one domain name per piece of code downloaded,
which is a bit of a blunt tool for making such distinctions.  As a result a lot of
web sites such as GitHub have removed the ability to use JS altogether.

There has also been research on how to have code from different origins interact
intelligently in much more subtle ways, without losing security features, as
described by the "COWL: A Confinement System for the Web" international research
project, in which Google and Mozilla were involved ( see http://cowl.ws/ ).

> 
> > SOP applies to cookies. But the user is never asked by the chrome whether
> > or not a cookie should be set. Cookies are SOP by design.
> >
> > So Why is SOP important for certificate usage? It is clear that any
> > privacy enabled browser should
> >
> > a. on being asked for a certificate by a web site first ask the user
> > which certificate to choose
> 
> At TLS connection time? Modern sites make tens and sometimes hundreds of distinct connections per page.

No, of course not. That would be horrible and would not leave the user
in control. Or rather, the user would have no way of knowing what
site he was connecting to before authenticating. That is what TLS
renegotiation is for.

If we limit ourselves to TLS 1.2 and below first: this is a common
misunderstanding of the capabilities of TLS, and it also explains
why client certificate authentication has not been as widely used
as it could have been.

Good site design requires that the server first present itself
to the user in a user friendly manner ( public front page ) served on a
secure TLS connection, and then require authentication, but only for
resources that are not public.
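
A minimal sketch of that rule, seen from the server side: the decision to ask for a client certificate depends on the resource requested, not on the connection. The path prefixes and the function name are hypothetical and only serve as an illustration.

```python
# Hypothetical path prefixes under which the non-public resources live.
PROTECTED_PREFIXES = ("/account/", "/private/")


def requires_client_auth(path: str) -> bool:
    """Return True only for resources that are not public."""
    return path.startswith(PROTECTED_PREFIXES)
```

With this split, the front page at "/" is served without any certificate request, and the user only meets the certificate selection box after having seen which site is asking.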

( This is different if the user is a robot, as the robot may be able to 
make decisions about the site directly from the TLS server certificate
in a way that a human being is unlikely to be able to. )

Still, TLS renegotiation, as pointed out previously, has issues
for HTTP/2 ( the successor to SPDY ), which are being looked at by the HTTP Working
Group in combination with improvements coming from TLS 1.3. See the
thread "Client certificates in HTTP/2"

 • starting: 
   https://lists.w3.org/Archives/Public/ietf-http-wg/2015AprJun/0558.html
 • most recent: 
   https://lists.w3.org/Archives/Public/ietf-http-wg/2015JulSep/0310.html

Being able to move public key cryptography authentication to the HTTP
layer, using key material generated from keygen and stored in the keychain,
would be a major improvement over the current situation, making it much
easier for any server to deploy client certificate authentication. This
seems eminently feasible, and Tim Berners-Lee's team at MIT has already
experimented in this area.

> 
> > b. show the user which certificate he is actually using during a
> > session at a site
> 
> With which actor? The primary document? An iframe?

First and foremost to the site that is indicated in the URL bar, the
prime origin.

After that things are still open to User Interface Research. 

Contrary to appearances, FIDO does not solve the problem either, because
1) if it asks the user per origin, the problem is the same and even
  worse, as the user may have to swipe his fingerprint for each origin;
2) if it automatically creates a public/private key per origin, then
  all we have here are just cryptographic-strength cookies.
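
To illustrate point 2, here is a toy model in which an HMAC-derived identifier stands in for the per-origin key pair. This is an assumption made purely for illustration and is not how FIDO devices derive or store keys; the secret, the origins and the function name are all made up.

```python
import hashlib
import hmac


def per_origin_id(device_secret: bytes, origin: str) -> str:
    """Derive a stable identifier that is specific to one origin."""
    return hmac.new(device_secret, origin.encode("utf-8"), hashlib.sha256).hexdigest()


secret = b"hypothetical-device-secret"
first_visit = per_origin_id(secret, "https://shop.example")
second_visit = per_origin_id(secret, "https://shop.example")
other_site = per_origin_id(secret, "https://news.example")
```

The same origin always sees the same value and a different origin sees an unrelated one, which is the behaviour of a cookie, only harder to forge.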

The same problem would exist if each site in each iframe asked for
authentication using Basic Authentication, so this is not a problem
limited to client certificates.

It is basically a problem of what kind of policy one wishes to use when
following links in a cross origin application. Does one consider one's
interaction across the web as that of a single identity, or as an identity
per site?

This problem comes to the fore in Linked Data UI research such as the
work Tim Berners-Lee is experimenting with at MIT's Decentralized Information
Group (DIG), and which I have also been working on for the past years.
If one wishes to have JS in the client follow linked data across
origins, then one is very quickly going to come across this problem, since
it is quite likely that certain resources across the linked data web
are in fact protected.

The current lack of support requires such applications, IMHO, to move
this decision to the server, which can then authenticate for the user
to the various web sites, following the chosen authentication
policy. If the user is to be in control, the UI built up in JS in the client
from this will have to be clear and understandable. But whatever happens,
this problem needs to be considered much more closely than it has been so far.

If such a flexible policy were made available to JS in the browser, then the
browser would be able to take over this feature from the server.

> 
> > c. enable a mode where the user is not using that certificate ( Chrome has the Persona UI for example ) [2]
> 
> This is key scoping. SOP is just another form of scoping.

But it's not the only one :-)


> 
> What I'd like to understand from you is why:
> 
> - removing <keygen> as currently shipped hurts anything

It removes the browser's ability to create client certificates cheaply.
Without that, client certificate authentication requires installation of
certificates by hand, which is difficult, or use of external hardware devices,
which is badly supported. I can imagine improving this, though, through integration
with features that are coming from the FIDO alliance, such as hardware
key storage mechanisms, which would make the private key completely
unreachable other than via a precise API. Note that in this case again the
user is put in control, but via hardware integration, and via a number of
device specific user interfaces such as fingerprint swiping, etc...

Note that WebID authentication allows any web site to produce
usable cross origin certificates at $0 cost ( http://webid.info/ )


> - why the Web Crypto solution isn't strictly better

The Web Crypto solution has the following limitations

 1) it can only store the key in the web local storage, meaning
 that the private key is available to all JS from that origin if
 the extractable=true attribute is set.
 Even if the extractable=false attribute is set, this does not help
 put the user in control: since there is no chrome for him
 to specify this, it is for all intents and purposes something the
 user has no knowledge of.
 
 Compared to this the <keygen> solution places the key in the 
 keystore, and ties the certificate to the private key after asking 
 the user. The private key cannot be accessed by any application 
 other than the keychain. The private key is safe.  This allows the 
 browser to have an identity distinct from the origin. Otherwise there 
 is no way really for anyone  to know if  the origin signed something 
 or the browser.

 2) the key stored in local storage cannot be used across origin.
 
 I'll note here that Jeffrey Yasskin responded to this point in the blink-dev 
thread https://groups.google.com/a/chromium.org/d/msg/blink-dev/pX5NbX0Xack/FSW2mol3BgAJ by writing

 > If someone runs an identity provider, perhaps with a Service Worker 
 > to work offline, relying parties can iframe the identity provider, 
 > and the identity provider can store a key in WebCrypto and prove its 
 > presence to the relying parties. With fallback request interception 
 > (https://github.com/slightlyoff/ServiceWorker/issues/684), the relying 
 > parties can also ping the identity provider from their service workers.

But this is still on the drawing board, and it is really not clear yet if this
does actually give us the needed capability. In any case unless we can agree
that cross origin authentication is a reasonable thing to do, I don't see
how this feature could make it to a final release, since it would be blocked
by exactly the same arguments we are having here. Finally it still would not 
address the following point 3)
  
 3) it does not put the user in control, as there is no tie in
   between the Web Crypto solution and the chrome.
   It requires all the UI to be built by the origin.

 In short, with this feature the browser loses all hold on identity and
ends up relegating all of it to the server. This seems to me to be a serious
loss for browsers.

> - what you imagine your ideal key provision solution to look like.
> 
> These can be stylized versions, but need to include detail sufficient to let us discriminate, e.g. main document from iframe.

The actual keygen solution seems to be a good starting point, though there are clearly
improvements that can be made, as suggested by Microsoft, and for which Tim
Berners-Lee pointed out a number of possible solutions:
https://lists.w3.org/Archives/Public/www-tag/2015Sep/0034.html

Potentially this could be complemented with JS APIs. But I do think that the
declarative nature of keygen has some very good things going for it.

> 
> > At all these stages the chrome is giving the user control of decisions.
> > There is no JS agency that can take this over. Exactly for this reason SOP
> > does not apply, and it is exactly for this reason that chrome integration 
> > of identity is so important.
> >
> > This is my analysis. Where am I wrong about the non-application of SOP?
> > What Web Architectural Principles do you rely on to justify the application
> > to this case of certificate generation and usage?
> >
> > Sincerely,
> >
> > Henry Story
> >
> >
> > [1] see my previous mail "(un)linkability - Re: Agenda: <keygen> being destroyed when we need it" for references
> >    https://lists.w3.org/Archives/Public/www-tag/2015Sep/0023.html
> > [2] I realise now that logout does not actually make sense, because once a user has authenticated, a cookie can be set to track him, or information kept in URLs. This should be explained somewhere.
> >
> >
Received on Sunday, 13 September 2015 11:36:02 UTC