Re: Opting into cookies from Jonas Sicking on 2008-05-30 (public-appformats@w3.org from May 2008)

From: Jonas Sicking <jonas@sicking.cc>
Date: Fri, 30 May 2008 09:43:19 -0700
To: Anne van Kesteren <annevk@opera.com>
Cc: Ian Hickson <ian@hixie.ch>, public-appformats@w3.org
Message-ID: <48402EA7.8000406@sicking.cc>
Anne van Kesteren wrote:
 > One of the arguments from Mozilla I distinctly remember was the
 > copy-and-paste authoring cult and that if Firefox would be first,
 > Firefox would also the only one being vulnerable in case a server
 > became misconfigured.

This is not at all true. Not sure where you got this idea, I'd be 
interested to hear, but it is certainly not how we operate. If we think 
a spec is safe enough we have no problem implementing it. If we don't 
think it is safe enough we don't think anyone should implement it since 
it will put web users at risk, and we wouldn't want to implement it even 
if other browsers did. Being first certainly carries some risk, but it's 
a risk we gladly take in order to move the web forward. I assume this is 
how Opera does things too?

So moving on...

I strongly think we need to do something like this. The reason is to 
reduce security considerations and thus complexity for servers that opt 
in to Access-Control in order to share public data.

Like with the Access-Control-Extra-Headers and Access-Control-Methods 
headers I proposed in a separate thread (please post comments on those 
headers in that thread, not here) the idea behind opting in to cookies 
is to protect server operators that _do_ opt in to the access-control 
security model. The fact that access-control is fully opt-in means that 
server operators that don't use access-control at all are automatically 
fully protected. However we still want to protect server operators that
do opt in.

For most servers, the security model depends heavily on whether the 
user's credentials are included in the Access-Control request. If the 
credentials are not included, then the number of new kinds of requests 
that Access-Control makes possible is exactly zero, and the amount of 
additional data that can be leaked, no matter how the server is 
configured, is exactly zero. This is because anyone can already make 
these requests to the server directly, without using a browser, and read 
the data that is returned.

The exception to this is servers behind firewalls on intranets. These 
servers can today rely on the general public being unable to make 
arbitrarily formatted requests to them, and on (almost) no data being 
readable by people outside the firewall.

So if a server, that is not behind a firewall, can opt in to 
Access-Control without cookies, it can in all cases do so safely without 
having to worry about leaking private data or becoming vulnerable to 
CSRF or similar attacks. This is something that I think would be very 
valuable to a lot of servers.

Sites like Craigslist would be able to turn on this part of 
access-control without having to worry about there being some URI in 
their URI space where users can modify their previous postings, or 
create new postings using previously entered contact information.

Similarly Google Maps could turn it on to enable sites to query for map 
information, without having to worry about this also enabling stealing 
of previous searches or home address configurations.

So in other words, this would let sites that want to allow mashups of 
public data do so without having to worry about securing their site.

This wouldn't help the set of operators that do want to enable sharing 
of private data of course, but this is a much smaller set of sites. And 
it would allow those sites to be more selective about which URIs they 
share private data on and which they share public data on.


The big problem with this is that it introduces additional input into 
the Access-Control algorithm. Currently the only input is the URI to 
load (the rest can be deduced by the UA), which means that 
Access-Control easily fits into any existing API for loading URIs 
without any modifications.

For non-GET requests this is all pretty easy: we'd just indicate in the 
OPTIONS reply to the preflight request whether cookies should be 
included in the main request or not. For GET requests this is trickier.
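
A rough sketch of the non-GET case, from the UA's side. The reply header 
name Access-Control-With-Cookies is made up here purely for illustration 
(no such header has been specified), and main_request_headers is a 
hypothetical helper, not any browser's actual code:

```python
# Hypothetical header name; an assumption for this sketch, not part of any spec.
COOKIE_OPT_IN_HEADER = "access-control-with-cookies"


def main_request_headers(preflight_reply_headers, cookie_value):
    """Build credential headers for the main request after an OPTIONS preflight.

    Cookies are attached only if the preflight reply carried the opt-in header.
    """
    # HTTP header names are case-insensitive, so normalise before the lookup.
    reply = {name.lower() for name in preflight_reply_headers}
    headers = {}
    if COOKIE_OPT_IN_HEADER in reply and cookie_value:
        headers["Cookie"] = cookie_value
    return headers
```

The point is only that the preflight gives the UA a place to learn the 
server's choice before any cookies are sent. A plain GET has no such 
round trip, which is why it needs one of the proposals below.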

Here are three proposals for how it could work:

A)
The loading API indicates if it wants to include cookies in the request 
or not, for example on XMLHttpRequest we would add a .loadPrivateData 
boolean property. We then make the request with or without cookies as 
requested. If the reply from the server does not have the cookies-opt-in 
header and .loadPrivateData was false then all is good and we check the 
normal Access-Control headers to see if access should be granted. 
Similarly if .loadPrivateData was true and the reply from the server did 
include the cookies-opt-in header then the normal Access-Control 
algorithm is followed. However, if there is a mismatch between 
.loadPrivateData and the presence of the cookies-opt-in header then 
access is always denied.
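
The matching rule in proposal A could be sketched roughly like this. 
"cookies-opt-in" is only the placeholder name used above, and 
normal_access_check is a hypothetical callable standing in for the 
existing Access-Control header check:

```python
# Placeholder header name taken from the text above; not a specified header.
COOKIE_OPT_IN_HEADER = "cookies-opt-in"


def access_allowed(load_private_data, reply_headers, normal_access_check):
    """Apply the proposal A rule to a reply.

    load_private_data   -- the boolean the loading API was given
    reply_headers       -- mapping of reply header names to values
    normal_access_check -- hypothetical stand-in for the usual algorithm
    """
    # HTTP header names are case-insensitive.
    names = {name.lower() for name in reply_headers}
    server_opted_in = COOKIE_OPT_IN_HEADER in names
    # Any mismatch between what was requested and what the server
    # opted in to means access is denied outright.
    if load_private_data != server_opted_in:
        return False
    # Otherwise fall through to the normal Access-Control check.
    return normal_access_check(reply_headers)
```

Note that a match on the flag is necessary but not sufficient: the 
normal check still has to pass.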

The downside with this solution is that the Access-Control algorithm 
needs to be fed that extra boolean of data. In XMLHttpRequest this isn't 
really a problem. However for something like the XBL PI, or a document() 
call in an XSLT stylesheet, there is no obvious way to indicate if the 
request should include cookies or not.

B)
This is similar to solution A. However rather than having a separate 
flag to indicate if private data is loaded or not, we include it in the 
URI. So if you want to load the resource http://example.com/address with 
cookies you instead load private-data:http://example.com/address. This 
way the "should cookies be included" decision is carried in the URI 
itself rather than out-of-band.
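
A minimal sketch of how a UA might unwrap such a URI, assuming the 
private-data: prefix simply wraps the real URI as in the example above. 
The function name resolve_load_uri is hypothetical:

```python
PRIVATE_DATA_PREFIX = "private-data:"


def resolve_load_uri(uri):
    """Return (uri_to_fetch, include_cookies) for a URI handed to a loader."""
    # URI scheme names are case-insensitive, so compare in lower case.
    if uri.lower().startswith(PRIVATE_DATA_PREFIX):
        return uri[len(PRIVATE_DATA_PREFIX):], True
    return uri, False
```

With this, resolve_load_uri("private-data:http://example.com/address") 
gives ("http://example.com/address", True), while the bare http URI 
gives the same address with False. The loading API itself needs no 
extra argument.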

I'm not sure how easy this would be to implement in other UAs. It would 
be quite easy in Firefox since we already have support for nested 
protocols, but I'm not sure if that is the case in other UAs.

C)
We could for Access-Control requests use different names for the Cookie 
and Auth headers. This way the server would "opt in" to getting cookies 
by listening to different header names. While this is cleaner on the 
client side than the above two proposals, it has several disadvantages.

First of all it means that we are no longer fulfilling our requirement 
to reuse the existing server architectures.

There is also a risk that servers will simply start to always listen to 
both the normal header names and the access-control names, thus negating 
the whole opt-in mechanism.

Lastly, it means that proxy servers will no longer recognize the 
Authorization header, which I think means that there's a risk that they 
could cache the response to an authorized request.
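
To make proposal C and its second disadvantage concrete, here is a small 
sketch. The renamed header names (Access-Control-Cookie and 
Access-Control-Authorization) are invented for illustration only, as are 
both helper functions:

```python
# Hypothetical renamed headers; these names are assumptions for this sketch.
RENAMED = {
    "cookie": "Access-Control-Cookie",
    "authorization": "Access-Control-Authorization",
}


def rename_credential_headers(headers):
    """UA side: send credentials under the alternative header names."""
    return {RENAMED.get(name.lower(), name): value
            for name, value in headers.items()}


def credentials_seen_by(server_header_names, request_headers):
    """Server side: the credential headers a server actually reads."""
    wanted = {name.lower() for name in server_header_names}
    return {name: value for name, value in request_headers.items()
            if name.lower() in wanted}
```

A server that only reads Cookie and Authorization sees no credentials in 
a renamed request, which is the opt-in. A server that lists both the old 
and the new names sees credentials on every request, which is exactly 
the "always listen to both" risk.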


Personally I think proposal B is the best one.

This is definitely a tricky problem and I can't think of any really good 
solutions. But I do think it's a problem we need to solve in order to 
allow sites that serve public data to easily make that available for 
mashups. Private data is always tricky since the site has to ensure that 
the user is ok with sharing the data. But sharing public data should be 
made easy.

/ Jonas
Received on Friday, 30 May 2008 16:44:53 UTC