Re: Comment on minutes ## With Credentials flag etc from Tim Berners-Lee on 2016-03-31 (www-tag@w3.org from April 2016)

From: Tim Berners-Lee <timbl@w3.org>
Date: Thu, 31 Mar 2016 23:29:51 +0100
To: Jonas Sicking <jonas@sicking.cc>
Cc: Mark Nottingham <mnot@mnot.net>, Public TAG List <www-tag@w3.org>
Message-Id: <278C42DB-3B2E-4079-B0E1-CEC9BF2AB701@w3.org>
> On 2016-03-31, at 18:50, Jonas Sicking <jonas@sicking.cc> wrote:
> 
>>> Q: Why are security checks performed when withCredentials is set to false?
>>> A: Because the user, and the user's browser, might be behind a
>>> firewall and so might be able to access servers which a website would
>>> otherwise not be able to access.
>>> 
>>> Sadly there is no, to me, known mechanism for detecting if a given
>>> server is behind a firewall.
>> 
>> That’s a long rathole but ...
>> 1) If your local IP address is the same as the one you get from a public IP reflector then you are not behind a firewall
>> 2) If your IP address starts with 192.168… then you are behind a firewall …
>> 3) BUT that isn’t the point, you can be outside a firewall and still have privileged access by your IP address.
>> 4) And you could also be behind a carrier-grade NAT box but not have any privileged access as a result.
>> 
>> One possible but hard route is to pursue something like the router telling your machine whether it has no privileged access, which would then enable a lot of stuff.  So public internet spaces would set the flag, which would then mean the browsers would do fewer preflights, fewer wasted attempts to access stuff, etc., and so the browser would run more quickly and use less bandwidth.
> 
> If we get to a point where clients can reliably tell if a given
> request goes to a server that is behind a firewall or not, then we
> should certainly take advantage of that.

yes
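
As a rough illustration of heuristics 1) and 2) quoted above, here is a minimal sketch in Python. The function name and the idea of passing in an address obtained from a public IP reflector are assumptions made for the example, and, as point 3) says, the answer tells you nothing about whether the machine has privileged access.

```python
import ipaddress
from typing import Optional

# Shared address space commonly used by carrier-grade NAT (RFC 6598).
CGNAT_RANGE = ipaddress.ip_network("100.64.0.0/10")


def looks_behind_nat(local_ip: str, reflected_ip: str) -> Optional[bool]:
    """Guess whether this machine sits behind a NAT or firewall.

    local_ip is the address configured on the machine; reflected_ip is
    the address a (hypothetical) public IP reflector reports back.
    Returns False, True, or None when the heuristics are inconclusive.
    """
    local = ipaddress.ip_address(local_ip)
    reflected = ipaddress.ip_address(reflected_ip)
    if local == reflected:
        # Heuristic 1: the outside world sees our own address.
        return False
    if local in CGNAT_RANGE or local.is_private:
        # Heuristic 2: addresses such as 192.168.x.x are not publicly routable.
        return True
    return None
```

Even when this returns False, the machine may still be trusted by servers purely because of its IP address, so the result cannot safely be used to skip the security checks.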


> 
>>> Q: Why does CORS not allow "Access-control-allow-origin: *" together
>>> with withCredentials=true?
>>> A: It was felt that this was too big of a foot gun.
>>> 
>>> CORS was designed not long after Adobe had added the crossdomain.xml
>>> feature to Flash Player. The crossdomain.xml feature allows webserver
>>> administrators to easily indicate that the server contains resources
>>> that should be loadable from other origins. The feature only allowed
>>> "normal" requests, i.e. requests similar to ones that CORS makes when
>>> withCredentials=true.
>>> 
>>> When crossdomain.xml was released many websites opted in allowing data
>>> to be read from other websites in order to share some public data that
>>> was hosted on the server. Unfortunately they forgot that some other
>>> URLs on the server served sensitive user data. The result was that
>>> relatively quickly after the release of the crossdomain.xml multiple
>>> websites leaked sensitive user data.
>>> 
>>> You could argue that the problem was that crossdomain.xml was
>>> different since it is a per-server configuration file, whereas CORS
>>> uses per-URL headers. Hence CORS would be less prone to server
>>> administrators accidentally opting in to sharing on URLs that serve
>>> sensitive user data.
>>> 
>>> However in practice many (most?) popular web servers allow adding
>>> configuration files which add static http headers to all URLs under a
>>> given directory. So in practice on many servers it would have been
>>> just as easy to make the same mistake with CORS.
>> 
>> Any arguments about making it easy or difficult for server admins to
>> shoot themselves in the foot are about protecting the (server) user from themselves, which should always be done with care.
>> The user is the most important.
> 
> Absolutely. The current CORS design was done after a lot of web
> administrators who deployed crossdomain.xml did so in a manner that
> leaked sensitive user data to anyone with a website.

> 
>> To first order, the system must implement a security protocol which allows
>> people to do the right thing — to give the right access to the right resources
>> by the right people and origins.  Yes, by all means allow the server to protect data
>> by default, but make it clear what is happening and allow the server operator easily to
>> tell the server what the situation is.  (you are/not running on the open internet.  This information itself is/not quite public)
>> 
>> When you make something impossible using HTTP in the browser, that is a big deal.
> 
> Yup. Though as far as I can tell, none of the use cases discussed so
> far is impossible with CORS. In fact, we worked quite hard to make all
> of HTTP accessible in even the initial release of CORS, despite the
> fact that by far the most common use cases only use a small subset of HTTP.
> 
>> Q: Why was reflecting the incoming origin in the header the thing which was picked
>> as the way of saying “yes this really is public”?  Why not “access-control-allow-origin **” or something?
>> It is a pain to code, needs two or three lines of not-newbie-obvious .htaccess in Apache, etc.
> 
> The main reason that reflecting the origin was chosen was so that an
> intermediate cache wouldn't cause the browser to see the signal "yes
> this is a public resource”.

This is the first time I have come across this reason. Interesting.
What was the failure mode?   The web browser goes through an intermediate caching proxy.
If the proxy forwards all the headers both ways, then surely everything works, with * or with reflection of the origin?
What was the failure mode you were thinking of?

> 
>> Result? The recipe for fixing it by doing the origin reflection is sent around, and new server code does it by default for everything.
>> CORS is such a pain for developers to deal with on the client side, with no error codes, etc.,
>> that servers who want stuff to just work slap in the strongest CORS medicine they find on the net.
> 
> I've never heard of servers doing this origin-reflection by default.
> If that really is the case then I agree that that would be very
> concerning since it would put us in an even worse state than
> crossdomain.xml was in.

Yes

>>> Q: Why does CORS not allow listing multiple origins, or allow pattern
>>> matching, in the "Access-control-allow-origin" header?
>>> A: It was felt that if the server uses dynamic server-side logic to
>>> generate responses for a given URL, that they could also then
>>> dynamically generate the appropriate Access-control-allow-origin
>>> header.
>>> For servers that generate static responses you can generally simply
>>> use "Access-control-allow-origin: *”.
>> 
>> Well no, not if they only want 7 specific domains to have access.
> 
> If the response is entirely static, and the server is connected to the
> public internet, then there is little benefit to limiting the response
> to 7 specific domains.
> 
> In practice such a response is world readable anyway since anyone in
> the world could use curl or wget to read the data. No matter what is
> sent in any CORS headers.

You are right that there is no point allowing any web page to access something if it is not public.

Suppose the resource is not public.   It is only available to certain people.
Imagine that we had a list of W3C AC reps and their phone numbers which
we wanted to be accessible to those reps (using their passwords) 
and we wanted them to be able to use web apps on tools.ietf.org and
tools.w3.org to access them, and no other web apps to access that data.

The server checks the authorization of the user and rejects the request with 401 or 403 if not OK.
It then checks the Origin header and, if it does not match either of the two origins, gives a 499 error code, which we don’t have???
There is no point returning data to an app which isn’t allowed it.

If the origin, say tools.ietf.org, (is omitted or) matches, then it returns 200 and would ideally give both origins in the ACAO header, but it can’t; it has to reflect the current requesting origin.
(What if that response is cached by a proxy, which later responds to a request from tools.w3.org? Presumably the server sends Vary: Origin to prevent that happening in the proxy.)
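
The server-side decision described here could be sketched roughly as follows, in Python. This is only an illustration under stated assumptions: the function name, the https scheme on the two origins, and the use of 403 for a disallowed origin (standing in for the "499" that does not exist) are all choices made for the example, not something the protocol prescribes.

```python
from typing import Dict, Optional, Tuple

# Assumed allow-list: the two web apps named above, with an assumed scheme.
ALLOWED_ORIGINS = frozenset({"https://tools.ietf.org", "https://tools.w3.org"})


def respond(user_authorized: bool, origin: Optional[str]) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a request to the protected list.

    origin is the value of the request's Origin header, or None if absent.
    """
    if not user_authorized:
        # 401 or 403 depending on whether credentials were missing or refused.
        return 403, {}
    # The response depends on Origin, so tell caches to key on it.
    headers = {"Vary": "Origin"}
    if origin is None:
        # No Origin header: not a cross-origin script request.
        return 200, headers
    if origin not in ALLOWED_ORIGINS:
        # Assumption: 403 stands in for the missing "origin not allowed" code.
        return 403, headers
    # Only a single origin can be listed, so reflect the requesting one.
    headers["Access-Control-Allow-Origin"] = origin
    headers["Access-Control-Allow-Credentials"] = "true"
    return 200, headers
```

A request from https://tools.w3.org by an authorized user gets that origin reflected back, while the same user coming through any other web app gets no data at all.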




> 
>>> Keep in mind that static
>>> responses can generally be read from non-browser HTTP clients like
>>> curl anyway.
>>> 
>>> This doesn't account for static responses which are password protected
>>> using either cookies or auth headers. So yeah, our solution here is
>>> not perfect, but we decided to opt for simplicity.
>>> 
>>> My personal hope was also that generic server modules would be written
>>> to handle CORS support and which would simplify situations like this.
>>> I'm not sure if such modules exist yet or not.
>> 
>> There are lots. They may be turned on by default.  A concern is they tend to just defeat CORS
>> and they don’t necessarily distinguish between public resources and others.
>> 
>> Also people use CORS proxies to access the web, which are associated
> 
> Does anyone have any examples of websites that have publicly shared
> data that should have been kept private?
> 
> We saw plenty of examples where that happened with crossdomain.xml,
> but I've personally not heard that it's happened with CORS. But
> absence of evidence is not evidence of absence, so I'd be very
> interested to hear about any breaches.
> 
> 
>> I think the top three issues the TAG had were basically
>> 
>> a) Having the withCredentials flag as a parameter to fetch() is broken.  In general the middleware which calls fetch() will not have magic application-level knowledge of which of the resources it is going to fetch are public and which are private.   So in general the fetch has to work without that hint, and do the right thing. (opinions vary here)
> 
> As was requested in
> https://github.com/w3ctag/spec-reviews/issues/76#issuecomment-183317897
> CORS enables a mode where you don't have to have any knowledge about
> what the security policy of the server is, and you only have to define
> a URL. CORS then adapts the wire protocol in order to make that safe.
> 
> The way that you do this is that you always set the withCredentials
> flag to true. Then, as wire protocol you send the
> access-control-allow-credentials and access-control-allow-origin
> headers with the appropriate values.

Ok, so then the protocol is that the withCredentials flag is always set in the client.
Then the “ACAO: *” header is never accepted by the browser, and so logically it should never be sent by the server, and should be removed from the documentation.  The server returns 200 with ACAO: <origin> if the document is public or the user is allowed access. The browser has no way of knowing whether the resource is actually public or not, or whether it can be shared with specific other origins or with the whole world.  That surely seems less than optimal, disabling useful caching?
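
For reference, the browser-side rule under discussion (a wildcard is refused once credentials are sent) can be sketched like this in Python. It is a simplified model, not the browser's actual implementation: the function name is made up, and header lookup is assumed to use exactly these capitalized keys, whereas real header names are case-insensitive.

```python
from typing import Mapping


def browser_shares_response(page_origin: str, with_credentials: bool,
                            headers: Mapping[str, str]) -> bool:
    """Decide whether script on page_origin may read the response."""
    allow_origin = headers.get("Access-Control-Allow-Origin")
    if allow_origin is None:
        return False
    if not with_credentials:
        # Without credentials, either the wildcard or an exact match is enough.
        return allow_origin == "*" or allow_origin == page_origin
    # With credentials, the wildcard is refused: the origin must be named
    # exactly and the server must also opt in to credentialed sharing.
    return (allow_origin == page_origin
            and headers.get("Access-Control-Allow-Credentials") == "true")
```

So a client that always sets the flag can never benefit from a “*” response, which is the consequence drawn above.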



(Maybe we should go back to the server response like
 Public: GET
to be able to just flag that something is not protected at all.)


> 
>> b) For a webapp which needs to load stuff from the net, the lack of clear error conditions makes it hard to understand what is going on.
>> c) Asking server writers to do the origin reflection thing is unreasonable
> 
> I suggest that the TAG work with the webappsec WG in order to come up
> with concrete proposals for these two. That way we can compare
> security aspects, wire protocols, performance, etc.

Sounds like a good idea.

> 
> / Jonas
>
Received on Friday, 1 April 2016 03:39:53 UTC