Re: HTTPS rewriting vs Origin from Thomas Roessler on 2008-11-24 (public-bpwg-comments@w3.org from October to December 2008)

From: Thomas Roessler <tlr@w3.org>
Date: Mon, 24 Nov 2008 14:33:45 +0100
To: Francois Daoust <fd@w3.org>
Cc: public-bpwg-comments@w3.org
Message-Id: <883E631F-DB08-4988-BECC-C97D0FF61A6B@w3.org>

(Apologies for the lateness of this response -- I'm just now digging  
out of a huge pile of e-mail after having been away for a bit.)

On 14 Nov 2008, at 17:29, Francois Daoust wrote:

> Thanks Thomas!
>
> I'm trying to get a precise view of things that may break here. See  
> comments inline.
>
>
> Thomas Roessler wrote:
>> Rewriting of HTTPS URIs implies that the origin of web applications  
>> is changed.
>
> Content Transformation for Web applications is more than likely to  
> break things in any case.
>
> It is not explicitly forbidden in the guidelines because there is  
> no way to define standard Web browsing (whatever this may mean). The  
> latest draft of the guidelines says:

"Don't break the assumptions in chapter 5 of HTML5" might be a good  
heuristic to apply.  I'm not suggesting a formal dependency, though.

Also, note that the scenarios that I'm describing here are basically  
about "standard Web browsing as we know it".

> [[ Before altering aspects of an HTTP request proxies need to take  
> account of the fact that HTTP is used as a transport mechanism for  
> many applications other than "Traditional Browsing". Alteration of  
> HTTP requests for those applications can cause serious  
> mis-operation. ]]
>
> http://www.w3.org/2005/MWI/BPWG/Group/TaskForces/CT/editors-drafts/Guidelines/081107#sec-non-web-browsers
>
>
>> This is likely to break a number of things:
>> - access to cookies
>> - web applications that rely on the same origin policy
>> - access to any functionality that keys off the origin
>
> Is it specific to HTTPS though?

It is not specific to HTTPS; however, from my admittedly cursory  
reading of the guidelines, the rewriting approach is only condoned in  
the HTTPS case.  If similar rewriting is expected in other  
circumstances, then the same comment applies.

> I mean, the same thing may occur if a CT-proxy rewrites a normal  
> HTTP link and artificially changes the origin, right?

Correct.

> I'm not trying to dismiss the comment here, simply trying to  
> understand what it encompasses and where it may fit.
>
>
>> The breakage will come in several flavors:
>> - the application's actual origin will be distinct from the one  
>> expected by code within the application
>
> Understood. Same as above, I don't think that's specific to HTTPS.
>
>
>> - origins that are expected to be distinct may be mapped to the  
>> same string
>
> I'm not sure I really understand the problem here.
> What is the difference between that situation and, say, two users on  
> the same computer connecting to the same Web site using HTTPS? From  
> the point of view of the server, there ought to be two origins in  
> this case, right?

The origin of a web application is (in simplistic terms) the (scheme,  
host, port) tuple of the web page that controls the application. For  
edge cases and more details, see:

   http://www.w3.org/html/wg/html5/#origin-0

I.e., the case we're talking about here is *not* about two different  
clients, but about creating a situation in which web applications from  
whatwg.org and w3.org suddenly both come from  
contenttransformproxy.example.org (and can therefore script each  
other, thanks to the proxy being in place).
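As a rough illustration of the point above, here is a minimal Python  
sketch of the simplified (scheme, host, port) origin and of what happens  
when a proxy rewrites URIs. The rewriting scheme (the real URI  
percent-encoded into the proxy's path) and the sample URLs are  
assumptions made for the example, not something the guidelines specify.

```python
from urllib.parse import urlsplit, quote

DEFAULT_PORTS = {"http": 80, "https": 443}

# Proxy host name taken from the example in the text.
PROXY = "contenttransformproxy.example.org"


def origin(url):
    """Return the simplified (scheme, host, port) origin of a URL."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def rewrite(url):
    """Hypothetical rewriting: embed the real URI in the proxy's path."""
    return "http://" + PROXY + "/" + quote(url, safe="")


a = "https://whatwg.org/app"  # illustrative URLs
b = "https://w3.org/app"

# Without the proxy the two applications have distinct origins.
print(origin(a) == origin(b))
# After rewriting, both are served from the proxy's single origin.
print(origin(rewrite(a)) == origin(rewrite(b)))
```

Under these assumptions the first comparison is false and the second is  
true: once both applications share the proxy's origin, the browser has  
no basis left for keeping them apart.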

> Could there be a case where the server replies to someone with the  
> data of someone else? If so, is there anything the proxy may do to  
> prevent that from happening and that we could recommend? (e.g.  
> proxies MUST NOT re-use TLS connections for different clients, or  
> something similar that actually means something?)

There is actually additional breakage around the same-origin policy  
for XMLHttpRequest -- with a content transformation proxy in place  
that maps different origins to the same domain (and presumably embeds  
the real URIs somewhere in the path component), XHR can be used to  
read content from arbitrary origins.  Not good.
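The XMLHttpRequest problem can be sketched in Python as a toy model of  
the same-origin check. The `xhr_allowed` function, the rewriting scheme  
and the host names are all hypothetical; real browsers implement a more  
detailed check.

```python
from urllib.parse import urlsplit, quote

PROXY = "contenttransformproxy.example.org"  # hypothetical proxy host
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Simplified (scheme, host, port) origin of a URL."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname,
            parts.port or DEFAULT_PORTS.get(parts.scheme))


def rewrite(url):
    """Hypothetical rewriting: real URI percent-encoded into the path."""
    return "http://" + PROXY + "/" + quote(url, safe="")


def xhr_allowed(page_url, target_url):
    """Toy same-origin check: a page may read the target via XHR
    only if both URLs have the same origin."""
    return origin(page_url) == origin(target_url)


page = "https://attacker.example.com/page"      # illustrative names
target = "https://bank.example.net/account"

direct = xhr_allowed(page, target)
proxied = xhr_allowed(rewrite(page), rewrite(target))
print(direct, proxied)
```

In this model the direct request is refused, while the same request made  
through rewritten URIs is allowed, because both URLs now belong to the  
proxy's origin.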

To cure things, one could think of synthesizing domain names in the  
rewriting exercise.  *However*, scripts can manipulate the "effective  
origin" of an application up to a certain level; therefore, we'd still  
have a situation in which additional attack surface is created.
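A small Python sketch of why synthesized domain names do not fully cure  
the problem. It assumes the proxy maps each real host to its own  
subdomain, and it models "effective origin" manipulation as the  
`document.domain`-style rule that a script may relax its domain to a  
parent domain. Both the mapping and the simplified rule are assumptions  
for illustration; real browsers apply further restrictions (for example,  
refusing public suffixes).

```python
PROXY = "contenttransformproxy.example.org"  # proxy host from the text


def synthesize_host(real_host):
    """Hypothetical mapping, e.g.
    w3.org -> w3-org.contenttransformproxy.example.org
    (naive: ignores collisions such as a-b.org vs. a.b.org)."""
    return real_host.replace(".", "-") + "." + PROXY


def may_relax_to(current_host, new_domain):
    """Simplified relaxation rule: a script may set its domain to its
    own host or to a parent domain (suffix at a label boundary)."""
    return (current_host == new_domain
            or current_host.endswith("." + new_domain))


h1 = synthesize_host("whatwg.org")
h2 = synthesize_host("w3.org")

# The synthesized hosts are distinct, so the origins start out separate.
print(h1 != h2)
# But both can relax to the shared parent domain, after which their
# effective origins coincide again.
print(may_relax_to(h1, PROXY) and may_relax_to(h2, PROXY))
```

Both checks come out true in this model: the hosts differ, yet scripts on  
either one can opt into the common parent domain, which is domain  
sharing that the original sites never had.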

I guess what this fundamentally boils down to is: "URL rewriting  
breaks things, badly. Don't do it."

>> - the application's origin when run through a content  
>> transformation proxy will be distinct from the origin when run  
>> without the proxy, breaking persistent stores on the client-side.

> Understood as well, and equally triggered by the change of origin,  
> so not specific to HTTPS.

Indeed.

Don't change origins.
Received on Monday, 24 November 2008 13:33:58 UTC