Re: (XMLHttpRequest 2) Proposal for cross-site extensions to XMLHttpRequest from Ian Hickson on 2006-04-14 (public-webapi@w3.org from April 2006)

From: Ian Hickson <ian@hixie.ch>
Date: Fri, 14 Apr 2006 05:46:30 +0000 (UTC)
To: Mark Nottingham <mnot@yahoo-inc.com>
Cc: public-webapi@w3.org
Message-ID: <Pine.LNX.4.62.0604140148590.21459@dhalsim.dreamhost.com>

On Thu, 13 Apr 2006, Mark Nottingham wrote:
> > > 
> > > This proposal has some nice attributes, but it's very complex.
> > 
> > I think it is simpler (a lot simpler) than a system based on a known 
> > location, with its own file format, etc. This is especially the case 
> > given that we need <?access-control?> anyway -- my proposal is just an 
> > additional set of rules for how to restrict XMLHttpRequest in a way 
> > that relies on that separate spec, and is describable in a few 
> > paragraphs.
> 
> Well, this proposal makes some trade-offs, relative to a known location 
> / site-wide policy. It enforces a one-to-one resource-to-policy 
> relationship, which allows the scenarios you mention (e.g., many authors 
> on one server, administrative access issues) but it incurs considerable 
> overhead for sites that *do* have homogeneous policies and/or a single 
> administrator; each distinct unsafe request will require a separate 
> policy request, and the policy will need to be co-ordinated across a 
> potentially large number of resources.

The policy can be set site-wide with a single line in a single file, if 
you're using Apache (and most people are). I would imagine it to be easy 
on other systems too (IIS, e.g.).

In any case this is largely academic. In practice there won't be that many 
resources that you'll be accessing cross-site, especially in the course of 
a single session.

Having said all that, I'm of the opinion that we can probably drop the 
round-trip GET request altogether. I've been doing some research and I 
can't find any services that this would break. The only things you can't 
already do today are read the data from the server and change the 
Content-Type header (and other headers) -- everything else is already 
possible. What services do you think are vulnerable to this new mechanism 
that aren't already vulnerable today?


> What about allowing in-content / in-header policy for safe methods, and 
> going to a well-known location for unsafe methods?

Since in reality the problem is with the response, not the request, I'm 
starting to think that there aren't any unsafe methods.

However, even if we decide we really do need write-protection, I would 
object to a system based on a well-known location for all the reasons I 
gave before.


> > > If it were me, I'd be inclined to put a known location in the WD 
> > > (say, /w3c/access-control) in order to get the TAG -- or anybody 
> > > else -- motivated to come up with a better solution.
> > 
> > That's a very dangerous way of designing specs. You are most likely to 
> > end up having implementations of your straw man.
> 
> Perhaps. I'm concerned about proliferating models and means of 
> attachment of Web metadata; this, the content labels work, Web 
> description, etc. It would be good if every WG didn't invent a different 
> way of doing this. The well-known location cat is already out of the 
> bag; this is a new and unknown beast.

Inventing a new data format is inventing something new. Reusing the voice 
browser group's access control PI is reusing an existing technology.


> > > By that time the side effects have already happened on the server 
> > > side. Many CGI tools (unfortunately) treat GET query args and POST 
> > > bodies as equivalent, so there will be situations where it's 
> > > possible to craft an attack against a server whereby a GET has side 
> > > effects.
> > 
> > This is out of scope for this proposal since it is already possible to 
> > do both GET and POST submissions to arbitrary URIs without any 
> > protection whatsoever.
> 
> OK (assuming you're referring to script tags and the like).

"And the like" is a bit of an understatement. Off the top of my head, you 
can already do arbitrary GET using:

   <script src>
   <style> @import
   <link rel="stylesheet" href="...">
   <meta http-equiv="Refresh" ...>
   <img>
   <object>
   <iframe>
   <frame>
   <embed>
   In CSS: background: url()
   In CSS: list-style-image: url()
   In CSS: content: url()
   HTTP redirects
   Web fonts

...the list goes on, I'm sure.

You can also, with <form>, do arbitrary data POST to any URI (with no or 
little control over the headers -- not that CGI scripts look at the 
headers anyway).


> As stated before, I'm not sure the existence of one hole justifies the 
> intentional opening of other holes.

It's not "one hole". Most of the Web works this way, always has.


> > The XMLHttpRequest cross-site protection only needs to protect
> > against two things:
> > 
> >  1. Actually reading the data that is returned, and
> > 
> >  2. Sending of request entity body payloads that are MIME types other than
> >     text/plain, multipart/form-data, application/x-www-form-urlencoded,
> >     and application/x-www-form+xml.
> > 
> > The second is only to protect against hypothetical servers that are
> > actually checking the Content-Type of submissions. In practice I doubt
> > it'll make the slightest difference.
> 
> Not following you; why should other media types be prohibited? E.g., why 
> can't I POST or PUT some JSON or RDF to another site, if it wants to let 
> me?

You can. My point is that the only thing that cross-site XMLHttpRequest 
lets you do (other than reading the data that is returned) which existing 
mechanisms don't let you do, is change the Content-Type header (and other 
HTTP headers). So the only vulnerability we need to worry about is a site 
that only accepts data with a particular type (or with particular headers 
set). Any other service is already "vulnerable". And that's the only 
reason we're doing this GET-before-POST thing.
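
As a rough sketch of that distinction (Python, with hypothetical names; 
nothing here is part of the proposal text): the four media types are the 
ones listed earlier in this thread as already sendable cross-site, and any 
other type is the one new sending capability that the GET-before-POST 
round trip would be guarding.

```python
# Media types named in this thread as already sendable cross-site today.
FORM_SENDABLE_TYPES = frozenset({
    "text/plain",
    "multipart/form-data",
    "application/x-www-form-urlencoded",
    "application/x-www-form+xml",
})


def media_type(content_type: str) -> str:
    """Strip parameters (e.g. charset, boundary) and normalise case."""
    return content_type.split(";", 1)[0].strip().lower()


def is_new_capability(content_type: str) -> bool:
    """True if sending this type cross-site is not in the already-sendable set."""
    return media_type(content_type) not in FORM_SENDABLE_TYPES
```

Under these assumptions, a server that accepts only application/json 
would be covered by the extra round trip, while one that accepts 
application/x-www-form-urlencoded gains nothing from it.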


> > My proposal actually protects more than that, it protects against 
> > reading the returned data and _any_ entity payloads. This is overkill, 
> > but makes the model simpler. (The extra roundtrip is only required for 
> > the second of these, which is probably overkill. We could probably 
> > drop it.)
> 
> Again, not following you; I thought the point of the second round trip 
> was to avoid any undesired state changes / side effects on the server.

It is, but it's only preventing those _if_ the server is checking the type 
to make sure it's the right type (and then only if the "right type" is not 
one of text/plain, multipart/form-data, application/x-www-form-urlencoded,
and application/x-www-form+xml).


> > Referer has path information, which is a privacy problem; 
> > Referer-Domain would only include the domain, to get around this. (And 
> > the scheme, to allow for checks against DNS spoofing, but that's a 
> > minor detail.)
> 
> Could you go into that a bit more deeply? The site with control over the 
> cross-site request is the same party that controls how the Referer is 
> constructed (by controlling how their URIs are laid out), so what's the 
> exact concern here? How is this different from a normal link between 
> sites?

Certain users are concerned that referers will let other sites know what 
they are doing, and so disable Referer headers, sometimes at levels that 
the UA has no control over, for example in proxies.

Also, any request from an HTTPS page to an HTTP page has its Referer 
header removed.

Thus we need a way to include the pertinent parts -- the domain and the 
protocol -- in the headers, so that the remote site can make an educated 
guess as to the intent of the first party and decide whether or not to 
grant that page access to its data.
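
A minimal sketch of the reduction described above, in Python. The 
function name and the "scheme://host" output form are assumptions made 
for illustration; this message does not define the value syntax of 
Referer-Domain, and dropping the port is likewise a choice made here.

```python
from urllib.parse import urlsplit


def referer_domain(page_url: str) -> str:
    """Reduce a page URL to scheme and host, dropping path, query and fragment."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"not an absolute URL: {page_url!r}")
    # hostname is lower-cased by urlsplit and excludes userinfo and port.
    return f"{parts.scheme}://{parts.hostname}"
```

The path and query, which are the privacy-sensitive parts of Referer, 
never appear in the result, while the scheme is kept so the remote site 
can tell an HTTPS origin from an HTTP one.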


> There's also a bit of asymmetry here with the goals you stated earlier; 
> you wanted to allow people who didn't control a whole site to set access 
> control, but Domain doesn't allow the target of that XHR request to 
> identify the resource accessing it beyond the site it's on.

Correct, because normal DOM scripting security rules would let you 
sidestep any more fine-grained control (you have full access to any page 
on the same domain, and can make such a page do whatever you want, e.g. 
through script injection). It only makes sense to provide the information 
that is not trivially spoofable.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Friday, 14 April 2006 05:46:39 UTC