
Re: [XHR] Open issue: allow setting User-Agent?

From: Hallvord R. M. Steen <hallvord@opera.com>
Date: Fri, 12 Oct 2012 11:12:07 +0000
Message-ID: <20121012111207.0l35hs4no0wgkos0@staff.opera.com>
To: Julian Aubourg <j@ubourg.net>
Cc: public-webapps@w3.org
[Editorial note: I respond to two E-mails from Julian at once, and I
have done some re-ordering/interleaving to keep discussion of related
points together. This editing was in no way meant to reduce the
coherence of Julian's arguments, and I hope it doesn't. The discussion
of rationale is "hoisted" to the start of this E-mail; threat and
abuse risks come later.]

>>> If it's a mistake on the backend's side (they filter out while  
>>> they didn't intend to)
>>> just contact the backend's maintainer and have them fix this server-side
>>> problem... well... server-side.

To add to what Mike said - I've been doing what you propose as part of
my job for about ten years now. You might think that sniffing is done
on purpose to serve different browsers suitable content, and that
broken sniffing on the backend is mainly a problem with unmaintained
legacy sites. In my experience, both assumptions are mistaken.

Consider the household name that is one of the web's major E-commerce  
destinations. I've seen it remove random chunks of required markup  
because of obscure backend browser sniffing. I've seen it remove *the  
contents of JavaScript variables* inside inline SCRIPT tags - instead  
of <script>var obj={ /*several lines of methods and  
properties*/};</script> it sent <script>var obj={};</script>! I've  
seen it drop random stylesheets and jumble the layout.

The best part? For nearly 10 years, said site has been promising to
fix the problems! They always plan to get it done in the next quarter
or something. And this is clearly *not* an unmaintained backend - it's
a site that must be spending hundreds of thousands of dollars per year
on backend maintenance and development. It seems there just isn't any
single developer with sufficient overview of, and responsibility for,
their browser detection policies to actually do something about it.

Given such experiences, I don't consider it unlikely that a site would  
intend to enable CORS, make the required backend changes to share some  
of its content, but fail entirely to "go that extra mile" and fix the  
sniffing. In fact, I think it would be *more* likely that CORS is  
bolted-on without any browser detection-related changes and testing.

And you expect that when you contact a site saying "Hello, I made a  
neat CORS news aggregator, it's used by 10 000 people, but the 250  
Opera users among them have trouble because your site detects "Opera"  
and gzips content twice, making my app show them binary garbage on  
screen" - they would care? You really think so? Great, we're hiring  
browser evangelists (and I'm pretty sure Mozilla is too) so we need  
crazy optimists to apply :-)

> The problem is that the same reasoning can be made regarding CORS.

This is not in fact a problem ;-). Both the general CORS security
policy and this issue are judgement calls. It's not a problem if we
resolve them differently - we just need to weigh the pros and cons to
see which side of the argument is stronger.

> If we had a mechanism to do the same thing for the fact of modifying the
> UserAgent header, I wouldn't even discuss the issue.

For CORS usage, we have that already: the Access-Control-Allow-Headers
response header. So, as Anne pointed out, this is already opt-in for
CORS. Per your comment above, we should hence focus our discussion on
potential threat scenarios for same-domain usage (or agree to allow
setting User-Agent for CORS but not for local requests?).
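To make the opt-in concrete, here is a small sketch (my own, purely
illustrative - the helper function and the URLs are invented, and the
client part only works if the blacklist were lifted): the server must
name the header in its preflight response before the browser will send
it cross-origin.

```javascript
// Returns true if headerName is listed in an Access-Control-Allow-Headers
// response value (a comma-separated, case-insensitive list of header names).
function headerAllowed(allowHeadersValue, headerName) {
  return allowHeadersValue
    .split(",")
    .map(function (h) { return h.trim().toLowerCase(); })
    .indexOf(headerName.toLowerCase()) !== -1;
}

// Client side, browser-only and hypothetical (assumes the ban is lifted):
// var xhr = new XMLHttpRequest();
// xhr.open("GET", "https://api.example.org/feed", true);
// xhr.setRequestHeader("User-Agent", "MyAggregator/1.0"); // non-simple header
// xhr.send(); // browser preflights; the request only proceeds if the
//             // server's response includes User-Agent in its
//             // Access-Control-Allow-Headers list
```

So a server that never names User-Agent in that list never sees a
modified one from a CORS request - which is what makes this opt-in.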

Moving on to discussion of proposed threat scenarios, first DOS:

Glenn Maynard <glenn@zewt.org> wrote:
>> Are you really saying that backend developers want to use User-Agent to
>> limit the number of requests accepted from Firefox?
>>  (Not one user's Firefox, but all Firefox
>> users, at least of a particular version, combined.)

To which Julian responded:

> A more likely scenario

So we are in agreement that the "use User-Agent header's value to  
prevent DOS" scenario is unlikely? :-) (More on the alternate scenario  
later).

>>> Now, read back your example but suppose the attack is to be pulled
>>> against cnn.com. At a given time (say cnn.com's peek usage time), the
>>> script issues a gazillions requests. Bye-bye server.

This is already possible, since XMLHttpRequest (and even
script-generated IMG elements or script-based form submits) exists.
The question we're trying to figure out is whether being able to
change User-Agent causes *greater* risk. I hope we can agree that a
server's measures against DOS are unlikely to be based on User-Agent -
and that, if they were, a script changing User-Agent would make those
requests *simpler* to reject, not harder (i.e. through "User-Agent of
incoming request does not match the User-Agent associated with the
cookie session - reject!" logic).
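That rejection logic is a one-liner server-side. A hypothetical sketch
(helper name and User-Agent strings invented for illustration):

```javascript
// A session records the User-Agent it was created with; any later request
// whose User-Agent differs can be rejected out of hand.
function sessionRequestOk(session, request) {
  return session.userAgent === request.headers["user-agent"];
}

var session = { userAgent: "Mozilla/5.0 (Windows NT 6.1) Firefox/16.0" };
var normal = { headers: { "user-agent": "Mozilla/5.0 (Windows NT 6.1) Firefox/16.0" } };
var forged = { headers: { "user-agent": "TotallyNotFirefox/1.0" } };
// sessionRequestOk(session, normal) -> true
// sessionRequestOk(session, forged) -> false
```

In other words, a script that rewrites User-Agent mid-session makes
itself *easier* to filter out, not harder.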

>> I'm confused.  What does this have to do with unblacklisting the
>> User-Agent header?
>>
>>> That's why I took the ad example. Hack a single point of failure (the ad
>>> server, a CDN) and you can DOS a site using the resource from network
>>> points all over the net. While the frontend dev is free to use scripts
>>> hosted on third-parties, the backend dev is free to add a (silly but
>>> effective) means to limit the number of requests accepted from a browser.
>>> Simple problem, simple solution and the spec makes it possible.

Only if you assume that a web site will say "We're being DOS-attacked  
- quick, stop accepting requests from MSIE!". This would certainly be  
even more of a nuisance to their visitors than the attack itself, so  
as a strategy against DOS it would make little sense.

> A more likely scenario is a URL that only accepts a specific user agent
> that is not a browser (backend). If user script can change the UserAgent,
> it can request this URL repeatedly. Given it's in the browser, a shared
> resource (like an ad provider or a CDN) becomes a very tempting point of
> failure.
>
> AFAIK, you don't have the same problem with PHP libs for instance (you
> don't request same from a third-party server, making it a potential vector
> of attack).

It seems we're to some extent mixing two threat scenarios here: the
"DOS" and the "unauthorized access" ones. Repeated requests are more
of a DOS-type problem; secret backend URLs with User-Agent filtering
are an unauthorized-access problem. Let's discuss them separately.

If the threat is DOS-type attacks, using a secret URL (thereby giving
away your knowledge of that secret URL, and of the token to access it,
to the technical staff analysing and stopping the attack) would only
make sense, compared to accessing a public one, if the secret URL did
a lot more heavy lifting, so that the site could be taken down with
fewer requests. If I were a hacker, I would rather use a botnet or
something similar for this purpose, because it would make it harder to
detect that I knew the secret URL and its token.

Unauthorized access might be worse from a browser than from a shell
script if the JS could make use of information in the browser (e.g.
session cookies) to run an effective attack against the secret URL. I
would, however, assume that a part of the backend limiting itself to
requests from other *servers* would not generally be dealing with
session cookies - such server-to-server requests would presumably not
have sessions created for them.

At this point we're making a lot of hypothetical assumptions, though.
To make things a bit more real, I've tried to find examples of real
user-agent detection and filtering in backend scripts. I've checked
PayPal's Instant Payment Notification backend (a PHP script PayPal
provides; it lives at a secret location on your site and receives a
POST from PayPal with information when a payment is successfully
made). It is an example of a secret backend script a hacker would have
financial/practical motivations for attacking. It makes no attempt at
checking the User-Agent string to see whether it is indeed being
contacted by PayPal's server.

There are several examples of scripts for blacklisting User-Agents,
for example this Bad Behavior WordPress plugin:
http://code.ohloh.net/project?pid=f1ZpDuUCZw8&browser=Default&did=bad_behavior%2Fpublic_html&cid=kcopgYkVDm4
This does not, however, have any implications for our reasoning or
decision: it would make no sense for a script to set User-Agent in
order to opt into being blacklisted.
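For the avoidance of doubt, blacklist-style filtering amounts to this
kind of check (a sketch of the general pattern; the fragments are
invented examples, not Bad Behavior's actual lists):

```javascript
// Reject requests whose User-Agent contains a known-bad substring.
var BAD_UA_FRAGMENTS = ["libwww-perl", "Indy Library"]; // illustrative only
function isBlacklisted(userAgent) {
  return BAD_UA_FRAGMENTS.some(function (frag) {
    return userAgent.indexOf(frag) !== -1;
  });
}
```

A script free to set its own User-Agent would simply avoid the listed
strings, so blacklists don't change the analysis either way.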

Does anyone on public-webapps know of an example of a backend that
uses User-Agent white-listing as a security measure? If anyone has
actually seen or implemented this I'd love to hear about it...
provided anyone reads this far. I didn't actually expect to generate
so much discussion here ;-)

-Hallvord
Received on Friday, 12 October 2012 11:12:39 GMT
