Re: [XHR] Open issue: allow setting User-Agent? from Hallvord R. M. Steen on 2012-10-09 (public-webapps@w3.org from October to December 2012)

From: Hallvord R. M. Steen <hallvord@opera.com>
Date: Tue, 09 Oct 2012 17:33:58 +0200
To: "Julian Aubourg" <j@ubourg.net>, "annevankesteren@gmail.com" <annevankesteren@gmail.com>
Cc: "Anne van Kesteren" <annevk@annevk.nl>, "Jungkee Song" <jungkee.song@samsung.com>, "public-webapps@w3.org" <public-webapps@w3.org>
Message-ID: <op.wlw36wina3v5gv@hr-desk>
Julian Aubourg <j@ubourg.net> wrote on Tue, 09 Oct 2012 16:34:08 +0200:

>> I've had trouble writing extensions and user scripts to work around
>> backend sniffing, due to being unable to simply set User-Agent for a
>> specific script-initiated request and get the "correct" content. As I've
>> attempted to explain to Anne, I think this experience is relevant to
>> scripts using CORS, because they also want to interact with backends the
>> script author(s) don't choose or control.
>
>  If the backend sniffs out (all or some) browsers, it's the backend's
> choice.

We end up in a philosophical disagreement here :-) I'd say that whatever  
browser the user decides to use is the user's choice and the server should  
respect that.

> CORS has been specified so that you NEED a cooperative backend.
> Unlock a header and some other means to sniff you out will be found and
> used :/

Anne van Kesteren also makes a similar point, so I'll respond to both:

> If you consider CORS you also need to consider that if we allow
> developers to set user-agent a preflight request would be required for
> that header (and the server would need to allow it to be custom). So
> it's not quite that simple and would not actually help.
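
To make the preflight point concrete, here is a minimal sketch of the check a CORS server would effectively apply. It assumes, hypothetically, that a script-set User-Agent would be treated like any other custom author header and therefore announced in Access-Control-Request-Headers. The function name and the sample header lists are made up for illustration.

```javascript
// Sketch only: would a preflight let the actual request proceed?
// requested: value of the Access-Control-Request-Headers request header
// allowed:   value of the Access-Control-Allow-Headers response header
function preflightPermits(requested, allowed) {
    // Split a comma-separated header list into lower-cased names.
    function parse(list) {
        return list.split(",").map(function (name) {
            return name.trim().toLowerCase();
        }).filter(function (name) {
            return name.length > 0;
        });
    }
    var allowedNames = parse(allowed);
    // Every requested header must be explicitly allowed by the server.
    return parse(requested).every(function (name) {
        return allowedNames.indexOf(name) !== -1;
    });
}
```

With a server that only allows Content-Type, preflightPermits("User-Agent", "Content-Type") is false, so under this assumption the custom header would never reach the backend unless the server opts in.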

One word: legacy. For example Amazon.com might want to enable CORS for  
some of its content. The team that will do that won't necessarily have any  
intention of blocking browsers, but will very likely be unaware of the  
widespread browser sniffing in other parts of the Amazon backend. (With  
sites of Amazon's or eBay's scale, there is in my experience simply no  
single person who is aware of all browser detection and policies). Hence,  
there is IMO non-negligible risk that a large web service will be  
"cooperative" on CORS but still shoot itself in the foot with browser  
sniffing.
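
A minimal sketch of that legacy scenario, with every name invented for illustration: a sniffing layer written long ago sits in front of a CORS layer added later by a different team. The sniffing rule and the response shape are assumptions, not a description of any real backend.

```javascript
// Hypothetical legacy sniffing layer, written long before CORS was enabled.
function legacySniffAllows(userAgent) {
    return /MSIE|Firefox/.test(userAgent);
}

// Hypothetical CORS layer added later by a team unaware of the sniffing.
function corsHeadersFor(origin) {
    return { "Access-Control-Allow-Origin": origin };
}

// The request passes through the old layer before the new one is reached.
function handleRequest(request) {
    var userAgent = request.headers["user-agent"] || "";
    if (!legacySniffAllows(userAgent)) {
        return { status: 403, headers: {}, body: "Browser not supported" };
    }
    return {
        status: 200,
        headers: corsHeadersFor(request.headers["origin"]),
        body: "content"
    };
}
```

The CORS layer is fully cooperative here, yet a cross-origin script running in a browser the old rule does not know about still gets the 403.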

If I write, say, a CORS content aggregator, I would want it to run in all  
browsers, not only those allowed by the content providers. And I'd want to  
be in control of that. Hence, in my view this issue is mostly a trade-off  
between something script authors may need and more theoretical purity  
concerns.

>> The changed User-Agent will of course only be sent with the requests
>> initiated by the script, all other requests sent from the browser will  
>> be normal. Hence, the information loss will IMO be minimal and probably  
>> have no real-world impact on browser stats.

> var XHR = window.XMLHttpRequest;
>
> window.XMLHttpRequest = function() {
>    var xhr = new XHR(),
>        send = xhr.send;
>    xhr.send = function() {
>        xhr.setRequestHeader( "User-Agent", "OHHAI!" );
>        return send.apply( this, arguments );
>    };
>    return xhr;
> };

Yes, this could give a generic library like jQuery less control of the  
contents of *its* request. However, there will still be plenty of requests  
not sent through XHR - the browser's main GET or POST for the actual page  
contents, all external files loaded with SCRIPT, LINK, IMG, IFRAME, EMBED  
or OBJECT, all images from CSS styling etc. Hence I still believe the  
information loss and effect on stats will be minimal.

Also, the above could be a feature if I'm working on extending a site  
where I don't actually fully control the backend - think a CMS I'm forced  
to use and have to work around bugs in even if that means messing with how  
jQuery sends its requests ;-).
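
As a rough sketch of that workaround use, the wrapper quoted above could be narrowed so it only touches requests to the backend that sniffs. The function name, URL pattern and UA string are hypothetical, and the setRequestHeader call for User-Agent is exactly what is under discussion here: browsers following the current spec will not honour it.

```javascript
// Sketch only: wrap an XHR constructor so that requests whose URL matches
// urlPattern get a custom User-Agent. All other requests are left untouched.
function makeSniffWorkaroundXHR(BaseXHR, urlPattern, userAgent) {
    return function () {
        var xhr = new BaseXHR(),
            open = xhr.open,
            send = xhr.send,
            matches = false;
        // Remember whether this request targets the sniffing backend.
        xhr.open = function (method, url) {
            matches = urlPattern.test(String(url));
            return open.apply(this, arguments);
        };
        // Headers must be set after open() and before the real send().
        xhr.send = function () {
            if (matches) {
                xhr.setRequestHeader("User-Agent", userAgent);
            }
            return send.apply(this, arguments);
        };
        return xhr;
    };
}
```

It would be installed with something like window.XMLHttpRequest = makeSniffWorkaroundXHR(window.XMLHttpRequest, /\/cms\//, "SomeUA/1.0"); after which jQuery's requests to other URLs go out unchanged.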

>> If your backend really relies on User-Agent header values to avoid being
>> "tricked" into malicious operations you should take your site offline for a
>> while and fix that ;-). Any malicious Perl/PHP/Ruby/Shell script a hacker
>> or script kiddie might try to use against your site can already fake
>> User-Agent
>
>
> Oh, I agree entirely. Except checking User-Agent is a quick and painless
> means to protect against malicious JavaScript scripts. I don't like the
> approach more than you do, but we both know it's used in the wild.

I'm afraid I don't know how this is used in the wild and don't fully
understand your concerns. Unless you mean we should protect dodgy SEO
tactics (sending full site contents to Googlebot UAs but a paywall block
to everyone else) from user-applied scripts trying to work around them?

>> A malicious ad script would presumably currently have the user's web
>> browser's User-Agent sent with any requests it would make

> The malicious script can trick the server into accepting a request the
> backend expects to be able to filter out by checking a header which the
> standard says is set by the browser and cannot be changed by user scripts.
> Think painless DOS with a simple piece of javascript.

I still don't fully understand the scenario(s) you have in mind.

For a DOS attack you'd be launching it against some third-party site (it  
doesn't make sense for a site to DOS itself, right?). Trying to understand  
this, here are my assumptions:

* The threat scenario is trying to DOS victim.example.com by getting a  
malicious JavaScript targeting this site to run on cnn.com or some
similar high-volume site. (The attacker presumably needs to run the script  
on a high-volume site to be able to generate enough bogus requests for a  
successful DOS attack). This can be achieved for example by hacking into  
some service that delivers ads to cnn.com or in-transit modification of  
scripts requested by cnn.com (only for end users downstream of your  
network location).

* The malicious script will be using XHR in an attempt to DOS  
victim.example.com (if it uses other ways to do it, it's outside the scope  
of what we're trying to decide)

* The concern is whether allowing a custom User-Agent for XHR requests  
makes this scenario harder to defend against.

* You're saying that victim.example.com may have a white-list of  
User-Agent strings as a security measure to avoid serving content in  
response to requests presumed to be malicious, and that this helps them  
avoid XHR-based DOS attempts.

First observation is that victim.example.com needs to enable CORS for this  
attack avenue to be possible in the first place. This to some extent limits
the feasibility of the whole exercise (sites that are commonly targeted  
for DOS attacks are perhaps not likely to enable CORS - partly because it  
may make them more vulnerable to malice).

Secondly, this attempted DOS attack uses in-browser JavaScript (again, if  
it uses any other method it's outside of our scope). Out of the box, all  
the requests will be sent with the browser's original User-Agent string.  
As we're launching our attack from end users' regular web browsers, there  
is a very high chance that the User-Agent string is already on  
victim.example.com's whitelist. Hence, the DOS script will probably be  
more successful if it does *not* set User-Agent.
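
A small sketch of the whitelist I'm assuming victim.example.com would use. The patterns and the function name are hypothetical, picked only to show the effect.

```javascript
// Hypothetical whitelist of browser tokens a defensive backend might keep.
var UA_WHITELIST = [/Firefox\//, /Chrome\//, /Opera\//, /MSIE /];

// True when the User-Agent matches at least one whitelisted pattern.
function passesWhitelist(userAgent) {
    return UA_WHITELIST.some(function (pattern) {
        return pattern.test(userAgent);
    });
}
```

An unmodified browser string such as "Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0" passes this check, while a script-supplied value like "OHHAI!" does not, so overriding the header can only hurt the attacker unless the replacement is itself a whitelisted string.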

Why would setting User-Agent make the malicious script more effective at  
DOSing victim.example.com?

> but the use-case is really to prevent browsers from masquerading as
> servers).

This is a way more interesting (ab)use case. You're presuming that there  
are web-exposed backend services that are configured to only talk to other  
backend servers, and use a particular magic token in User-Agent as  
authentication? If such services exist, does being able to send a  
"server-like" UA from a web browser make them significantly more  
vulnerable than being able to send the same string from a shell script?
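
For comparison, a sketch of how little it takes outside the browser, here in the shape of options accepted by Node's http.request(). The token value, host and path are hypothetical.

```javascript
// Hypothetical "magic" token a backend might expect from its peer servers.
var SERVER_TOKEN = "InternalBackend/1.0";

// Build request options in the shape accepted by Node's http.request().
// Any non-browser client is free to choose its own User-Agent like this.
function backendRequestOptions(hostname, path) {
    return {
        hostname: hostname,
        path: path,
        method: "GET",
        headers: { "User-Agent": SERVER_TOKEN }
    };
}
```

Passing backendRequestOptions("victim.example.com", "/internal") to require("http").request() would send the "server-like" string from any machine, with no browser involved.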

-- 
Hallvord R. M. Steen
Core tester, Opera Software
Received on Tuesday, 9 October 2012 15:35:12 UTC