RE: [XHR] Open issue: allow setting User-Agent?

I don't think this is a right-or-wrong discussion; there is valid rationale on both sides.

Having mulled it over, I am leaning toward not removing User-Agent from the list of prohibited headers, at least in the current version. I admit that the use case is compelling to a certain group of authors (mainly for testing and analysis purposes), but I don't think it has consensus for the web as a whole. Besides, IMO browser spoofing, whether through the browser's main HTTP request or an XHR request, is not the right way to handle browser sniffing issues in practical service scenarios.
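
For reference, this is roughly what the current prohibition means in practice: setRequestHeader() is expected to ignore headers on the forbidden list, so the browser's own User-Agent value is always sent. (A sketch with a made-up URL; exact behaviour may vary slightly between engines.)

var xhr = new XMLHttpRequest();
xhr.open("GET", "/some/resource", true);
// User-Agent is on the forbidden header list, so this call should be
// silently dropped and the browser's real UA sent instead.
xhr.setRequestHeader("User-Agent", "MyTestAgent/1.0");
xhr.onload = function () {
    console.log(xhr.status, xhr.getResponseHeader("Content-Type"));
};
xhr.send();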

Jungkee

> -----Original Message-----
> From: Hallvord R. M. Steen [mailto:hallvord@opera.com]
> Sent: Wednesday, October 10, 2012 12:34 AM
> To: Julian Aubourg; annevankesteren@gmail.com
> Cc: Anne van Kesteren; Jungkee Song; public-webapps@w3.org
> Subject: Re: [XHR] Open issue: allow setting User-Agent?
> 
> Julian Aubourg <j@ubourg.net> wrote Tue, 09 Oct 2012 16:34:08 +0200
> 
> >> I've had trouble writing extensions and user scripts to work around
> >> backend sniffing, due to being unable to simply set User-Agent for a
> >> specific script-initiated request and get the "correct" content. As I've
> >> attempted to explain to Anne, I think this experience is relevant to
> >> scripts using CORS, because they also want to interact with backends the
> >> script author(s) don't choose or control.
> >
> >  If the backend sniffs out (all or some) browsers, it's the backend's
> > choice.
> 
> We end up in a philosophical disagreement here :-) I'd say that whatever
> browser the user decides to use is the user's choice and the server should
> respect that.
> 
> > CORS has been specified so that you NEED a cooperative backend.
> > Unlock a header and some other means to sniff you out will be found and
> > used :/
> 
> Anne van Kesteren also makes a similar point, so I'll respond to both:
> 
> > If you consider CORS you also need to consider that if we allow
> > developers to set user-agent a preflight request would be required for
> > that header (and the server would need to allow it to be custom). So
> > it's not quite that simple and would not actually help.
> 
> One word: legacy. For example, Amazon.com might want to enable CORS for
> some of its content. The team that does that won't necessarily have any
> intention of blocking browsers, but will very likely be unaware of the
> widespread browser sniffing in other parts of the Amazon backend. (With
> sites of Amazon's or eBay's scale, there is in my experience simply no
> single person who is aware of all browser detection and policies.) Hence,
> there is IMO a non-negligible risk that a large web service will be
> "cooperative" on CORS but still shoot itself in the foot with browser
> sniffing.
> 
> If I write, say, a CORS content aggregator, I would want it to run in all
> browsers, not only those allowed by the content providers. And I'd want to
> be in control of that. Hence, in my view this issue is mostly a trade-off
> between something script authors may need and more theoretical purity
> concerns.
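> 
> (For concreteness, here is roughly what Anne's preflight point would mean
> from the script's side if User-Agent were unlocked - a sketch with
> hypothetical hosts; the OPTIONS exchange is shown in the comments:)
> 
> // Hypothetical cross-origin request with a custom User-Agent. Because
> // the header would no longer be "simple", the browser would first send
> // a preflight:
> //
> //   OPTIONS /api/data HTTP/1.1
> //   Origin: https://aggregator.example
> //   Access-Control-Request-Method: GET
> //   Access-Control-Request-Headers: user-agent
> //
> // and the actual request only goes ahead if the server answers with
> // something like:
> //
> //   Access-Control-Allow-Origin: https://aggregator.example
> //   Access-Control-Allow-Headers: user-agent
> //
> var xhr = new XMLHttpRequest();
> xhr.open("GET", "https://api.example.com/api/data", true);
> xhr.setRequestHeader("User-Agent", "MyAggregator/1.0");
> xhr.send();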
> 
> >> The changed User-Agent will of course only be sent with the requests
> >> initiated by the script; all other requests sent from the browser will
> >> be normal. Hence, the information loss will IMO be minimal and probably
> >> have no real-world impact on browser stats.
> 
> > var XHR = window.XMLHttpRequest;
> >
> > window.XMLHttpRequest = function() {
> >    var xhr = new XHR(),
> >        send = xhr.send;
> >    xhr.send = function() {
> >        xhr.setRequestHeader( "User-Agent", "OHHAI!" );
> >        return send.apply( this, arguments );
> >    };
> >    return xhr;
> > };
> 
> Yes, this could give a generic library like jQuery less control of the
> contents of *its* request. However, there will still be plenty of requests
> not sent through XHR - the browser's main GET or POST for the actual page
> contents, all external files loaded with SCRIPT, LINK, IMG, IFRAME, EMBED
> or OBJECT, all images from CSS styling etc. Hence I still believe the
> information loss and effect on stats will be minimal.
> 
> Also, the above could be a feature if I'm working on extending a site
> where I don't actually fully control the backend - think a CMS I'm forced
> to use and have to work around bugs in, even if that means messing with
> how jQuery sends its requests ;-).
> 
> >> If your backend really relies on User-Agent header values to avoid
> >> being "tricked" into malicious operations you should take your site
> >> offline for a while and fix that ;-). Any malicious Perl/PHP/Ruby/Shell
> >> script a hacker or script kiddie might try to use against your site can
> >> already fake User-Agent
> >
> >
> > Oh, I agree entirely. Except checking User-Agent is a quick and painless
> > means to protect against malicious JavaScript scripts. I don't like the
> > approach any more than you do, but we both know it's used in the wild.
> 
> I'm afraid I don't know how this is used in the wild and don't fully
> understand your concerns. Unless you mean we should protect dodgy SEO
> tactics, which send full site contents to Googlebot UAs but show a paywall
> to everyone else, from user-applied scripts trying to work around that?
> 
> >> A malicious ad script would presumably currently have the user's web
> >> browser's User-Agent sent with any requests it would make
> 
> > The malicious script can trick the server into accepting a request the
> > backend expects to be able to filter out by checking a header which the
> > standard says is set by the browser and cannot be changed by user
> > scripts.
> > Think painless DOS with a simple piece of javascript.
> 
> I still don't fully understand the scenario(s) you have in mind.
> 
> For a DOS attack you'd be launching it against some third-party site (it
> doesn't make sense for a site to DOS itself, right?). Trying to understand
> this, here are my assumptions:
> 
> * The threat scenario is trying to DOS victim.example.com by getting a
> malicious javascript targeting this site to run on cnn.com or some
> similar high-volume site. (The attacker presumably needs to run the script
> on a high-volume site to be able to generate enough bogus requests for a
> successful DOS attack.) This can be achieved for example by hacking into
> some service that delivers ads to cnn.com, or by in-transit modification
> of scripts requested by cnn.com (only for end users downstream of your
> network location).
> 
> * The malicious script will be using XHR in an attempt to DOS
> victim.example.com (if it uses other ways to do it, it's outside the scope
> of what we're trying to decide)
> 
> * The concern is whether allowing a custom User-Agent for XHR requests
> makes this scenario harder to defend against.
> 
> * You're saying that victim.example.com may have a whitelist of
> User-Agent strings as a security measure to avoid serving content in
> response to requests presumed to be malicious, and that this helps them
> avoid XHR-based DOS attempts.
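> 
> (If I have understood the assumption correctly, the check would be
> something like this sketch - Node-style and purely hypothetical:)
> 
> // Hypothetical server-side whitelist of "acceptable" User-Agent strings.
> var ALLOWED_UA = [/Firefox\//, /Chrome\//, /Opera\//, /MSIE /];
> 
> function isAllowedRequest(req) {
>     var ua = req.headers["user-agent"] || "";
>     return ALLOWED_UA.some(function (pattern) {
>         return pattern.test(ua);
>     });
> }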
> 
> First observation is that victim.example.com needs to enable CORS for this
> attack vector to be possible in the first place. This to some extent limits
> the feasibility of the whole exercise (sites that are commonly targeted
> for DOS attacks are perhaps not likely to enable CORS - partly because it
> may make them more vulnerable to malice).
> 
> Secondly, this attempted DOS attack uses in-browser JavaScript (again, if
> it uses any other method it's outside of our scope). Out of the box, all
> the requests will be sent with the browser's original User-Agent string.
> As we're launching our attack from end users' regular web browsers, there
> is a very high chance that the User-Agent string is already on
> victim.example.com's whitelist. Hence, the DOS script will probably be
> more successful if it does *not* set User-Agent.
> 
> Why would setting User-Agent make the malicious script more effective at
> DOSing victim.example.com?
> 
> > but the use-case is really to prevent browsers from masquerading as
> > servers).
> 
> This is a way more interesting (ab)use case. You're presuming that there
> are web-exposed backend services that are configured to only talk to other
> backend servers, and use a particular magic token in User-Agent as
> authentication? If such services exist, does being able to send a
> "server-like" UA from a web browser make them significantly more
> vulnerable than being able to send the same string from a shell script?
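> 
> (For comparison, a sketch of how trivially that string can already be sent
> from outside a browser - a few lines of Node against a hypothetical host:)
> 
> var http = require("http");
> 
> // Any non-browser client can send whatever User-Agent it likes.
> http.get({
>     host: "backend.example.com",
>     path: "/internal/api",
>     headers: { "User-Agent": "InternalServiceBot/2.0" }
> }, function (res) {
>     console.log(res.statusCode);
> });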
> 
> --
> Hallvord R. M. Steen
> Core tester, Opera Software

Received on Thursday, 11 October 2012 08:57:33 UTC