
Re: The HTTP Origin Header (draft-abarth-origin)

From: Adam Barth <w3c@adambarth.com>
Date: Sun, 25 Jan 2009 11:31:12 -0800
Message-ID: <7789133a0901251131v6aae7d04ka40f6c5685cbe63@mail.gmail.com>
To: Mark Nottingham <mnot@mnot.net>
Cc: "Roy T. Fielding" <fielding@gbiv.com>, Larry Masinter <LMM@acm.org>, ietf-http-wg@w3.org, Lisa Dusseault <ldusseault@commerce.net>

On Sat, Jan 24, 2009 at 8:30 PM, Mark Nottingham <mnot@mnot.net> wrote:
> I'd like to dig into that. You believe that most of the suppression of the
> Referer header is done in proxies, due to the differences seen in HTTP and
> HTTPS.

That and the low suppression rates of document.referrer (see Figure 3
of http://www.adambarth.com/papers/2008/barth-jackson-mitchell-b.pdf).

> However, there are also considerable differences between the block
> rates for same-domain vs. cross-domain requests; are you implying that these
> proxies are parsing the Referer and only blocking those that are
> cross-domain?

That's what appears to be going on.

> If so, this seems an odd rationale; a person or company
> blocking referers for purpose of privacy would presumably be doing so for
> all values, not just cross-domain referers.

Same-domain Referer headers pose less of a privacy concern because the
server already knows which URL the user requested previously.
Cross-domain Referer headers, by contrast, can inform an entirely
different entity of what page you were just viewing.

> Likewise, someone doing it to
> hide intranet URLs would be more likely to only hide those, rather than to
> stop cross-domain referers.

We were unable to measure this because we ran our experiment on the
public Internet.

> Additionally, discriminating requests as cross-domain is more expensive to
> implement in an intermediary, and these implementers are famously sensitive
> to performance issues.

This can be done with a simple regular expression by matching the Host
header with the Referer header.
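As a hedged illustration (the helper name and exact logic are mine, not
taken from any shipping proxy), the check reduces to comparing the two
header values:

```python
# Illustrative sketch only: how an intermediary might decide a request is
# cross-domain by comparing its Host header against the host named in its
# Referer header.  The helper name is hypothetical, not from any real proxy.
from typing import Optional
from urllib.parse import urlparse

def is_cross_domain(host_header: str, referer_header: Optional[str]) -> bool:
    """True when the Referer names a host other than the request's Host."""
    if not referer_header:
        return False  # no Referer to compare; nothing to block
    referer_host = urlparse(referer_header).hostname
    host = host_header.split(":")[0].lower()  # drop any :port suffix
    return referer_host is None or referer_host.lower() != host
```

A proxy with a predicate like this could strip the Referer only on
cross-domain requests, which would produce exactly the asymmetric block
rates seen in the data.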

> All of the products that I'm aware of would easily
> allow wholesale blocking of a header, but would require a relatively
> expensive (and thereby less likely) callout (e.g., with ICAP) to selectively
> block them based upon request state.

Do you have an alternate explanation for the data we observe?

> On the other hand, I do notice that Firefox has the ability to selectively
> configure how Referer headers are blocked, both in terms of same-site vs.
> cross-site and HTTP vs. HTTPS;
>  http://kb.mozillazine.org/Network.http.sendRefererHeader

This preference blocks both the Referer header and the
document.referrer property (see the documentation at
<http://kb.mozillazine.org/Network.http.sendRefererHeader>).  Our data
indicates that a vanishingly small number of users enable this
preference.

>  http://kb.mozillazine.org/Network.http.sendSecureXSiteReferrer

This blocks cross-domain Referer headers when both domains are using
HTTPS.  Our data indicates that virtually no one enables this
preference.

> Couldn't that account for at least a portion of the discrepancies you saw?

No, for the reasons stated above.

> BTW, did you look for vanilla wafers
> <http://www.junkbusters.com/ijbfaq.html#wafers> to see how much of this
> stripping could be attributed to JunkBuster?

Unfortunately, we did not.  Had we known about them at the time, we
would have.  As far as I can tell from the JunkBuster documentation
at <http://www.junkbusters.com/ijbman.html>, it blocks the Referer
header for both same-domain and cross-domain requests.

> Also, did you find any rationale for the difference between rates seen on
> network A vs. network B? It's a pretty wide range...

This is a bit of a puzzle.  I suspect these networks are targeting
different demographics and that the truth lies somewhere in the middle.

> The numbers that I found especially interesting were for stripping of
> same-site XmlHttpRequest-generated Referer headers, which came in at
> (eyeballing Figure 3) about 0.6% on HTTP and 0.2% on HTTPS (discounting the
> Firefox 1.x bug, which isn't relevant to this discussion, since we're
> talking about updated browsers as a pre-condition). Aren't these numbers
> closer to what one would expect?

I'm not surprised by a 3% suppression rate given the feedback from Web
sites that try strict Referer validation.  Note that the same-domain
suppression rate of the Referer header is much larger, around 6%, on
network B.
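For context, "strict Referer validation" here means rejecting any
state-changing request whose Referer is missing or foreign; a minimal
sketch (the trusted host name is hypothetical) shows why a
several-percent suppression rate hurts:

```python
# Minimal sketch of strict Referer validation as a CSRF check; the trusted
# host is hypothetical.  Under a strict policy, users whose proxies suppress
# the header (several percent, per the figures above) are rejected outright.
from typing import Optional
from urllib.parse import urlparse

TRUSTED_HOST = "example.com"  # hypothetical site performing the check

def strict_referer_check(referer: Optional[str]) -> bool:
    """Accept only requests whose Referer names our own host."""
    if not referer:
        return False  # header suppressed: strict validation must reject
    return urlparse(referer).hostname == TRUSTED_HOST
```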

> In particular, they're much closer to the
> numbers for "custom" headers that you measure, which means we are looking at
> implementations that white-list as a significant factor (as well as
> statistical error, of course)...

The "suppression" of custom headers appears to be mostly due to
oddball user agents that don't have fully implemented XMLHttpRequest
objects.  Looking at events that occur less than 0.2% of the time puts
you WAY out in the tail of user agents.

Statistical error is not much of a factor, given the enormous sample
size.  Sampling bias is a concern, however, which is why we tried two
sampling techniques.

> Lastly -- Figure 3 says that its unit is requests; wouldn't IP addresses be
> a more useful number here? Or, better yet, unique clients (tracked with a
> cookie)?  Otherwise it seems that the results could be skewed by, for
> example, a single very active proxy.

We ran the numbers all three ways and they're similar.  Tracking
unique users with cookies is a bit unreliable due to third-party
cookie blocking in some user agents.

> Likewise, did you record the
> geographical distribution of clients? It would be nice to have assurances
> that this sample represents a global audience, and not just a selective
> (read: US) one.

Our ad campaigns targeted the US.  We were unable to record IP
addresses due to ethical concerns.  We did record a keyed hash of the
IP addresses to re-identify requests from the same IP address, but we
deleted the key after the experiment to prevent further
re-identification.
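The keyed-hash scheme can be sketched as follows (all names are
illustrative); once the key is deleted, the pseudonyms cannot be linked
back to addresses, even by enumerating the small IPv4 space:

```python
# Sketch of the keyed-hash pseudonymization described above: HMAC each IP
# address with a secret key so repeat visits can be correlated during the
# study, then delete the key so the hashes cannot later be reversed by
# enumerating the IPv4 space.  All names here are illustrative.
import hashlib
import hmac
import secrets

key = secrets.token_bytes(32)  # held only for the duration of the experiment

def pseudonymize(ip: str) -> str:
    return hmac.new(key, ip.encode(), hashlib.sha256).hexdigest()

a = pseudonymize("192.0.2.1")
b = pseudonymize("192.0.2.1")  # same address, same pseudonym
c = pseudonymize("192.0.2.2")  # different address, different pseudonym
del key  # deleting the key afterwards prevents further re-identification
```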

>> Unfortunately, these proxies prevent Web sites from relying on the
>> Referer header, and so the operators of these proxies never come under
>> pressure to stop suppressing the header.
>
> Certainly they do. If Cool New Cross-Site Web Apps are broken, and they
> explain to the user why it is broken, both ISPs and companies will come
> under pressure.

Unfortunately, folks developing Cool New Cross-Site Web Apps know that
the Referer is often suppressed and therefore do not rely on the
header in their designs.  Thus, step 1 never occurs.

> IMO 99% of the driving factor for deployment here is going to be new
> features -- supporting cross-site XmlHttpRequest with authentication, etc.

Great.  A number of browser vendors are interested in implementing the
header, giving their users yet more reasons to upgrade.

> Well, we'd be in the same situation as today; a current (non-Origin) browser
> would be able to make cross-site requests (using IMG, form.submit, etc.).

The Origin header is incrementally useful as a CSRF defense.  Users
with supporting user agents will benefit.  Users without supporting
user agents will be no worse off than they are today.  This is
different from the situation we are in today, where sites must
engineer complex CSRF defenses to help any of their users.  The Origin
header lets sites protect some of their users with minimal effort.
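A hedged sketch of the server-side check this enables (the allowed-origin
set and function name are mine, not from the draft):

```python
# Illustrative sketch of an Origin-based CSRF check; the allowed-origin set
# and function name are hypothetical.  One comparison protects users whose
# browsers send the header, while requests without it are handled as today.
from typing import Optional

ALLOWED_ORIGINS = {"https://example.com"}  # this site's own origin

def csrf_check(method: str, origin: Optional[str]) -> bool:
    """Allow safe methods; for state-changing ones, require a known Origin
    whenever the header is present at all."""
    if method in ("GET", "HEAD", "OPTIONS"):
        return True
    if origin is None:
        return True  # legacy browser: no worse off than today
    return origin in ALLOWED_ORIGINS
```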

Adam
Received on Sunday, 25 January 2009 19:31:49 GMT
