Re: [REFERRER][CSP] Improving the Web Platform's Referrer Policy

On Wed, Dec 3, 2014 at 10:23 PM, Brian Smith <brian@briansmith.org> wrote:

> Hi,
>
> I've now written down my ideas on how to improve browsers' handling of
> the Referer header field:
>
>     https://briansmith.org/referrer-01.html


Thanks for putting this together, Brian! It's a well-thought-out plan, and
I think there are a lot of good ideas there to dig through.

That said, I've coincidentally been chatting with some folks inside Google
about a similar proposal since early November (see
chrome://flags/#reduced-referrer-granularity for the public bits of that
work). My TL;DR is that the proposal as written would be quite harmful to
the way folks make money on the web. That's a strong claim, so I'll
elaborate a bit. :)

# Advertising + Full URL for subresource requests

Your proposal simplifies the model by limiting cross-origin subresource
referrer information to 'origin' (or 'none') by default, and disallowing an
opt-in above that default. I don't believe that's feasible in the
short-to-medium term.
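
To make sure I'm describing the default correctly, here is a rough sketch of the subresource rule as I've summarized it above. It is an illustration only, not text from your document or code from any browser: the function names are made up, the same-origin branch is my assumption (this mail only discusses the cross-origin case), and origins are compared naively without default-port normalization.

```python
from typing import Optional
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Return scheme://host[:port] for a URL (no default-port normalization)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def subresource_referrer(page_url: str, request_url: str,
                         policy: str = "origin") -> Optional[str]:
    """Hypothetical helper: referrer value for a subresource request.

    Cross-origin requests get at most the embedding page's origin; there is
    deliberately no policy value that opts back into the full URL.
    """
    if policy not in ("origin", "none"):
        raise ValueError("only 'origin' and 'none' exist in this sketch")
    if origin_of(page_url) == origin_of(request_url):
        # Assumption: same-origin requests still carry the full URL.
        return page_url
    if policy == "none":
        return None
    # Cross-origin default: path and query of the page are dropped.
    return origin_of(page_url) + "/"
```

Under this sketch, an ad request from `https://news.example/article?id=7` to a different origin would see only `https://news.example/`, which is exactly the information loss discussed below.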

The central issue that I've heard internally is that our ads teams have a
hard requirement to know the URL of the page embedding an advertisement. We
certainly use this information for content targeting and building user
profiles, and dropping it would cost some amount of money. However, the
most critical driver of this requirement actually comes from policy
enforcement: we must not serve ads on certain kinds of pages (see [1]
for examples), and if we can't crawl the page to determine its content, we
can't make these policy guarantees. Aside from just being a bad thing to
do, there are regulatory implications to lapses here.

I talked with folks about alternative mechanisms for transmitting the URL
(e.g. GET parameters, postMessage, etc). I think that's an area still worth
exploring, but two aspects were raised as concerning:

* URLs are limited to ~2k, and we already see truncation on a semi-regular
basis (in the low hundreds of millions of requests a day). Cramming an
encoded URL into a GET parameter would bring us above that limit more
frequently.

* For a variety of reasons, JavaScript-driven embedding is significantly
less reliable than parser-driven embedding. I heard claims of >1% loss
between a JS-requested image and a plain <img> tag (I'm following up on
those claims for details).
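
The first concern can be illustrated with a small sketch. Everything specific here is an assumption for illustration: the `ads.example` endpoint, the `page` parameter name, and the 2048-character figure standing in for "~2k". The point is only that percent-encoding a page URL into a query parameter makes the request URL noticeably longer than the page URL itself.

```python
from urllib.parse import urlencode

# Assumed stand-in for the "~2k" limit mentioned above.
URL_LIMIT = 2048

# Hypothetical ad-serving endpoint, not a real service.
AD_ENDPOINT = "https://ads.example/serve"


def ad_request_url(page_url: str) -> str:
    """Build an ad request that carries the embedding page's URL as a GET parameter."""
    # urlencode percent-encodes ':', '/', '?', '&' and '=', so each of those
    # characters in the page URL grows from one character to three.
    return AD_ENDPOINT + "?" + urlencode({"page": page_url})


def fits(url: str) -> bool:
    """True if the URL stays within the assumed length limit."""
    return len(url) <= URL_LIMIT


# A long but plausible page URL that is itself under the limit.
page_url = "https://publisher.example/" + "a/" * 700
request_url = ad_request_url(page_url)

print(len(page_url), fits(page_url))
print(len(request_url), fits(request_url))
```

In this example the page URL fits within the limit on its own, while the request URL that embeds it does not, so the ad server would see a truncated value.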

The conclusion internally is that any change to the defaults would need a
mechanism for publishers to opt into the status quo for subresource
requests (and would result in many/most doing so).

# Navigational requests

The proposal blocks cross-origin navigational referrer information entirely
(with a carveout for HTTPS->HTTPS opt-in to 'origin'). A similar aspect of
my internal proposal worried our fraud detection team, who often use
navigational referrer information to discover and analyze some groups of
scammers. Dropping that data would cause some amount of financial impact.
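
For concreteness, this is how I read the navigational rule, written as a sketch. The function and parameter names are hypothetical, and the same-origin branch is my assumption since only cross-origin navigation is at issue here.

```python
from typing import Optional
from urllib.parse import urlsplit


def navigational_referrer(source_url: str, target_url: str,
                          opt_in_origin: bool = False) -> Optional[str]:
    """Hypothetical helper: referrer value sent when navigating between pages."""
    src, dst = urlsplit(source_url), urlsplit(target_url)
    if (src.scheme, src.netloc) == (dst.scheme, dst.netloc):
        # Assumption: same-origin navigation is unaffected.
        return source_url
    if opt_in_origin and src.scheme == "https" and dst.scheme == "https":
        # The carveout: an HTTPS page may opt in to revealing its origin
        # to another HTTPS site.
        return f"https://{src.netloc}/"
    # Every other cross-origin navigation sends no referrer at all.
    return None
```

With this rule, a fraud-detection system on the destination site sees nothing by default for cross-origin arrivals, and at most an origin when the source site opts in.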

# Compatibility

In your proposal, you mention sites that use referrer information as part
of an access control scheme. Folks internally raised similar concerns. It's
difficult to estimate what percentage of sites would be affected, but it's
certainly >0; it's not clear to me how your proposal addresses those
concerns.

# Analytics

Obviously, this would have a substantial impact on folks' ability to
analyze traffic patterns incoming to their sites. Equally obviously, this
is a privacy impact we'd like to address. It seems like there's a balance
to be struck, and I'd suggest that 'origin' is closer than 'none' to a
reasonable position.

# Anyway:

I very much like the notion of separating subresource and navigational
referrer information controls. We should do that.

I also very much like the idea of separating a page-level policy out from a
policy for specific requests/links. We should explore that.
`rel="whatever-referrer"` makes sense to me for links; I think a similar
attribute would be valuable for other request-generating tags, like
<iframe>, <form>, etc.

Thanks again for putting this together! +Jochen, who might or might not
agree with anything I've said above. :)

[1]:
https://support.google.com/adsense/answer/1348688?hl=en&topic=1271507&rd=1

-mike

--
Mike West <mkwst@google.com>
Google+: https://mkw.st/+, Twitter: @mikewest, Cell: +49 162 10 255 91

Google Germany GmbH, Dienerstrasse 12, 80331 München, Germany
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Graham Law, Christine Elizabeth Flores
(Sorry; I'm legally required to add this exciting detail to emails. Bleh.)

Received on Tuesday, 9 December 2014 11:35:22 UTC