RE: [REFERRER][CSP] Improving the Web Platform's Referrer Policy

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Maybe this is another case where client-specific state could be used to offer the user more control over security and privacy. Cross-origin subresource referrer information could default to ‘origin’ (and cross-origin navigation to ‘none’), but when the DNT request header is ‘0’ it could revert to the entire URL. The navigation-referrer and subresource-referrer CSP directives (and the link relations) would restrict the DNT:0 case back to ‘origin’.

A milder form could be to default to the entire URL when DNT is unset or DNT:0, and to ‘origin’ (or ‘none’) when DNT is 1.
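
A rough sketch of the two variants, in Python, may make the decision table easier to see. The function names, the `site_restricts` flag (standing in for a page that sets the navigation-referrer / subresource-referrer directives or link relations), and the policy labels are illustrative assumptions, not anything taken from a spec. Only cross-origin requests are modelled, and the milder variant picks ‘origin’ rather than ‘none’ for DNT:1.

```python
from typing import Optional


def strict_policy(is_navigation: bool, dnt: Optional[str],
                  site_restricts: bool = False) -> str:
    """First variant: conservative defaults, relaxed only when DNT is '0'.

    `dnt` is the value of the DNT request header, or None when unset.
    `site_restricts` is a hypothetical flag meaning the page has opted
    to restrict referrers via CSP directive or link relation.
    """
    if dnt == "0":
        # User consented to tracking: full URL unless the site restricts it.
        return "origin" if site_restricts else "full-url"
    # DNT unset or '1': navigation sends nothing, subresources send origin.
    return "none" if is_navigation else "origin"


def mild_policy(dnt: Optional[str]) -> str:
    """Milder variant: full URL unless the user has sent DNT: 1."""
    return "origin" if dnt == "1" else "full-url"


if __name__ == "__main__":
    for dnt in (None, "0", "1"):
        print(dnt, strict_policy(False, dnt), strict_policy(True, dnt),
              mild_policy(dnt))
```

The difference between the two is only in what an unset DNT header means: the first treats it like DNT:1, the second like DNT:0.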



From: Mike West [mailto:mkwst@google.com]
Sent: 09 December 2014 11:35
To: Brian Smith
Cc: public-webappsec@w3.org; Jochen Eisinger
Subject: Re: [REFERRER][CSP] Improving the Web Platform's Referrer Policy

On Wed, Dec 3, 2014 at 10:23 PM, Brian Smith <brian@briansmith.org> wrote:
Hi,

I've now written down my ideas on how to improve browsers' handling of
the Referer header field:

https://briansmith.org/referrer-01.html

Thanks for putting this together, Brian! It's a well thought-out plan, and I think there are a lot of good ideas there to dig through.

That said, I've coincidentally been chatting with some folks inside Google about a similar proposal since early November (see chrome://flags/#reduced-referrer-granularity for the public bits of that work). My TL;DR is that the proposal as written would be quite harmful to the way folks make money on the web. That's a strong claim, so I'll elaborate a bit. :)

# Advertising + Full URL for subresource requests

Your proposal simplifies the model by limiting cross-origin subresource referrer information to 'origin' (or 'none') by default, and disallowing an opt-in above that default. I don't believe that's feasible in the short-to-medium term.

The central issue that I've heard internally is that our ads teams have a hard requirement to know the URL of the page embedding an advertisement. We certainly use this information for content targeting and building user profiles, and dropping it would cost some amount of money. However, the most critical driver of this requirement actually comes from policy enforcement: we must not serve ads on certain kinds of pages (see [1] for examples), and if we can't crawl the page to determine its content, we can't make these policy guarantees. Aside from just being a bad thing to do, there are regulatory implications to lapses here.

I talked with folks about alternative mechanisms for transmitting the URL (e.g. GET parameters, postMessage, etc). I think that's an area still worth exploring, but two aspects were raised as concerning:

* URLs are limited to ~2k, and we already see truncation on a semi-regular basis (in the low hundreds of millions of requests a day). Cramming an encoded URL into a GET parameter would bring us above that limit more frequently.

* For a variety of reasons, JavaScript-driven embedding is significantly less reliable than parser-driven embedding. I heard claims of >1% loss between a JS-requested image and a plain <img> tag (I'm following up on those claims for details).
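
To illustrate the first concern, here is a minimal Python sketch of how percent-encoding the embedding page's URL into a GET parameter inflates the request URL. The `ads.example` endpoint, the `url` parameter name, and reading "~2k" as 2048 characters are all assumptions made for the example.

```python
from urllib.parse import quote

URL_LIMIT = 2048  # assumption: "~2k" read as 2048 characters


def ad_request_url(endpoint: str, page_url: str) -> str:
    """Append the embedding page's URL as a percent-encoded parameter.

    The parameter name `url` is hypothetical.
    """
    # safe='' forces ':' '/' '?' '&' '=' to be escaped (3 characters each).
    return f"{endpoint}?url={quote(page_url, safe='')}"


def exceeds_limit(request_url: str, limit: int = URL_LIMIT) -> bool:
    """True when the request URL is longer than the assumed limit."""
    return len(request_url) > limit


if __name__ == "__main__":
    page = "https://news.example/articles/2014/12/referrer?id=42&ref=home"
    request = ad_request_url("https://ads.example/serve", page)
    print(len(page), len(request), exceeds_limit(request))
```

Every reserved character in the page URL triples in size once encoded, so a page URL that is comfortably under the limit on its own can push the combined request over it.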

The conclusion internally is that any change to the defaults would need a mechanism for publishers to opt into the status quo for subresource requests (and would result in many, if not most, doing so).

# Navigational requests

The proposal blocks cross-origin navigational referrer information entirely (with a carveout for HTTPS->HTTPS opt-in to 'origin'). A similar aspect of my internal proposal worried our fraud detection team, who often use navigational referrer information to discover and analyze some groups of scammers. Dropping that data would cause some amount of financial impact.

# Compatibility

In your proposal, you mention sites that use referrer information as part of an access control scheme. Folks internally raised similar concerns. It's difficult to estimate what percentage of sites would be affected, but it's certainly >0; it's not clear to me how your proposal addresses those concerns.

# Analytics

Obviously, this would have a substantial impact on folks' ability to analyze traffic patterns incoming to their sites. Equally obviously, this is a privacy impact we'd like to address. It seems like there's a balance to be struck, and I'd suggest that 'origin' is closer than 'none' to a reasonable position.

# Anyway:

I very much like the notion of separating subresource and navigational referrer information controls. We should do that.

I also very much like the idea of separating a page-level policy out from a policy for specific requests/links. We should explore that. `rel="whatever-referrer"` makes sense to me for links; I think a similar attribute would be valuable for other request-generating tags, like <iframe>, <form>, etc.

Thanks again for putting this together! +Jochen, who might or might not agree with anything I've said above. :)

[1]: https://support.google.com/adsense/answer/1348688?hl=en&topic=1271507&rd=1

- -mike

- --
Mike West <mkwst@google.com>
Google+: https://mkw.st/+, Twitter: @mikewest, Cell: +49 162 10 255 91
Google Germany GmbH, Dienerstrasse 12, 80331 München, Germany
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Graham Law, Christine Elizabeth Flores
(Sorry; I'm legally required to add this exciting detail to emails. Bleh.)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (MingW32)
Comment: Using gpg4o v3.3.26.5094 - http://www.gpg4o.com/
Charset: utf-8

iQEcBAEBAgAGBQJUixVQAAoJEHMxUy4uXm2J7DIH/jjQcGSglH2Rvfea6g54TJRu
/+LADrbO9LyZlzoIC7ILUVri4sPTcqSPR249JFvjdSFWc3pJ89LMglrPPXQs7P89
JYXxWI9yxsJGUjI7WJQCZYhCWzKvPuQmM3MMoaZMHoSPU+lcfntw7ASwYDymg7jp
K3eDppGDfmq8K6JlnhpqgI1QzbznkMbUhTnFKTMv3BfhVsmf+v5+UcfPN8ftmzuy
rnj+BZIepjt7NgJJr0+ilSkSawm+Uk1O3RMifz0MIzyxKZ2j61e1cU7n09E5m0MB
69/AKcIsfcnR3LNji7xrunyvsOBtZvG4Osn7dL6a7ZoS0gOTSuo6ytvbbYC51fM=
=KGDu
-----END PGP SIGNATURE-----

Received on Friday, 12 December 2014 16:19:17 UTC