Re: Early morning thoughts on referrers. from Brian Smith on 2014-11-10 (public-webappsec@w3.org from November 2014)

From: Brian Smith <brian@briansmith.org>
Date: Sun, 9 Nov 2014 22:57:22 -0800
To: Mike West <mkwst@google.com>
Cc: "public-webappsec@w3.org" <public-webappsec@w3.org>, Jochen Eisinger <eisinger@google.com>, Devdatta Akhawe <dev.akhawe@gmail.com>
Message-ID: <CAFewVt5Gyb7N7XbWD8D1Tcz5xMn9RkfjWSRFPSURLgg2w9uxLQ@mail.gmail.com>
On Sun, Nov 9, 2014 at 9:10 PM, Mike West <mkwst@google.com> wrote:
> 3. There's a lack of granularity in the available policies, as they describe
> specific points on a matrix consisting of the referrer's granularity
> (origin? path? query string?) and the relationship between the referrer and
> referree (same-origin? same-public-suffix? cross-public-suffix? downgrade
> from HTTPS to HTTP?).

There's another dimension regarding the relationship between the
referrer and the referree: whether the HTTP request is being done to
load a subresource, or whether it is a navigation. I believe that
"unsafe-url" was added in an attempt to address a perceived need for
navigation, but it also affects subresource requests, even though
nobody is arguing unsafe-url is a good idea for subresource requests
(i.e. mixed content).

> 3. Let's extend the policy definition language to treat the current set of
> policies as shorthand for a more detailed language.

I think that, for the purpose of understanding the dimensions along
which referrers could work, what you're suggesting is useful. I am
hopeful that we could use that language as
a modeling language for discussion, and then end up with something
useful that goes into the actual policy specification mechanism.

> While on the topic (tangentially), I think it's a perfectly reasonable idea
> to allow a site to specify a same-origin "base referrer url", and to use
> that as the basis upon which to perform the origin/whatever stripping
> operations. I don't think that's at all an essential component, however, and
> I'd like to work everything else out first. :)

To clarify what I said in response to Devdatta, I also think it is
reasonable in theory, but I think in practice the use cases for it
might be served well enough by other means, and I think the
cost/benefit could be tough to justify.

> 2. I believe the risk of specifying something that some folks don't like and
> want to experiment with killing is already sufficiently mitigated via
> language that allows user agents to treat it as a maximal limit on referrer
> information, rather than a mandate.

I appreciate that there was consideration of the desires of some
people to restrict what information is leaked from their products.
However, this type of specification is not, AFAICT, compatible with
the WHATWG philosophy of specifying features. In particular, WHATWG
highly values precise, consistent specification over allowing
implementations to give different observable results. The rationale
for the WHATWG approach is based on the experience of browser vendors
with minority market share being forced to do things the way the
browser with dominant market share did things, for compatibility
reasons, even if the specification technically offered alternatives.

Further to this point, it is especially important for the minority
market share browsers, and particularly the browsers who aren't first
to implement a feature, to insist that rules like the "two
implementations required" (and not just two "says they will probably
implement eventually") and "all cases are precisely specified" are
followed. These guidelines are there partially to protect everybody
from being cornered into blindly copying whatever the leader did. In
particular, it is dangerous to read a spec and then say "it looks OK
to me; go ahead and ship it" without trying to implement it, because
you're not going to foresee all the problems you will run into if/when
you actually get around to implementing it.

We've enjoyed the success of the WHATWG approach for so long that I
can see that it is easy to forget some of the fundamental things that
have made WHATWG work so well. Probably the clearest and best example I
can think of, regarding the WHATWG-style approach that I'm arguing for
here, is the way the HTML5 parsing algorithm is specified. The whole
idea of its specification is that you can feed it ANY sequence of
bytes, and using that algorithm you will construct an identical parse
tree with any compliant implementation--without any variance across
compliant implementations.

> I also think it's important to allow
> sites to opt-into behavior they've come to expect from the web when
> migrating from HTTP to HTTPS; regardless of that behavior's awesomeness, I'd
> like to remove as many barriers to that transition as possible.

I also mostly agree that browsers likely shouldn't cap the information
that a site can opt into for HTTPS -> HTTP navigation to be less than
the referrer information available in HTTP->HTTP or HTTPS->HTTPS
navigations. However, I disagree that 'unsafe-url' is the best way to
do that. I think, instead, browsers should try to cap Referrer for
HTTP->HTTP and HTTPS->HTTPS navigations to 'origin' for
cross-origin/cross-suffix navigations, and then maybe offer websites
the ability to opt into sharing 'origin' for HTTPS->HTTP navigations.
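
As a rough illustration of that capping idea, here is a minimal
sketch. The names origin_of, capped_referrer and opt_in_downgrade are
hypothetical, as is the same-origin behavior (full URL minus the
fragment), which this message does not specify. Cross-suffix checks
would need a public suffix list and are left out, and origins are
compared without default-port normalization.

```python
from typing import Optional
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Return scheme://host[:port]; default-port normalization is skipped."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def capped_referrer(source: str, target: str,
                    opt_in_downgrade: bool = False) -> Optional[str]:
    """Hypothetical cap on the referrer sent when navigating source -> target."""
    src, dst = urlsplit(source), urlsplit(target)

    # Assumption: same-origin navigations keep the full URL minus the fragment.
    if origin_of(source) == origin_of(target):
        return src._replace(fragment="").geturl()

    # HTTPS -> HTTP: send nothing by default, origin only if the site opted in.
    if src.scheme == "https" and dst.scheme == "http" and not opt_in_downgrade:
        return None

    # Every other cross-origin navigation is capped at the origin.
    return origin_of(source) + "/"
```

Under these assumptions, a navigation from
https://a.example/path?q=1 to https://b.example/ would send only
https://a.example/, and the same navigation to http://b.example/
would send nothing unless opt_in_downgrade is set.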

I view adding 'unsafe-url' as a race to the bottom, whereas I view
restricting the maximum amount of referrer information for all
cross-origin/cross-suffix navigations to be a race to the top. But, we
share the same goal of minimizing disincentives for sites to switch to
HTTPS.

> 1. Making the policy language more granular might allow us to more clearly
> define what sorts of things are acceptable in <meta>. It does not, of
> course, address the core of Brian and others' comments about the complexity;
> it makes things more complex, honestly. But that complexity might bring with
> it a bit more clarity, as we won't be hiding behaviors behind opaque policy
> names.

I think we should identify the various combinations that are possible
in theory, then prune those down to the minimum set of combinations
that we think we can get away with offering, and then figure out a
syntax that allows for the specification of those possibilities.
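
To make the "enumerate, then prune" step concrete, here is a small
sketch. The three axes come from this thread (referrer granularity,
referrer/referree relationship, navigation vs. subresource), but the
exact value names and the single pruning rule in worth_offering are
assumptions chosen for illustration, not a proposed final set.

```python
from itertools import product

# Axes taken from the discussion; the value names are illustrative.
GRANULARITIES = ("origin", "path", "query")
RELATIONSHIPS = ("same-origin", "same-public-suffix",
                 "cross-public-suffix", "https-to-http-downgrade")
REQUEST_KINDS = ("navigation", "subresource")

# Every combination that is possible in theory.
all_combinations = list(product(GRANULARITIES, RELATIONSHIPS, REQUEST_KINDS))


def worth_offering(granularity: str, relationship: str, kind: str) -> bool:
    """Hypothetical pruning rule: never expose more than the origin on a
    downgraded subresource request (i.e. mixed content)."""
    leaks_past_origin = granularity != "origin"
    mixed_content = (relationship == "https-to-http-downgrade"
                     and kind == "subresource")
    return not (leaks_past_origin and mixed_content)


pruned = [combo for combo in all_combinations if worth_offering(*combo)]
print(len(all_combinations), len(pruned))  # prints "24 22"
```

A real pruning pass would apply more rules than this one; the point is
only that the full matrix is small enough to enumerate and argue about
cell by cell before any syntax is chosen.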

Also, to be clear about my arguments against CSP referrer, there are
two main ones:

1. Removing referrer and reflected-xss leaves CSP 2.0 with the same
kind of clean, easily-understood, simply-composable semantics that CSP
1.0 had, and it would be a shame to lose that without a *really* good
reason, because I think the safety of composing policies will be a key
to having more frameworks provide automation for defining and
maintaining CSP policies in a safe way. I don't think referrer and
reflected-xss, the way they are defined in the current draft, are
good enough reasons to lose that. But, I'm not against having some
mechanism outside of the Content-Security-Policy header for each of
these. X-XSS-Protection already does exactly the same thing as
reflected-xss for the browsers that need such a thing, so referrer is
the only one that would need a new mechanism.

2. This conversation hasn't happened yet, and without completing this
conversation, it was/is too early to define a syntax for referrer
policies.

Cheers,
Brian
Received on Monday, 10 November 2014 06:57:49 UTC