Re: CSP reports: `script-sample`

On Tue, Oct 18, 2016 at 1:03 AM, Artur Janc <aaj@google.com> wrote:

> On Mon, Oct 17, 2016 at 7:15 PM, Devdatta Akhawe <dev.akhawe@gmail.com>
> wrote:
>
>> Hey
>>
>> In the case of a third-party script having an error, what are example
>> leaks you are worried about?
>>
>
The same kinds of issues that lead us to sanitize script errors for things
loaded as CORS cross-origin scripts:
https://html.spec.whatwg.org/#muted-errors. If the resource hasn't opted in
to being same-origin with you, script errors can leak data you wouldn't
otherwise have access to.
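As a minimal illustration of the muted-errors behavior (URLs hypothetical): without a CORS opt-in from the third party, `window.onerror` sees only a sanitized "Script error." for that script, while a script loaded with `crossorigin` plus a matching `Access-Control-Allow-Origin` header exposes full error details:

```html
<script>
  // For cross-origin scripts loaded without CORS opt-in, browsers mute
  // errors: no message, source file, or line number is reported here.
  window.onerror = function (message, source, line, column, error) {
    console.log(message, source, line);
  };
</script>
<!-- No crossorigin attribute: errors from this script are muted. -->
<script src="https://third-party.example/widget.js"></script>
<!-- With CORS opt-in on both sides, full error details are exposed. -->
<script crossorigin="anonymous"
        src="https://cors-enabled.example/widget.js"></script>
```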


> Thanks for the summary, Mike! It's a good overview of the issue, but I'd
> like to expand on the reasoning for why including the prefix of an inline
> script doesn't sound particularly scary to me.
>

Thanks for fleshing out the counterpoints, Artur!


> Basically, in order for this to be a concern, all of the following
> conditions need to be met:
>
> 1. The application has to use untrusted report collection infrastructure.
> If that is the case, the application is already leaking sensitive data from
> page/referrer URLs to its collector.
>

Being "trusted" to receive URLs doesn't seem to directly equate to being
"trusted" to store sensitive data. If you're sure that you don't have
sensitive data on your pages, great. But you were also presumably "sure"
that you didn't have inline script on your pages, right? :)


> In fact, I'd be much more worried about URLs than script prefixes, because
> URLs leak on *any* violation (not just for script-src) and URLs frequently
> contain PII or authorization/capability-bearing tokens, e.g. for password
> reset functionality.
>

We've talked a bit about URL leakage in
https://github.com/w3c/webappsec-csp/issues/111. I recall that Emily was
reluctant to apply referrer policy to the page's URL vis-à-vis the
reporting endpoint, but I still think it might make sense.


> 2. The application needs to have a script which includes sensitive user
> data somewhere in the first N characters. FWIW in our small-scale analysis
> of a few hundred thousand reports we saw ~300 inline script samples sent
> by Firefox (with N=40) and haven't found sensitive tokens in any of the
> snippets.
>

Yup. I'm reluctant to draw too many conclusions from that data, given the
pretty homogeneous character of the sites we're currently applying CSP to
at Google, but I agree with your characterization of the data.
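For reference, a Firefox-style report carrying a truncated `script-sample` (with N=40) might look like this — the exact field set varies by browser, and all values here are hypothetical:

```json
{
  "csp-report": {
    "document-uri": "https://example.com/page",
    "violated-directive": "script-src",
    "blocked-uri": "self",
    "script-sample": "var user = {name: document.getElementByI"
  }
}
```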

Scott might have more data from a wider sampling of sites, written by a
wider variety of engineering teams (though it's not clear that the terms of
that site would allow any analysis of the data).


> 3. The offending script needs to cause a CSP violation, i.e. not have a
> valid nonce, meaning that the application is likely broken if the policy is
> in enforcing mode.
>

1. Report mode exists.

2. Embedded enforcement might make it more likely that XSS on a site could
cause policy to be inadvertently applied to itself or its dependencies. We
talked about this briefly last week, and I filed
https://github.com/w3c/webappsec-csp/issues/126 to ponder it. :)


> As a security engineer, I would consider #1 to be the real security
> boundary -- a developer should use a CSP collector she trusts because
> otherwise, even without script-sample, reports contain data that can
> compromise the application.
>

That sounds like an argument for reducing the amount of data in reports,
not for increasing it. I think it's somewhat rational to believe that
reporting endpoints are going to have longer retention times and laxer
retention policies than application databases. Data leaking from the latter
into the former seems like a real risk. I agree that the URL itself already
presents risks, but I don't understand how that's a justification for
accepting more risk.

> I can easily imagine scripts that violate conditions #2 and #3, but at the
> same time we have not seen many examples of such scripts so far, nor have
> people complained about the script-sample data already being included by
> Firefox (AFAIK).
>

People are generally unlikely to complain about getting more data,
especially when the data's helpful and valuable. That can justify pretty
much anything, though: lots of people think CORS is pretty restrictive, for
instance, and probably wouldn't be sad if we relaxed it in various ways.


> Overall, I don't see the gathering of script samples as qualitatively
> different to the collection of URLs. However, if we are indeed particularly
> worried about script snippets, we could make this opt-in and enable the
> functionality only in the presence of a new keyword (report-uri /foo
> 'report-script-samples') and add warnings in the spec to explain the
> pitfalls. This way even if I'm wrong about all of the above we would not
> expose any data from existing applications.
>

I suspect that such an option would simply be copy-pasted into new
policies, but yes, it seems like a reasonable approach.
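Concretely, the opt-in Artur sketches might look something like this in a policy header (keyword name taken from his suggestion; the collector path is hypothetical and the final syntax would be up to the spec):

```http
Content-Security-Policy: script-src 'nonce-abc123'; report-uri /csp-collector 'report-script-samples'
```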


> For some background about why we're even talking about this: currently
> violation reports are all but useless for both debugging and detection of
> the exploitation of XSS due to the noise generated by browser extensions.
>

I agree that this is a problem that we should solve. One way of solving it
is to add data to the reports. Another is to invest more in cleaning up the
reports that you get so that there's less noise. I wish browser vendors
(including Chrome) spent more time on the latter, as we're actively harming
users by not doing so.
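To make the "clean up the reports" option concrete, here's a minimal sketch of server-side filtering a collector could do today. The heuristics (dropping reports whose `source-file` or `blocked-uri` points at an extension scheme) are my assumptions about common noise sources, not anything defined by the spec:

```javascript
// Hypothetical collector-side filter: drop CSP reports that look like
// browser-extension noise rather than genuine page violations.
const EXTENSION_SCHEMES = ['chrome-extension:', 'moz-extension:', 'safari-extension:'];

function isExtensionNoise(report) {
  const body = report['csp-report'] || {};
  // Extensions typically show up in one of these two fields.
  const fields = [body['source-file'], body['blocked-uri']];
  return fields.some(
    (value) => typeof value === 'string' &&
      EXTENSION_SCHEMES.some((scheme) => value.startsWith(scheme))
  );
}

// Example: a report triggered by an extension-injected script.
const noisy = {
  'csp-report': {
    'blocked-uri': 'moz-extension://abcd-1234/content.js',
    'violated-directive': "script-src 'nonce-...'",
  },
};
console.log(isExtensionNoise(noisy)); // true
```

This obviously doesn't catch everything (extensions that inject plain inline script are indistinguishable without more data), which is part of the argument for richer reports.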

-mike

Received on Tuesday, 18 October 2016 08:06:18 UTC