
Re: CSP reports: `script-sample`

From: Artur Janc <aaj@google.com>
Date: Wed, 19 Oct 2016 02:16:56 +0200
Message-ID: <CAPYVjqonMsv5AKKEWaEBbevbc=0ba3XJasOtB+qpGtDJ5mGFvA@mail.gmail.com>
To: Mike West <mkwst@google.com>
Cc: Devdatta Akhawe <dev.akhawe@gmail.com>, "public-webappsec@w3.org" <public-webappsec@w3.org>, Christoph Kerschbaumer <ckerschbaumer@mozilla.com>, Frederik Braun <fbraun@mozilla.com>, Scott Helme <scotthelme@hotmail.com>, Lukas Weichselbaum <lwe@google.com>, Michele Spagnuolo <mikispag@google.com>, Jochen Eisinger <eisinger@google.com>
On Tue, Oct 18, 2016 at 10:05 AM, Mike West <mkwst@google.com> wrote:

> On Tue, Oct 18, 2016 at 1:03 AM, Artur Janc <aaj@google.com> wrote:
>> On Mon, Oct 17, 2016 at 7:15 PM, Devdatta Akhawe <dev.akhawe@gmail.com>
>> wrote:
>>> Hey
>>> In the case of a third-party script having an error, what are example
>>> leaks you are worried about?
> The same kinds of issues that lead us to sanitize script errors for things
> loaded as CORS cross-origin scripts:
> https://html.spec.whatwg.org/#muted-errors. If the resource hasn't opted-in to being
> same-origin with you, script errors leak data you wouldn't otherwise have
> access to.
>> Thanks for the summary, Mike! It's a good overview of the issue, but I'd
>> like to expand on the reasoning for why including the prefix of an inline
>> script doesn't sound particularly scary to me.
> Thanks for fleshing out the counterpoints, Artur!
>> Basically, in order for this to be a concern, all of the following
>> conditions need to be met:
>> 1. The application has to use untrusted report collection infrastructure.
>> If that is the case, the application is already leaking sensitive data from
>> page/referrer URLs to its collector.
> "trusted" to receive URLs doesn't seem to directly equate to "trusted" to
> store sensitive data. If you're sure that you don't have sensitive data on
> your pages, great. But you were also presumably "sure" that you didn't have
> inline script on your pages, right? :)

Keep in mind that URLs are sensitive data for most applications, and they
are already being sent in violation reports. I'm having a difficult time
imagining a case where an application is okay with disclosing its URLs to
a third party for the purpose of debugging violation reports, but not
okay with disclosing script prefixes for the same purpose, given that:
1) Almost all applications have sensitive data in URLs, whereas the risk of
an inline script carrying sensitive data in its prefix (assuming the sample
is limited to a reasonable length) is certainly real, but far less common.
2) URLs are disclosed much more frequently than script samples would be,
because they are sent with every report (not just "inline" script-src
violations). In the `referrer` field, the UA is also sending a URL of
another, unrelated page, increasing the likelihood that sensitive data will
appear in the report.
3) There is no sanitization of URL parameters in violation reports,
compared to the prefixing logic we're considering for script samples.
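To make the prefixing idea concrete, the logic under discussion could be as
simple as truncating an inline script's text to its first N characters before
attaching it to the report. This is only a sketch of my own, not normative
behavior; N = 40 mirrors the Firefox value mentioned later in this thread, and
the trailing "..." marker is my choice of truncation signal:

```javascript
// Hedged sketch of sample truncation: keep only the first N characters
// of an inline script's text, so the report carries a short prefix
// rather than the whole script body. N = 40 mirrors the Firefox value
// mentioned in this thread; the "..." suffix marks truncation.
function scriptSample(scriptText, maxLength = 40) {
  if (scriptText.length <= maxLength) {
    return scriptText;
  }
  return scriptText.slice(0, maxLength) + "...";
}
```

A sample that fits within the limit is passed through unchanged, so short,
harmless snippets stay fully readable in the report.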

>> In fact, I'd be much more worried about URLs than script prefixes, because
>> URLs leak on *any* violation (not just for script-src) and URLs frequently
>> contain PII or authorization/capability-bearing tokens, e.g. for password
>> reset functionality.
> We've talked a bit about URL leakage in
> https://github.com/w3c/webappsec-csp/issues/111. I recall that Emily was reluctant to apply referrer
> policy to the page's URL vis a vis the reporting endpoint, but I still
> think it might make sense.
>> 2. The application needs to have a script which includes sensitive user
>> data somewhere in the first N characters. FWIW in our small-scale analysis
>> of a few hundred thousand reports we saw ~300 inline script samples sent
>> by Firefox (with N=40) and haven't found sensitive tokens in any of the
>> snippets.
> Yup. I'm reluctant to draw too many conclusions from that data, given the
> pretty homogeneous character of the sites we're currently applying CSP to
> at Google, but I agree with your characterization of the data.
> Scott might have more data from a wider sampling of sites, written by a
> wider variety of engineering teams (though it's not clear that the terms of
> that site would allow any analysis of the data).

I completely agree, this data is just what we had readily available -- we
can certainly do a much larger analysis of script prefixes based on the
search index. That said, if we're worried about the script-sample approach,
perhaps not seeing any sensitive data in the first dataset we looked at
could be a signal that it's worth pursuing further.

>> 3. The offending script needs to cause a CSP violation, i.e. not have a
>> valid nonce, meaning that the application is likely broken if the policy is
>> in enforcing mode.
> 1. Report mode exists.
> 2. Embedded enforcement might make it more likely that XSS on a site could
> cause policy to be inadvertently applied to itself or its dependencies. We
> talked about this briefly last week, and I filed
> https://github.com/w3c/webappsec-csp/issues/126 to ponder it. :)

Since CSPs applied by embedded enforcement serve a very different purpose
than current policies (they don't try to mitigate script injection), it
would very likely be okay to just not include script-sample data for such
policies. Also, embedded enforcement is still pretty far off, and the
reporting problem is an issue for pretty much every site currently
gathering violation reports; we should probably weigh the value of fixing
CSP reporting accordingly.

>> As a security engineer, I would consider #1 to be the real security
>> boundary -- a developer should use a CSP collector she trusts because
>> otherwise, even without script-sample, reports contain data that can
>> compromise the application.
> That sounds like an argument for reducing the amount of data in reports,
> not for increasing it. I think it's somewhat rational to believe that
> reporting endpoints are going to have longer retention times and laxer
> retention policies than application databases. Data leaking from the latter
> into the former seems like a real risk. I agree that the URL itself already
> presents risks, but I don't understand how that's a justification for
> accepting more risk.

It is an argument for using trusted infrastructure when building your
application ;-) Developers are already accustomed to deciding whether to
place trust in various components of their apps, whether it's the hosting
platform and OS, server-side modules and libraries, or JS widgets and other
embedded resources. A CSP violation endpoint is currently a
security-critical part of an application because it receives URLs; people
who don't trust their collection infrastructure already have insecure
applications and adding script-sample to reports does little to change
this. (Note that this wouldn't hold for applications which have nothing
sensitive in URLs and embed sensitive data at the beginning of inline
scripts, but this doesn't seem like a common pattern.)

Basically, the reluctance to include relevant debugging information in the
violation report seems to be somewhat of a misplaced concern to me, because
it ignores the trust relationship the application owner must already have
with their report collection endpoint.

Perhaps it's pertinent to take a step back and think about the reason to
have reporting functionality in CSP in the first place -- after all, the
mechanism could certainly work only via firing SecurityPolicyViolationEvents
and requiring developers to write their own logging code. The fact
that this capability exists in UAs, and is not restricted to sending
reports to the same origin or same "base domain" (contrary to the original
proposals, e.g. in http://research.sidstamm.com/papers/csp-www2010.pdf)
indicates that CSP wants to be flexible and give developers ultimate
control over the reporting functionality. Given this design choice, it
seems okay to trust the developer to pick the right report URI for their
application and include useful debugging data if the developer wants it; in
a way, the status quo is the worst of both worlds, because it already
requires the developer to fully trust the collector, but doesn't give her
enough useful data to track down causes of violations.
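As a sketch of that do-it-yourself alternative (the collector path
"/csp-collector" is hypothetical, and the `sample` field on the event is
exactly the data this thread is debating), a page could forward violations
itself along these lines:

```javascript
// Hedged sketch: roll-your-own CSP reporting via the
// SecurityPolicyViolationEvent interface instead of report-uri.
// The collector path "/csp-collector" is hypothetical.

// Shape a violation event into the payload a collector might store.
function toReportPayload(e) {
  return {
    "document-uri": e.documentURI,
    "blocked-uri": e.blockedURI,
    "violated-directive": e.violatedDirective,
    // `sample` is the field under discussion here; it may be absent.
    "script-sample": e.sample || "",
  };
}

// In a browser, wire up the listener and beacon each violation out.
if (typeof document !== "undefined" && typeof navigator !== "undefined") {
  document.addEventListener("securitypolicyviolation", (e) => {
    navigator.sendBeacon("/csp-collector", JSON.stringify(toReportPayload(e)));
  });
}
```

With this approach the developer decides exactly which fields leave the page,
which is the flip side of the trust argument above: the same flexibility that
lets you send reports anywhere also lets you strip anything you consider
sensitive.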

In case it helps: Lukas ran a quick analysis of the report-uri values we've
seen in the wild. For example, among the Alexa top 100,000 domains that
serve a CSP, we see the following:
- 49% don't set a report-uri
- 29% have a report-uri pointing to a relative path (/foo)
- 10% have a report-uri pointing to the same origin, with another 1% using
a sibling subdomain (foo.example.org reporting to csp.example.org)

Out of the remaining ~10% which send violations to external URLs, about
half point to report-uri.io and a couple of other logging services, and the
rest seem to use another domain owned by the same person/organization,
e.g. vine.co sends reports to twitter.com. The data for all domains in our
set isn't substantially different (66% without report-uri; 24% reporting to
own domain; 10% externally). This data doesn't include all the Google
ccTLDs and a couple of other big providers, and I'm sure it's missing some
other domains, e.g. ones with CSP in parts of the site requiring
authentication, but AFAIK it shouldn't have a systematic bias otherwise.

>> I can easily imagine scripts that violate conditions #2 and #3, but at the
>> same time we have not seen many examples of such scripts so far, nor have
>> people complained about the script-sample data already being included by
>> Firefox (AFAIK).
> People are generally unlikely to complain about getting more data,
> especially when the data's helpful and valuable. That can justify pretty
> much anything, though: lots of people think CORS is pretty restrictive, for
> instance, and probably wouldn't be sad if we relaxed it in various ways.
>> Overall, I don't see the gathering of script samples as qualitatively
>> different to the collection of URLs. However, if we are indeed particularly
>> worried about script snippets, we could make this opt-in and enable the
>> functionality only in the presence of a new keyword (report-uri /foo
>> 'report-script-samples') and add warnings in the spec to explain the
>> pitfalls. This way even if I'm wrong about all of the above we would not
>> expose any data from existing applications.
> I suspect that such an option would simply be copy-pasted into new
> policies, but yes, it seems like a reasonable approach.
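To make the opt-in above concrete, a policy using the strawman keyword might
look like the following. Note that 'report-script-samples' is only a proposal
from this thread, not part of any published CSP level, and /foo is the
placeholder path from the message above:

```http
Content-Security-Policy: script-src 'nonce-r4nd0m';
    report-uri /foo 'report-script-samples'
```

Without the keyword, reports would keep their current shape, so existing
deployments would expose no new data.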
>> For some background about why we're even talking about this: currently
>> violation reports are all but useless for both debugging and detection of
>> the exploitation of XSS due to the noise generated by browser extensions.
> I agree that this is a problem that we should solve. One way of solving it
> is to add data to the reports. Another is to invest more in cleaning up the
> reports that you get so that there's less noise. I wish browser vendors
> (including Chrome) spent more time on the latter, as we're actively harming
> users by not doing so.

Yes, fixing the blocking and reporting of extension-injected scripts would
certainly help (although note that "less noise" likely isn't sufficient; it
really has to be zero noise), but IIRC prior discussions we've had about
the topic indicated that this is an almost intractable problem, so it would
be great to find alternative solutions.

The script-sample approach also has an important advantage of its own: even
with extension-related false positives eliminated, developers would still
have very little information about the actual cause of inline script
violations (which are the majority of possible CSP problems in nonce-based
policies).
Sending some of the script text not only makes it possible to discard all
spurious reports, but also gives the developer the crucial bit of data to
find and fix actual site errors; it seems like a workable solution to the
current reporting problems faced by many sites.
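For reference, a report for a blocked inline script would then carry roughly
the following shape. The field names follow the existing CSP report format;
the URLs and the `script-sample` value are purely illustrative:

```json
{
  "csp-report": {
    "document-uri": "https://example.org/account",
    "referrer": "https://example.org/login",
    "violated-directive": "script-src",
    "blocked-uri": "inline",
    "script-sample": "var flags = window.__config || {};..."
  }
}
```

With the prefix present, a developer can immediately tell an
extension-injected snippet from a script their own templates emit, which is
the whole point of the proposal.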

Received on Wednesday, 19 October 2016 00:17:49 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 18:54:58 UTC