- From: Mike West <mkwst@google.com>
- Date: Wed, 22 Feb 2017 15:23:29 +0100
- To: Neil Matatall <oreoshake@github.com>
- Cc: Artur Janc <aaj@google.com>, Brad Hill <hillbrad@gmail.com>, Craig Francis <craig.francis@gmail.com>, Krzysztof Kotowicz <kkotowicz@gmail.com>, Devdatta Akhawe <dev.akhawe@gmail.com>, "public-webappsec@w3.org" <public-webappsec@w3.org>, Christoph Kerschbaumer <ckerschbaumer@mozilla.com>, Frederik Braun <fbraun@mozilla.com>, Scott Helme <scotthelme@hotmail.com>, Lukas Weichselbaum <lwe@google.com>, Michele Spagnuolo <mikispag@google.com>, Jochen Eisinger <eisinger@google.com>
- Message-ID: <CAKXHy=eA80TQk7bFtyOrz5Gw2cbF34OzCfaEZOxZ=tAT2UHvRA@mail.gmail.com>
I'd suggest that interested folks comment on https://github.com/w3c/webappsec-csp/issues/119. I've added a strawman to the spec which allows a policy to opt in to delivering a `sample` attribute for inline violations iff a `'report-sample'` expression is present in the relevant `script-src` or `style-src` directive. Feedback welcome.

-mike

On Tue, Feb 21, 2017 at 8:39 PM, Neil Matatall <oreoshake@github.com> wrote:

> :wave: hello again friends,
>
> I've been snoozing this thread for almost 6 months and I'd like to resurrect this conversation after a recent twitter flurry of action and +1s (https://twitter.com/mikewest/status/834081437473185792). I don't think anyone has called out _why_ we need this other than "moar data", but I think it's important to think about two use cases: report filtering and actually capturing attack payloads. 5 years ago (https://lists.w3.org/Archives/Public/public-webappsec/2012Dec/0012.html) I thought script-sample was for capturing attack payloads. Today, I think script-sample is more important for report filtering.
>
> 2.5 years ago the topic of "inline and eval reports look the same" was discussed (https://github.com/w3c/webappsec/issues/52). This is a serious problem for analyzing reports.
>
> .... slight tangent on why filtering is so important
>
> CSP reporting without script-sample is not very useful for filtering garbage from plugins. Having script-sample data from Firefox has been the basis of filtering garbage; without this data, we cannot filter that specific type of garbage. Filtering out garbage is critical for building the case for going from report-only to enforce mode. This logic is spread out across the internet, e.g. https://oreoshake.github.io/csp/twitter/2014/07/25/twitters-csp-report-collector-design.html, https://blogs.dropbox.com/tech/2015/09/on-csp-reporting-and-filtering/, etc. Funny enough, superfish.com was a standard "un-actionable report" filter that has been passed down :) For every blog post on CSP reporting, the conversation always devolves to one thing: how are you filtering reports? (https://mathiasbynens.be/notes/csp-reports) For every CSP reporting collector, filtering is reimplemented (https://github.com/jacobbednarz/go-csp-collector/blob/master/csp_collector.go#L58 and https://github.com/nico3333fr/CSP-useful/blob/61106b31683928a0f3dde3d312eaa257d0740914/report-uri/csp-parser-enhanced.php#L17). Sentry even added CSP report collection, and guess what, they also do filtering: https://github.com/getsentry/sentry/commit/0b9e124b702183a70002635ffd252e27d11fbe97
>
> Scott Helme says he filters out 64% of his reports (though this number includes things that are not CSP reports). That's about what I can recall from the days when I had to collect reports. With script-sample from all browsers, I suspect that number will jump up significantly since it will be easier to identify plugins/other garbage.
>
> ... my point is
>
> Most people implementing CSP expect the reports to be useful. Alas, they are not without some serious filtering. Many are turned away by the high quantity of "un-actionable" reports. While I'd be super happy if browser vendors came up with magic to make filtering obsolete (all have acknowledged plugin noise is a bug), I'd be happier if we could just do better filtering today. Script-sample helps me accomplish better filtering.
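>
> For concreteness, here is a minimal sketch of the kind of filter every collector above reimplements, assuming the report JSON carries the script-sample field Firefox already sends. The noise patterns are illustrative, not a vetted list:
>
>     // Sketch: decide whether a CSP report is actionable or known noise.
>     var NOISE = [/superfish/i, /lastpass/i]; // illustrative patterns only
>
>     function isActionable(report) {
>       var r = report['csp-report'] || {};
>       var blocked = r['blocked-uri'] || '';
>       var sample = r['script-sample'] || '';
>       // Extension-injected resources often surface with these schemes.
>       if (/^(chrome|moz|safari)-extension:/.test(blocked)) return false;
>       return !NOISE.some(function (re) {
>         return re.test(blocked) || re.test(sample);
>       });
>     }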
>
> ... but
>
> Data leakage. This is not to be taken lightly. I think everything that can be said about this already has been. I also anecdotally agree that we're more likely to leak something in a URL than in a script sample. The script samples were crucial to filtering out LastPass violations, which were easily identifiable. Adding an opt-in flag while keeping the 40-character limit would be great. Requiring TLD+1 matching would also be acceptable in my mind.
>
> JSONP. Ugh. It's still pretty widespread and it's unfortunate that the internet is still terrible. I'm all for not breaking the web, but I'm more for not being held back.
>
> On Wed, Oct 19, 2016 at 9:49 AM Artur Janc <aaj@google.com> wrote:
>
>> On Wed, Oct 19, 2016 at 7:14 PM, Brad Hill <hillbrad@gmail.com> wrote:
>>
>> Just to add my comment after discussion on today's teleconference:
>>
>> I'm sympathetic to the argument that "you must trust your reporting endpoint" for inline scripts and event handlers.
>>
>> I'm concerned about it for non-same-origin external scripts. There are lots of "guarded" JSONP-like endpoints that I imagine have PII in the first 40 chars, e.g. something like the following, where u is a userid:
>>
>> for(;;);{u:21482823, ...
>>
>> Allowing any cross-origin endpoint to request this response and dump it to a reporting endpoint of their choice would be very bad.
>>
>> Yes, totally! What we were talking about earlier is reporting script samples for "inline" script violations such as event handlers and inline <script> blocks -- this is data which the developer can already access by inspecting the DOM (though in some cases it can be difficult if the element has already been removed). The main benefit of having it done by the user agent is robustness and not requiring the developer to load a "CSP debugging" library everywhere on their site.
>>
>> FWIW I'd be strongly opposed to doing this for external scripts because of the problem you're talking about. Luckily, we generally don't need this for external scripts because they can already be identified by the blocked-uri, unlike inline violations.
>>
>> -Brad
>>
>> On Wed, Oct 19, 2016 at 8:21 AM Craig Francis <craig.francis@gmail.com> wrote:
>>
>> On 19 Oct 2016, at 15:34, Krzysztof Kotowicz <kkotowicz@gmail.com> wrote:
>>
>> URLs should not contain sensitive data, precisely because they are often relayed to third parties
>>
>> Hi Krzysztof,
>>
>> While I agree, the classic example is a password reset.
>>
>> As in, a link is sent to the user via email, and that link contains a sensitive token that allows someone to change the account password.
>>
>> That said, the token should not last long, to reduce the risk of it being exposed to 3rd parties (e.g. expiring it after use, or after a certain period of time).
>>
>> Craig
>>
>> On 19 Oct 2016, at 15:34, Krzysztof Kotowicz <kkotowicz@gmail.com> wrote:
>>
>> 2016-10-19 2:16 GMT+02:00 Artur Janc <aaj@google.com>:
>>
>> On Tue, Oct 18, 2016 at 10:05 AM, Mike West <mkwst@google.com> wrote:
>>
>> On Tue, Oct 18, 2016 at 1:03 AM, Artur Janc <aaj@google.com> wrote:
>>
>> On Mon, Oct 17, 2016 at 7:15 PM, Devdatta Akhawe <dev.akhawe@gmail.com> wrote:
>>
>> Hey
>>
>> In the case of a third-party script having an error, what are example leaks you are worried about?
>>
>> The same kinds of issues that lead us to sanitize script errors for things loaded as CORS cross-origin scripts: https://html.spec.whatwg.org/#muted-errors. If the resource hasn't opted in to being same-origin with you, script errors leak data you wouldn't otherwise have access to.
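>>
>> (Concretely, the muting looks roughly like this; "third-party.example" is a placeholder:)
>>
>>   <script>
>>     window.onerror = function (msg, src, line, col) {
>>       // For a cross-origin script loaded without a CORS opt-in, the
>>       // browser mutes the details: msg is just "Script error." and
>>       // src/line/col are empty.
>>       console.log(msg, src, line, col);
>>     };
>>   </script>
>>   <script src="https://third-party.example/lib.js"></script>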
>>
>> Thanks for the summary, Mike! It's a good overview of the issue, but I'd like to expand on the reasoning for why including the prefix of an inline script doesn't sound particularly scary to me.
>>
>> Thanks for fleshing out the counterpoints, Artur!
>>
>> Basically, in order for this to be a concern, all of the following conditions need to be met:
>>
>> 1. The application has to use untrusted report collection infrastructure. If that is the case, the application is already leaking sensitive data from page/referrer URLs to its collector.
>>
>> "Trusted" to receive URLs doesn't seem to directly equate to "trusted" to store sensitive data. If you're sure that you don't have sensitive data on your pages, great. But you were also presumably "sure" that you didn't have inline script on your pages, right? :)
>>
>> Keep in mind that URLs are sensitive data for most applications, and they are currently being sent in violation reports.
>>
>> URLs should not contain sensitive data, precisely because they are often relayed to third parties, e.g. in the referrer. If they do, that's usually considered a vulnerability by e.g. OWASP, especially if the URL is capability-bearing (IDOR <https://www.owasp.org/index.php/Top_10_2013-A4-Insecure_Direct_Object_References> is even in the OWASP Top 10). I agree that some applications disclose sensitive information in URLs, but that should not be the majority of them. I think that for a lot of applications, URLs reported through e.g. CSP violation reports are still subject to regular access control, whereas we just don't know yet whether sensitive tokens are included in script samples. It's likely some IDs are present in inline scripts: <a href=# onclick=a.select(123)>
>>
>> The password reset example Craig is talking about is a fairly ubiquitous feature which generally needs to pass sensitive data in the URL for compatibility with email clients. Also, capability URLs are a thing in a lot of apps (e.g. Google Docs), and so are various features which reveal the identity of the current user (/profile/koto).
>>
>> But even without direct leaks of PII and SPII, URLs contain a lot of data about the current user, the data they have in the given application, and their interactions with the app. For example, a search engine is likely to disclose your queries, a social network will leak the IDs of your friends and the groups you belong to, a mapping site will have your location, and even a news site will disclose the articles you read, which are sensitive in certain contexts.
>>
>> This is difficult to definitively prove, but I'd say that an application which doesn't have any interesting/sensitive data in its URLs would be the exception rather than the norm. But I'm not sure how to convince you, other than letting you pick some interesting apps and trying to find interesting stuff in their URLs ;-)
>>
>> Cheers,
>> -A
>>
>> I'm having a difficult time imagining a case where an application is okay with disclosing its URLs to a third party for the purpose of debugging violation reports, but is not okay with disclosing script prefixes for the same purpose, given that:
>>
>> 1) Almost all applications have sensitive data in URLs, compared to a certainly real, but less specific, risk of having inline scripts with sensitive data in their prefixes, assuming the sample is limited to a reasonable length.
>>
>> Citation needed (for the "almost all" claim). I agree the risk of leaking sensitive data might be mitigated by adding a reasonable length limit.
>>
>> 2) URLs are disclosed much more frequently than script samples would be, because they are sent with every report (not just "inline" script-src violations). In the `referrer` field, the UA is also sending a URL of another, unrelated page, increasing the likelihood that sensitive data will appear in the report.
>>
>> Which is why it's a best practice not to have sensitive data in URLs, but instead to transfer it using e.g. cookies or POST parameters.
>>
>> 3) There is no sanitization of URL parameters in violation reports, compared to the prefixing logic we're considering for script samples.
>>
>> In fact, I'd be much more worried about URLs than script prefixes, because URLs leak on *any* violation (not just for script-src) and URLs frequently contain PII or authorization/capability-bearing tokens, e.g. for password reset functionality.
>>
>> We've talked a bit about URL leakage in https://github.com/w3c/webappsec-csp/issues/111. I recall that Emily was reluctant to apply referrer policy to the page's URL vis-à-vis the reporting endpoint, but I still think it might make sense.
>>
>> 2. The application needs to have a script which includes sensitive user data somewhere in the first N characters. FWIW, in our small-scale analysis of a few hundred thousand reports we saw ~300 inline script samples sent by Firefox (with N=40) and haven't found sensitive tokens in any of the snippets.
>>
>> Yup. I'm reluctant to draw too many conclusions from that data, given the pretty homogeneous character of the sites we're currently applying CSP to at Google, but I agree with your characterization of the data.
>>
>> Scott might have more data from a wider sampling of sites, written by a wider variety of engineering teams (though it's not clear that the terms of that site would allow any analysis of the data).
>>
>> I completely agree, this data is just what we had readily available -- we can certainly do a much larger analysis of script prefixes based on the search index. That said, if we're worried about the script-sample approach, perhaps not seeing any sensitive data in the first dataset we looked at could be a signal that it's worth pursuing further.
>>
>> 3. The offending script needs to cause a CSP violation, i.e. not have a valid nonce, meaning that the application is likely broken if the policy is in enforcing mode.
>>
>> 1. Report mode exists.
>>
>> 2. Embedded enforcement might make it more likely that XSS on a site could cause a policy to be inadvertently applied to itself or its dependencies. We talked about this briefly last week, and I filed https://github.com/w3c/webappsec-csp/issues/126 to ponder it. :)
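>>
>> (Re: point 1 -- "report mode" means delivering the policy via the report-only header, so violations are reported but nothing is blocked; the values here are illustrative:)
>>
>>   Content-Security-Policy-Report-Only: script-src 'nonce-abc123'; report-uri /csp-reports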
>>
>> Since CSPs applied by embedded enforcement serve a very different purpose than current policies (they don't try to mitigate script injection), it would very likely be okay to just not include script-sample data for such policies. Also, embedded enforcement is still pretty far off, and the reporting problem is an issue for pretty much every site currently gathering violation reports; we should probably weigh the value of fixing CSP reporting accordingly.
>>
>> As a security engineer, I would consider #1 to be the real security boundary -- a developer should use a CSP collector she trusts because otherwise, even without script-sample, reports contain data that can compromise the application.
>>
>> That sounds like an argument for reducing the amount of data in reports, not for increasing it. I think it's somewhat rational to believe that reporting endpoints are going to have longer retention times and laxer retention policies than application databases. Data leaking from the latter into the former seems like a real risk. I agree that the URL itself already presents risks, but I don't understand how that's a justification for accepting more risk.
>>
>> It is an argument for using trusted infrastructure when building your application ;-) Developers are already accustomed to deciding whether to place trust in various components of their apps, whether it's the hosting platform and OS, server-side modules and libraries, or JS widgets and other embedded resources. A CSP violation endpoint is currently a security-critical part of an application because it receives URLs; people who don't trust their collection infrastructure already have insecure applications, and adding script-sample to reports does little to change this. (Note that this wouldn't hold for applications which have nothing sensitive in URLs and embed sensitive data at the beginning of inline scripts, but this doesn't seem like a common pattern.)
>>
>> Basically, the reluctance to include relevant debugging information in the violation report seems to be somewhat of a misplaced concern to me, because it ignores the trust relationship the application owner must already have with their report collection endpoint.
>>
>> Perhaps it's pertinent to take a step back and think about the reason to have reporting functionality in CSP in the first place -- after all, the mechanism could certainly work only via throwing SecurityPolicyViolation events and requiring developers to write their own logging code. The fact that this capability exists in UAs, and is not restricted to sending reports to the same origin or the same "base domain" (contrary to the original proposals, e.g. in http://research.sidstamm.com/papers/csp-www2010.pdf), indicates that CSP wants to be flexible and give developers ultimate control over the reporting functionality. Given this design choice, it seems okay to trust the developer to pick the right report URI for their application and include useful debugging data if the developer wants it; in a way, the status quo is the worst of both worlds, because it already requires the developer to fully trust the collector, but doesn't give her enough useful data to track down causes of violations.
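>>
>> (The self-logging alternative mentioned above would look roughly like this -- a sketch with a made-up endpoint; the event's sample field is what the proposal under discussion would populate:)
>>
>>   document.addEventListener('securitypolicyviolation', function (e) {
>>     // Ship each violation to our own collector for later filtering.
>>     navigator.sendBeacon('/csp-log', JSON.stringify({
>>       directive: e.violatedDirective,
>>       blockedURI: e.blockedURI,
>>       sample: e.sample
>>     }));
>>   });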
>>
>> In case it helps: Lukas ran a quick analysis of the report-uri values we've seen in the wild, and e.g. for the domains with CSP in the Alexa 100,000 we see the following:
>>
>> - 49% don't set a report-uri
>> - 29% have a report-uri pointing to a relative path (/foo)
>> - 10% have a report-uri pointing to the same origin, with another 1% using a sibling subdomain (foo.example.org reporting to csp.example.org)
>>
>> Out of the remaining ~10% which send violations to external URLs, about half point to report-uri.io and a couple of other logging services, and the rest seem to use another domain owned by the same person/organization, e.g. vine.co sends reports to twitter.com. The data for all domains in our set isn't substantially different (66% without report-uri; 24% reporting to their own domain; 10% reporting externally). This data doesn't include all the Google ccTLDs and a couple of other big providers, and I'm sure it's missing some other domains, e.g. ones with CSP in parts of the site requiring authentication, but AFAIK it shouldn't have a systematic bias otherwise.
>>
>> I can easily imagine scripts that violate conditions #2 and #3, but at the same time we have not seen many examples of such scripts so far, nor have people complained about the script-sample data already being included by Firefox (AFAIK).
>>
>> People are generally unlikely to complain about getting more data, especially when the data's helpful and valuable. That can justify pretty much anything, though: lots of people think CORS is pretty restrictive, for instance, and probably wouldn't be sad if we relaxed it in various ways.
>>
>> Overall, I don't see the gathering of script samples as qualitatively different from the collection of URLs. However, if we are indeed particularly worried about script snippets, we could make this opt-in and enable the functionality only in the presence of a new keyword (report-uri /foo 'report-script-samples') and add warnings in the spec to explain the pitfalls. This way, even if I'm wrong about all of the above, we would not expose any data from existing applications.
>>
>> I suspect that such an option would simply be copy-pasted into new policies, but yes, it seems like a reasonable approach.
>>
>> For some background about why we're even talking about this: currently, violation reports are all but useless both for debugging and for detecting the exploitation of XSS, due to the noise generated by browser extensions.
>>
>> I agree that this is a problem that we should solve. One way of solving it is to add data to the reports. Another is to invest more in cleaning up the reports that you get so that there's less noise. I wish browser vendors (including Chrome) spent more time on the latter, as we're actively harming users by not doing so.
>>
>> Yes, fixing the blocking and reporting of extension-injected scripts would certainly help (although note that "less noise" likely isn't sufficient; it really has to be zero noise), but IIRC prior discussions we've had about the topic indicated that this is an almost intractable problem, so it would be great to find alternative solutions.
>>
>> The script-sample approach also has several important advantages: even without extension-related false positives, developers would have very little information about the actual cause of inline script violations (which are the majority of possible CSP problems in nonce-based policies). Sending some of the script text not only makes it possible to discard all spurious reports, but also gives the developer the crucial bit of data needed to find and fix actual site errors; it seems like a workable solution to the current reporting problems faced by many sites.
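>>
>> (For illustration, with sample reporting enabled, a report for a blocked inline script might look roughly like this -- all values invented, with the sample truncated to the 40-character prefix discussed above:)
>>
>>   { "csp-report": {
>>       "document-uri": "https://example.com/page",
>>       "violated-directive": "script-src",
>>       "blocked-uri": "inline",
>>       "script-sample": "var _gaq = _gaq || []; _gaq.push(['_setA"
>>   } }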
>>
>> Cheers,
>> -Artur
>>
>> --
>> Best regards,
>> Krzysztof Kotowicz
Received on Wednesday, 22 February 2017 14:24:31 UTC