- From: Alex Russell <slightlyoff@google.com>
- Date: Mon, 25 Jan 2010 15:45:56 -0800
On Sun, Jan 24, 2010 at 2:52 AM, Ian Hickson <ian at hixie.ch> wrote:
> On Fri, 11 Dec 2009, Michal Zalewski wrote:
>>
>> 1) IFRAME semantics make it exceedingly cumbersome to sandbox short
>> snippets of text, and this task is perhaps the most common and pressing
>> XSS-related challenge. Unless the document is constructed on client side
>> by JavaScript, sites would need to use opaque data: URLs, or put up with
>> a lot of additional HTTP roundtrips, to utilize sandboxed IFRAMEs for
>> this purpose. [ There is also the problem of formatting and positioning
>> IFRAME content, although the seamless attribute would fix this. ]
>
> I've introduced srcdoc="" to largely handle this. There is an example in
> the spec showing how it can be used.
>
>
>> The ability to sandbox SPANs or DIVs using a token-guarded approach
>> (<span sandbox="random_token"></span sandbox="same_token">) is, on the
>> other hand, considerably easier on the developer, and probably has a
>> very similar implementation complexity.
>
> This has been proposed before. The concern is that many authors would be
> likely to make mistakes in their selection of "random" tokens that would
> lead to significant flaws in the deployment of the feature.
>
> srcdoc="" is less prone to errors. Only " and & characters need to be
> escaped. If the " character is not escaped, then a single " character in
> the input will cause the comment to break. This is likely to be caught
> early. If the & character is not escaped, correctness and fidelity will
> suffer, but it will not lead to security errors.

Sorry I'm late to this discussion. Would like to add my objection to
using attribute string escaping as a security "feature" in any way. I
strongly prefer required nonces attached to opening and closing of
sections.

>> 2) Renderers suck dealing with IFRAMEs, and will probably continue to
>> do so for the time being.
>> This means that a typical, moderately complex
>> application (say, as a discussion forum or a social site), where
>> hundreds of user-controlled strings may need to be present to display
>> user content - the mechanism would have an unacceptable load time and
>> memory footprint. In fact, people are already coming up with
>> lightweight alternatives with a significant functionality overlap (and
>> different security controls). Microsoft has toStaticHTML(), while a
>> standardized implementation is being discussed here right now in a
>> separate thread.
>
> I agree that we should investigate other options too (<iframe> boxes
> aren't suitable for everything), but I don't think that current
> implementation problems with <iframe> should necessarily prevent us from
> investigating sandboxed iframes too.
>
> In certain contexts, e.g. reddit comments, it may be the case that instead
> of one sandboxed <iframe> per comment, the best way to do things is
> instead one sandboxed iframe for all the comments, with scripts disabled
> and allow-same-origin enabled, so that scripts can poke into the page and
> set event handlers on all the relevant links.
>
>
>> Isn't the benefit of keeping the design slightly simpler (and
>> realistically, limited to relatively few usage scenarios) negated by the
>> fact that alternative solutions to other narrow problems would need to
>> emerge elsewhere? The browser coming with several different script
>> sanitizers with completely different APIs and security controls does not
>> strike me as a desirable outcome (all the flavors of SOP are a testament
>> to this). If the answer is not a strong "no", maybe the token-guarded DIV
>> / SPAN approach is a better alternative?
>
> I agree in principle that fewer features are better than more features,
> but we have to take into account that many of the people deploying these
> features know nothing about security.
> We have to ensure that the security
> aspects of features like this (like what to escape, what security tokens
> need to be generated) are aligned with the practical aspects of features
> like this (like what results in the page appearing to work, regardless of
> the state of security).
>
>
>> Now, that aside - on a more pragmatic level, I have two extra comments:
>>
>> 1) The utility of the SOP sandboxing behavior outlined in the spec is
>> diminished if we have no way to actually *enforce* that the IFRAMEd
>> resource would only be rendered in such a context. If I am serving
>> user-supplied, unsanitized HTML, it is obviously safe to do <iframe
>> sandbox src="show.cgi?id=1234"></iframe> - but where do we prevent the
>> attacker from calling http://my_site/show.cgi?id=1234 directly, and
>> bypassing the filter?
>
> I've introduced text/html-sandboxed for this purpose.
>
>
>> 2.1) The ability to disable loading of external resources (images,
>> scripts, etc) in the sandboxed document. The common usage scenario is
>> when you do not want the displayed document to "phone home" for privacy
>> reasons, for example in a web mail system.
>
> Good point. Should we make sandbox="" disable off-origin network requests?
>
>
>> 2.2) The ability to disable HTML parsing. On IFRAMEs, this can actually
>> be approximated with the excommunicated <plaintext> tag, or with
>> Content-Type: text/plain / data:text/plain,. On token-guarded SPANs or
>> DIVs, however, it would be pretty damn useful for displaying text
>> content without the need to escape &, <, >, etc. "Pure" security benefit
>> is limited, but as a phishing prevention and display correctness
>> measure, it makes sense.
>
> I don't really understand the use case here; could you elaborate?
>
>
> On Sun, 13 Dec 2009, Michal Zalewski replied to Tab:
>> >
>> > I believe that the @doc attribute, discussed in the original threads
>> > about @sandbox, will be introduced to deal with that.
>> > It'll take
>> > plain html as a string, avoiding the opaqueness and larger escaping
>> > requirements of a data:// url, as the only thing you'll have to escape
>> > is whichever quote you're using to surround the value.
>>
>> That doesn't strike me as a robust way to prevent XSS - the primary
>> reason why we need sandboxing to begin with is that people have a
>> difficulty properly parsing, serializing, or escaping HTML; so replacing
>> this with a mechanism that still requires escaping is perhaps
>> suboptimal.
>
> There's a world of difference between "properly parsing, serializing, or
> escaping HTML" and "escaping quotes and ampersands".
>
>
>> > More importantly, though, it puts a significant burden on authors to
>> > generate unpredictable tokens. Is this difficult? No, of course not.
>> > But people *will* do it badly, copypasting a single token in all their
>> > <iframe>s or similar.
>>
>> People already need to do this well for XSRF defenses to work, and I'd
>> wager it's a much simpler and better-defined problem than real-world
>> HTML parsing and escaping could realistically be. It is also very easy
>> to delegate this task to existing functions in common web frameworks.
>
> Do people get CSRF right more often than simply escaping characters? It
> seems implausible that authors get complex cryptographic properties right
> more often than a simple set of substitutions, but I suppose stranger
> things are true on the Web.
>
>
>> Also, a single token on a returned page, as long as it's unpredictable
>> across user sessions, should not be a significant issue.
>
> I'm just worried that some people would just use a constant string.
>
>
> On Sun, 13 Dec 2009, Adam Barth wrote:
>>
>> I agree that we need something to help with content received by
>> cross-site XMLHttpRequest and postMessage. For those use cases, we're
>> already running script, so a design like toStaticHTML seems better than
>> <jail>.
>
> If the data is to be rendered into a block-level box, it seems that
> srcdoc="" might actually handle that case too.
>
>
> On Sun, 13 Dec 2009, Michal Zalewski replied to Adam:
>> >
>> > The @sandbox seems like a better fit for the advertising use case.
>>
>> I am not contesting this, to be clear - I am aware of many cases where
>> it would be very useful - but gadgets are a fairly small part of the
>> Internet, and seems like a unified solution would be more desirable than
>> several very different APIs with different granularity.
>>
>> The toStaticHTML-alike will address other specific uses, but will
>> leave applications that can't rely on JS exclusively for their rendering
>> needs (which I'd wager is still a majority) out in the cold; which would
>> probably lead to yet another XSS prevention / HTML sandboxing approach
>> emerging later on.
>>
>> I haven't really seen a compelling argument why all these can't be
>> unified without a significant increase in code or spec complexity -
>> maybe one exists.
>
> What would they be unified under? I don't think anyone has proposed
> anything that solves all the problems that CSP, sandbox="", srcdoc="",
> toStaticHTML(), httpOnly, text/html-sandboxed, and the various other
> "security" mechanisms introduced to the platform over the past few years
> would solve without introducing more complexity overall.
>
> There are many problems to solve. It seems logical that we'd end up with
> many solutions.
>
>
> On Sun, 13 Dec 2009, Michal Zalewski replied to Adam:
>> >
>> > That seems like a backwards way of proceeding. Do you have a proposal
>> > for unification besides the <jail> tag?
>>
>> The only fundamental objection I have heard against it is the trouble
>> with XML representation.
>
> Well, it also doesn't really solve all the problems. For example, it
> doesn't solve the "embedding external content safely" problem.
>
>
>> The other option is to simply require a traditional CDATA-esque behavior
>> or a tag parameter - which would place the burden on the author to
>> filter out / escape a single exact string or a quote, but would be
>> similar otherwise.
>
> That's similar to what srcdoc="" does when used with sandbox="".
>
>
>> It's obviously less secure - because while the token-based approach
>> actually requires the user to explicitly come up with a token, however
>> poor it might be; whereas here, there is no way to enforce escaping.
>
> The token-based approach could lead an author to just coming up with a
> constant token, which is just as useless as not enforcing escaping, except
> that the author had to wonder how to get security to use it, and thus the
> author will have a false sense of security whose only likely failure mode
> is an actual attack. Compare this to srcdoc="", where the failure mode is
> the use of a quote mark, and is thus likely to happen much earlier than an
> attack. It's also easier to understand the failure mode. "The token has to
> be unguessable" is harder to explain than "quotes have to be escaped".
>
>
>> From Tab's response, looks like it's being considered, too - @doc +
>> @seamless. What strikes me as a bit ironic is that this way, we're
>> overloading IFRAME to become something else entirely, and after
>> rejecting token-guards, settling for an option that is definitely not
>> perfect, and in practice, I think, is bound to be less secure.
>
> I don't really follow the "something else entirely" bit. Also, why would
> it be less secure? What is the attack scenario?
>
>
> On Sun, 13 Dec 2009, Michal Zalewski wrote:
>>
>> Huh? But that's not the point I am making... I am not arguing that
>> iframe sandbox should be abandoned as a bad idea - quite the opposite.
>>
>> I was merely suggesting that we *expand* the same logic, and the same
>> excellent security control granularity, to span and div; this seems like
>> it would not increase the implementation complexity in any significant
>> way.
>
> I don't understand the proposal then. What is the problem it is solving,
> and how does it solve it?
>
>
>> We could then allow these to be populated with secure contents in three
>> ways:
>>
>> 1) Guarded closing tag - this is simple and bullet-proof; but may
>> conflict with XML serializations, and hence require some hacks,
>
> I strongly disagree with the characterisation of this idea as "simple and
> bullet-proof", at least for anyone who doesn't understand cryptography.
>
>
>> 2) CDATA or @doc-like approaches. Less secure because it does not
>> enforce a security control, but less contentious, and already being
>> considered for IFRAMEs.
>
> I don't understand what you mean by "does not enforce a security control",
> or how a guarded closing tag does "enforce a security control".
>
>
>> 3) .innerHTML, which would be then safe by default, without the need for
>> .innerSafeHTML (and the associated ambiguities) or explicit
>> .toStaticHTML calls.
>
> To run scripts in a safe environment, we need to have a separate global
> object, which is why we're using <iframe> for it. This supports the
> equivalent of ".innerHTML" as you describe (.srcdoc).
>
> If you just want something that blocks scripts, plugins, forms, targeted
> links, etc, without a separate document, then it's not clear to me that
> that is something that is sanely achievable. It would require complex
> changes all over the place.
>
> What is the use case this is targeted at?
>
>
> On Sun, 13 Dec 2009, Adam Barth wrote:
>>
>> I'm very interested in a solution that works for the following use
>> cases:
>>
>> 1) A web page wants to display untrusted (i.e., restricted) HTML
>> received via cross-site XMLHttpRequest or postMessage.
>
> Do you have a concrete use case for which <iframe> doesn't work?
>
>
>> 2) A blog wishes to display many comments containing untrusted (i.e.,
>> restricted) HTML.
>
> It seems <iframe srcdoc> works well for this case. You can even safely
> enable scripts in the comments, so that people can upload little
> calculator-like things or games, not that I would recommend that!
>
>
> On Sun, 13 Dec 2009, Michal Zalewski wrote:
>>
>> [...] this really strikes me as throwing random ideas at the wall, and
>> seeing which ones stick.
>
> Welcome to Web standards development. :-)
>
>
>> Furthermore, in this particular case, I am really concerned that the
>> spec is at odds with itself - you mention certain specific use cases,
>> but the spec seems to be after a broader goal: sandboxing user-supplied
>> content in general. In doing so, it gives some bad advice (again, the
>> user content example is exploitable, at least until the arrival of some
>> out-of-scope security mechanism to prevent it).
>
> I've added a warning to the spec pointing out that the text/html-sandboxed
> MIME type has to be used in that case.
>
>
> On Sun, 13 Dec 2009, Aryeh Gregor wrote:
>>
>> So instead, why not just use the standard escaping mechanisms we already
>> have? Allow a sandbox attribute on all elements that can contain
>> phrasing or flow content. Any such element with a sandbox attribute
>> will be required to contain no literal <>'" before the closing tag. If
>> any of those four characters is encountered, the element is treated as
>> having no contents. Otherwise, the browser unescapes all characters
>> with special meanings ("&lt;" -> "<", "&gt;" -> ">", "&amp;" -> "&",
>> etc.) and then treats the resulting string as the inner HTML of the
>> element, parsing it like regular HTML, but the contents are sandboxed.
>>
>> Examples:
>>
>> <span sandbox>This span will work normally, except for being
>> sandboxed.</span>
>>
>> <span sandbox>This span will be <em>empty</em> in the DOM, even though
>> it contains no evil content, because otherwise authors will forget to
>> escape the contents of the sandbox.</span>
>>
>> <span sandbox>&lt;span&gt;But this span will have another span as its
>> child, sandboxed. The regular parser sees no entities here, only a
>> nested span!&lt;/span&gt;</span>
>>
>> <span sandbox>It would be safe to allow this to work, since it only
>> contains an apostrophe, but let's not, so that lack of escaping is
>> easier to catch. This span is therefore also empty.</span>
>
> What would the "sandbox" do, other than require one level of escaping?
> i.e. what is it protecting against?
>
> --
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
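
For readers following the last proposal quoted above (Aryeh Gregor's
"sandbox attribute on any element"), the rule can be sketched roughly as
follows. This is only an illustration of the rule as described in the
thread, not anything a browser implements; the function name
sandbox_inner_html is made up for the example, and Python's html.unescape
stands in for "the browser unescapes all characters with special meanings".

```python
import html

# The four characters that, per the proposal, must never appear literally
# inside a sandboxed element.
FORBIDDEN_LITERALS = set("<>'\"")


def sandbox_inner_html(raw_contents: str) -> str:
    """Return the markup that would be parsed inside the sandboxed element.

    Hypothetical helper: mirrors the rule from the thread, nothing more.
    """
    # Any literal <, >, ' or " means the author forgot to escape, so the
    # element is treated as having no contents at all.
    if any(ch in FORBIDDEN_LITERALS for ch in raw_contents):
        return ""
    # Otherwise undo one level of escaping; the result would then be parsed
    # as regular (but sandboxed) HTML.
    return html.unescape(raw_contents)


if __name__ == "__main__":
    print(repr(sandbox_inner_html("This span will work normally.")))
    print(repr(sandbox_inner_html("This span will be <em>empty</em>.")))
    print(repr(sandbox_inner_html("&lt;span&gt;nested&lt;/span&gt;")))
    print(repr(sandbox_inner_html("let's not")))
```

The four calls correspond to the four example spans: plain text passes
through, unescaped markup and a bare apostrophe both yield an empty element,
and escaped markup comes out as a nested span.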
Received on Monday, 25 January 2010 15:45:56 UTC
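
To make the two positions in this thread concrete, here is a minimal
server-side sketch of each. It assumes a Python backend; the function names
are hypothetical. The first half follows the srcdoc="" rule quoted above
(only & and " are escaped), and the second half generates a per-response
unpredictable token for the token-guarded <span sandbox="..."> syntax, which
is the syntax proposed in the thread and not part of any shipped
specification.

```python
import secrets


def escape_for_srcdoc(markup: str) -> str:
    """Escape untrusted markup for use inside a double-quoted srcdoc value.

    Ampersands are replaced first so that the entities introduced for the
    quote characters are not escaped a second time.
    """
    return markup.replace("&", "&amp;").replace('"', "&quot;")


def sandboxed_iframe(markup: str) -> str:
    """Wrap untrusted markup in a sandboxed iframe via srcdoc."""
    return f'<iframe sandbox srcdoc="{escape_for_srcdoc(markup)}"></iframe>'


def token_guarded_span(markup: str) -> str:
    """Wrap untrusted markup using the token-guarded syntax from the thread.

    Hypothetical syntax: the closing tag must repeat the opening token.
    A fresh token is drawn from the OS CSPRNG on every call, which is the
    property a constant or copy-pasted token would lack.
    """
    token = secrets.token_urlsafe(16)
    return f'<span sandbox="{token}">{markup}</span sandbox="{token}">'


if __name__ == "__main__":
    comment = '<p>She said "hi" & left</p>'
    print(sandboxed_iframe(comment))
    print(token_guarded_span(comment))
```

The srcdoc variant fails visibly if the escaping step is skipped, since the
first quote in the input ends the attribute; the token variant only fails
when an attacker can guess the token, which is the trade-off argued over
above.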