[whatwg] some thoughts on sandboxed IFRAMEs from Michal Zalewski on 2009-12-12 (public-whatwg-archive@w3.org from December 2009)

From: Michal Zalewski <lcamtuf@coredump.cx>
Date: Fri, 11 Dec 2009 20:18:46 -0800
Message-ID: <448e9a320912112018t6ca81e36g43b533a16f826a95@mail.gmail.com>
Hi folks,

So, we were having some internal discussions about the IFRAME sandbox
attribute; Adam Barth suggested it would be more productive to bring
some of the points I was making to the mailing list instead.

I think the attribute is an excellent idea, and close to the dream
design we talked about internally for a while. I do have some
peripheral concerns, though, and it seems like now is the time to bring
them up!

Starting with two high-level comments: although I understand the
simplicity, and hence the appeal, of sandboxed IFRAMEs, I do fear that
they will be very hard on web developers - and hence of limited
utility. In particular:

1) IFRAME semantics make it exceedingly cumbersome to sandbox short
snippets of text, and this task is perhaps the most common and
pressing XSS-related challenge. Unless the document is constructed on
the client side by JavaScript, sites would need to use opaque data: URLs,
or put up with a lot of additional HTTP roundtrips, to utilize
sandboxed IFRAMEs for this purpose. [ There is also the problem of
formatting and positioning IFRAME content, although the seamless
attribute would fix this. ]

The ability to sandbox SPANs or DIVs using a token-guarded approach
(<span sandbox="random_token"></span sandbox="same_token">) is, on the
other hand, considerably easier on the developer, and probably has a
very similar implementation complexity.
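
To make the comparison concrete, here is a minimal server-side sketch of
the two options above. It is an illustration only: the function names are
made up, and the token-guarded SPAN syntax is the proposal from this
message, not something any browser is known to parse.

```python
import base64
import secrets


def sandboxed_iframe(snippet: str) -> str:
    """Wrap untrusted HTML in a sandboxed IFRAME via an opaque data: URL."""
    # Base64 output only uses A-Z, a-z, 0-9, +, / and =, so it cannot
    # break out of the double-quoted src attribute.
    payload = base64.b64encode(snippet.encode("utf-8")).decode("ascii")
    return (
        '<iframe sandbox src="data:text/html;charset=utf-8;base64,'
        + payload
        + '"></iframe>'
    )


def token_guarded_span(snippet: str) -> str:
    """Wrap untrusted HTML in the hypothetical token-guarded SPAN syntax."""
    # A fresh, unguessable token per block: the untrusted snippet cannot
    # contain a closing tag with the right token, because the token is
    # chosen after the snippet is already fixed.
    token = secrets.token_hex(16)
    return f'<span sandbox="{token}">{snippet}</span sandbox="{token}">'
```

The second helper leaves the snippet readable in the page source, while the
first one has to encode it into an opaque URL.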

2) Renderers suck at dealing with IFRAMEs, and will probably continue to
do so for the time being. This means that in a typical, moderately complex
application (say, a discussion forum or a social site), where
hundreds of user-controlled strings may need to be present to display
user content, the mechanism would have an unacceptable load time and
memory footprint. In fact, people are already coming up with
lightweight alternatives with a significant functionality overlap (and
different security controls). Microsoft has toStaticHTML(), while a
standardized implementation is being discussed here right now in a
separate thread.

Isn't the benefit of keeping the design slightly simpler (and
realistically, limited to relatively few usage scenarios) negated by
the fact that alternative solutions to other narrow problems would
need to emerge elsewhere? The browser coming with several different
script sanitizers with completely different APIs and security controls
does not strike me as a desirable outcome (all the flavors of SOP are
a testament to this). If the answer is not a strong "no", maybe the
token-guarded DIV / SPAN approach is a better alternative?

Now, that aside - on a more pragmatic level, I have two extra comments:

1) The utility of the SOP sandboxing behavior outlined in the spec is
diminished if we have no way to actually *enforce* that the IFRAMEd
resource would only be rendered in such a context. If I am serving
user-supplied, unsanitized HTML, it is obviously safe to do <iframe
sandbox src="show.cgi?id=1234"></iframe> - but how do we prevent the
attacker from calling http://my_site/show.cgi?id=1234 directly, and
bypassing the filter? There are two cases where the mechanism still
offers some protection:

1.1) If I make IFRAMEd URLs unpredictable with the use of security
tokens - but if people were likely to get this right, we wouldn't have
XSRF and related issues on the web,

1.2) If I point the IFRAME to a non-same-origin domain - but if I can
do this, and work out the non-trivial authentication challenges in
such a case, I largely don't need a SOP sandbox to begin with: I can
just use <unique_id>.sandboxdomain.com. In fact, many sites I know of
do this right now.
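
For what it's worth, case 1.1 could look roughly like the sketch below,
assuming the token is an HMAC over the user's session ID and the document
ID. SECRET_KEY, the session ID format and the "token" parameter name are
all made up for the example; only show.cgi?id=1234 comes from above.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real deployment would load this
# from configuration rather than hard-coding it.
SECRET_KEY = b"replace-with-a-real-server-side-secret"


def _token(doc_id: int, session_id: str) -> str:
    """Derive a token bound to one document and one user session."""
    msg = f"{session_id}:{doc_id}".encode("utf-8")
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def frame_url(doc_id: int, session_id: str) -> str:
    """Build the unpredictable URL to place in the sandboxed IFRAME's src."""
    return f"show.cgi?id={doc_id}&token={_token(doc_id, session_id)}"


def is_valid(doc_id: int, session_id: str, token: str) -> bool:
    """Check a presented token before serving the unsanitized document."""
    expected = _token(doc_id, session_id)
    # Compare as bytes, in constant time, so odd input cannot raise.
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))
```

Even this small version has several places to slip up (key handling,
binding to the session, forgetting the check on one code path), which is
rather the point being made in 1.1.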

It strikes me that this mechanism would make a whole lot more sense if
supported at the HTTP header level instead: "X-SOP-Sandbox: 1"; in its
current shape, it is defensible perhaps if aided by Mozilla's CSP.
Otherwise, it's an error-prone detail, and we should at the very least
outline why it's very difficult to get it right in the spec.
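
As a rough sketch of the header-level variant, here is a bare WSGI
callable that attaches the proposed header to the response. The header
name is the one suggested above and is not known to be honored by any
browser; show_app and the placeholder body are invented for the example.

```python
def show_app(environ, start_response):
    """Serve an untrusted document and ask for sandboxing via a header."""
    # Placeholder for the user-supplied, unsanitized HTML.
    body = b"<p>user-supplied content would go here</p>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Proposed in this message only; purely hypothetical.
        ("X-SOP-Sandbox", "1"),
    ]
    start_response("200 OK", headers)
    return [body]
```

Because the request travels with the resource itself, it would apply
whether the URL is loaded in a sandboxed IFRAME or visited directly.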

2) The utility of the "no form submission" mode is limited to certain
very specific anti-phishing uses. While this does not invalidate it,
it makes it tempting to mention two other modes we discussed
internally, and that probably fall into the same bucket:

2.1) The ability to disable loading of external resources (images,
scripts, etc) in the sandboxed document. The common usage scenario is
when you do not want the displayed document to "phone home" for
privacy reasons, for example in a web mail system.

2.2) The ability to disable HTML parsing. On IFRAMEs, this can
actually be approximated with the excommunicated <plaintext> tag, or
with Content-Type: text/plain / data:text/plain,. On token-guarded
SPANs or DIVs, however, it would be pretty damn useful for displaying
text content without the need to escape &, <, >, etc. "Pure" security
benefit is limited, but as a phishing prevention and display
correctness measure, it makes sense.
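
For comparison, this is roughly what a developer has to do today to show
text verbatim, sketched with the Python standard library. The function
names are made up; the first covers inline text, the second the
data:text/plain approximation for IFRAMEs mentioned in 2.2.

```python
import html
from urllib.parse import quote


def escape_for_inline_text(untrusted: str) -> str:
    """Escape &, <, > and quotes so the text renders literally in HTML."""
    return html.escape(untrusted, quote=True)


def plain_text_data_url(untrusted: str) -> str:
    """Build a data:text/plain URL suitable for an IFRAME's src attribute."""
    # Percent-encode everything outside the unreserved set so the text
    # cannot terminate the attribute or the URL early.
    return "data:text/plain;charset=utf-8," + quote(untrusted, safe="")
```

For instance, escape_for_inline_text("a < b & c > d") returns
"a &lt; b &amp; c &gt; d". A "no HTML parsing" mode on a token-guarded
SPAN or DIV would make the first helper unnecessary.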

Well, that's it. Hope this does not come off as a complete rant :P

Cheers,
/mz
Received on Friday, 11 December 2009 20:18:46 UTC