- From: Adam Barth <whatwg@adambarth.com>
- Date: Tue, 1 Dec 2009 00:14:09 -0800

Your main point is well taken. There are some technical reasons why
tag whitelisting makes more sense for inline content. For example,
consider the case you mentioned on webkit-dev: @id. Inline, @id is
problematic because the ids exist in a per-frame namespace, whereas
they're harmless when the untrusted content has an entire iframe to
itself. Of course, @id is not unique in this respect. For example,
<input type=password> will likely get autofilled by the password
manager inline and @style can be used to draw all over the page
without an iframe's layout constraints.

That said, I'm not married to a design with a tag-level whitelist. Do
you have a specific alternative in mind?

Adam

On Mon, Nov 30, 2009 at 7:43 PM, Maciej Stachowiak <mjs at apple.com> wrote:
>
> On Nov 30, 2009, at 6:32 PM, Adam Barth wrote:
>
>> On Mon, Nov 30, 2009 at 5:43 PM, Maciej Stachowiak <mjs at apple.com> wrote:
>>>
>>> 1) It seems like this API is harder to use than a sandboxed iframe.
>>> To use it correctly, you need to determine a whitelist of safe
>>> elements and attributes; providing an explicit whitelist at least of
>>> tags is mandatory. With a sandboxed iframe, as a Web developer you
>>> can just ask the browser to turn off unsafe things and not worry
>>> about designing a security policy. Besides ease of use, there is
>>> also the concern that a server-side filtering whitelist may be
>>> buggy, and if you apply the same whitelist on the client side as
>>> backup instead of doing something high level like "disable
>>> scripting" then you are less likely to benefit from defense in
>>> depth, since you may just replicate the bug.
>>
>> I should follow up with folks in the ruby-on-rails community to see
>> how they view their sanitize API. The one person I asked had a
>> positive opinion, but we should get a bigger sample size.
>
> For server-side sanitization, this kind of explicit API is pretty much the
> only thing you can do.
>
>>
>> I think updateWithSanitizedHTML has different use cases than @sandbox.
>> I think the killer applications for @sandbox are advertisements and
>> gadgets. In those cases, the developer wants most of the browser's
>> functionality, but wants to turn off some dangerous stuff (like
>> plug-ins). For updateWithSanitizedHTML, the killer application is
>> something like blog comments, where you basically want text with some
>> formatting tags (bold, italics, and maybe images depending on the
>> forum).
>
> I can imagine use cases where allowing very open-ended but script-free
> content is desirable. For example, consider a hosted blog service that wants
> to let blog authors write nearly arbitrary HTML, but without allowing
> script. @sandbox would not be a good solution for that use case. In general
> it does not seem sensible to me that the choice of tag whitelisting vs
> high-level feature whitelisting is tied to the choice of embedding content
> directly vs. creating a frame. Is there a technical reason these two choices
> have to be tied?
>
>>
>>> 2) It seems like this API loses one of the big benefits of sanitizing
>>> HTML in the browser implementation. Specifically, in theory it's
>>> safe to say "allow everything except any construct that would result
>>> in script/code running". You can't do that on the server side -
>>> blacklisting is not sound because you can't predict the capabilities
>>> of all browsers. But the browser can predict its own capabilities.
>>> Sandboxed iframes do allow for this.
>>
>> The benefit is that you know you're getting the right parsing. You're
>> not going to be tripped up by <img/src=javascript: and friends.
>
> It's true, this is a benefit. However, it seems like even if you whitelist
> tags, being able to say "no script" at a high level
>
>> Also, this API is useful in cases where you don't have a server to
>> help you sanitize your input.
>> One example I saw recently was a GreaseMonkey
>> script that wanted to add EXIF metadata to Flickr. Basically, the
>> script grabbed the EXIF data from api.flickr.com and added it to the
>> current page. Unfortunately, that meant I could use this GreaseMonkey
>> script to XSS Flickr by adding HTML to my EXIF metadata. Sure, there
>> are other ways of solving the problem (I asked the developer to build
>> the DOM in memory and use innerText), but you want something simple
>> for these cases.
>
> If the EXIF metadata is supposed to be text-only, it seems like
> updateWithSanitizedHTML would not be easier to use than innerText, or in any
> way superior. For cases where it is actually desirable to allow some markup,
> it's not clear to me that giving explicit whitelists of what is allowed is
> the simple choice.
>
>>
>>> I think the benefits of filtering by tag/attribute/scheme for advanced
>>> experts are outweighed by these two disadvantages for basic use,
>>> compared to something simple like the original staticInnerHTML idea.
>>> Another possible alternative is to express how to sanitize at a
>>> higher level, using something similar to sandboxed iframe feature
>>> strings.
>>
>> If you think of @sandbox as being optimized for rich untrusted content
>> and updateWithSanitizedHTML as being optimized for poor untrusted
>> content, then you'll see that's what the API does already. The
>> feature string Slashdot wants for its comments is ("a b strong i em",
>> "href"), but another message board might want something different.
>> For example, 4chan might want ("img", "src alt"). I don't think these
>> require particularly advanced experts to understand.
>
> updateWithSanitizedHTML and @sandbox both provide features that the other
> does not for reasons that do not seem technically necessary. For example,
> updateWithSanitizedHTML could easily have an "allow everything except
> script" mode, and @sandbox could easily allow per-tag whitelisting.
> Then the choice would be between the resource cost of a frame, and the
> sandboxing features that it's impractical to provide without a frame
> (limiting content to a bounding box while still allowing styling, allowing
> script without affecting the containing content, etc).
>
>>
>>> Here's a problem that exists with both this API and also innerStaticHTML:
>>>
>>> 3) There is no secure and efficient way to append sanitized contents
>>> to an element that already has children. This may result in authors
>>> appending with innerHTML += (inefficient and insecure!) or
>>> insertAdjacentHTML() (efficient but still insecure!). I'm willing to
>>> concede that use cases other than "replace existing contents" and
>>> "append to existing contents" are fairly exotic.
>>
>> Maybe we need insertAdjacentSanitizedHTML instead or in addition. ;)
>
> Perhaps. The verb "update" is generic enough that it could handle different
> kinds of mutations with flags, but perhaps that means it is too vague for a
> security-sensitive API.
>
> Regards,
> Maciej

Received on Tuesday, 1 December 2009 00:14:09 UTC
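
For illustration, the tag/attribute whitelist discussed in this thread
(for example the Slashdot feature string ("a b strong i em", "href")) can
be sketched roughly as below. This is a minimal sketch in Python, not the
proposed updateWithSanitizedHTML API. The names sanitize and
WhitelistSanitizer, the scheme whitelist, and the decision to keep only
absolute http/https URLs are all assumptions made for the example. It
relies on Python's html.parser rather than a browser's own parser, so it
behaves like the server-side filtering described in the thread and does
not get the "right parsing" benefit of doing the work in the browser.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

URL_ATTRS = {"href", "src"}    # attributes whose value is a URL
VOID_TAGS = {"img", "br", "hr"}  # tags that never get a closing tag


class WhitelistSanitizer(HTMLParser):
    """Hypothetical helper: re-serialize HTML, keeping only whitelisted
    tags and attributes and escaping all text."""

    def __init__(self, tags, attrs, schemes=("http", "https")):
        super().__init__(convert_charrefs=True)
        self.tags = set(tags.split())     # e.g. "a b strong i em"
        self.attrs = set(attrs.split())   # e.g. "href"
        self.schemes = set(schemes)       # assumed URL scheme whitelist
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def _scheme_ok(self, url):
        # Whitelist by scheme; relative URLs (empty scheme) are rejected
        # too, which is a simplification for this sketch.
        try:
            return urlsplit(url.strip()).scheme.lower() in self.schemes
        except ValueError:
            return False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in self.tags:
            return  # drop the tag itself; its text content is still kept
        kept = []
        for name, value in attrs:
            if name not in self.attrs or value is None:
                continue
            if name in URL_ATTRS and not self._scheme_ok(value):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in self.tags and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html, tags, attrs):
    """Return html reduced to the given space-separated whitelists."""
    parser = WhitelistSanitizer(tags, attrs)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    comment = '<b onclick="x()">hi</b><script>alert(1)</script>'
    print(sanitize(comment, "a b strong i em", "href"))  # prints <b>hi</b>
```

The sketch does not balance unmatched tags, and it keeps the text inside
disallowed tags other than script and style. A high-level "allow
everything except script" mode of the kind Maciej describes would need a
different approach, since a whitelist like this one only ever passes what
it has been told about.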