- From: Ian Hickson <ian@hixie.ch>
- Date: Sat, 6 Jun 2009 01:09:37 +0000 (UTC)
I include below a summary of the feedback received on the topic of a security-aware innerHTML. Defining a spec-blessed whitelist of element, attributes, and attribute values is and filtering at the parser level is a significant new feature. While I see that it has value, I think on the short term it would be better to wait for a future version of HTML before introducing this feature; ideally once we have more implementation experience with experimental versions of this idea. I would encourage browser vendors to introduce APIs similar to that discussed below, clearly marked as vendor-specific (e.g. for Firefox, something like .mozStaticInnerHTML). On Wed, 6 May 2009, Adam Barth wrote: > > Rob and I were discussing the use case for IE8's toStaticHTML API and > thought it might make sense to standardize a more robust and > future-proof API for the same use case. > > USE CASE > > I receive an untrusted string, for example a weather report or a Twitter > status update, from postMessage or a cross-origin XMLHttpRequest, and I > want to display its content to the user without getting XSSed. > > WORKAROUNDS > > If the content is purely text (e.g., no images, styles, or hyperlinks), > then I can create a text node containing the string and insert it into > my page's DOM. If the content is not purely text, I need to implement > an XSS filter in JavaScript (which folks commonly screw up). > > PROPOSAL > > In addition to innerHTML, DOM elements should expose an innerStaticHTML > property. When set, innerStaticHTML should behave the same as innerHTML > except that scripts should not execute (even in event handlers) and > plug-ins should not be created. > > EXAMPLE > > <script> > fetchMostRecentTweetFor("whatwg", function (tweet) { > document.getElementById("whatwg_tweet).innerStaticHTML = tweet; > }); > </script> > > WHY NOT toStaticHTML? > > toStaticHTML addresses the same use cause by translating an untrusted > string to another string that lacks active HTML content. This API has > two issues: > > 1) The untrusted string -> static string -> HTML parser workflow > requires the browser to parse the string twice, introducing a > performance penalty and a security issue if the two parsing aren't > identical. > > 2) The API is difficult to future-proof because future versions of HTML > are likely to add new tags with active content (e.g., like the <video> > tag's event handlers). Either we'll have to commit to a toStaticHTML > algorithm that will be secure in all future versions of HTML, or we'll > have to change the input-output behavior of toStaticHTML in future > versions of HTML. > > innerStaticHTML addresses the same use case without these issues. On Wed, 6 May 2009, Jo??o Eiras wrote: > > As part of a browser implementation team I can clearly say that the > cases where scripts should, or should not run are very hard to implement > in a cross browser compatible way. Marking those scripts or plugins are > non-executable would make everything much more complex and bug prone. > Also, it would be impossible to do that for a onevent attribute without > all sorts of problems. The suggestion of marking content as > non-executable doesn't solve anything, because after setting > innerStaticHTML another script might serialize a piece of the affected > DOM to string and back to a tree, and the code could then execute, which > would not be wanted. > > The only viable solution, from my point of view, would be for the UA to > parse the string, and remove all untrusted content from the result tree > before appending to the document. That would mean removing all onevent > attributes, all scripts elements, all plugins, etc. Basically, letting > the UA implement all the filtering. On Wed, 6 May 2009, Robert O'Callahan wrote: > > I think that's actually what Adam is proposing. At least, it's what I > had in mind when we discussed it. On Thu, 7 May 2009, Tab Atkins Jr. wrote: > > I'm in favor of this. Browser-specified sanitizing, woo! > > Obviously this doesn't replace the need for sandbox iframes (those are > still necessary for building a page using external html without > javascript), but it's a much easier solution for pretty much any > js-based sandbox-iframe situation. On Wed, 6 May 2009, Philip Taylor wrote: > > Could <iframe sandbox> work as a workaround? > > var iframe = document.createElement('iframe'); > iframe.sandbox = ''; // (um, I hope this is right? I'm guessing > // any non-null/undefined value enables > // sandboxing, or something) > iframe.seamless = true; > iframe.src = 'data:text/html,'+encodeURIComponent(tweet); > document.getElementById('whatwg_tweet').appendChild(iframe); On Wed, 6 May 2009, Robert O'Callahan wrote: > > Seamless sandboxed IFRAMEs are probably harder to implement, probably > heavier-weight, and won't work in all situations, such as if you want to > get safe inline content or want to safely manipulate the content before > displaying it. On Thu, 7 May 2009, Kristof Zelechovski wrote: > > If toStaticHTML prunes everything it is not sure of, the danger of a > known language construct suddenly introducing active content is > negligible. I am sure HTML5 specification editors bear that aspect in > mind and so shall they in the future. On Tue, 12 May 2009, Kornel Lesi?~Dski wrote: > > That is based on assumptions that: > 1. parsing is expensive enough to warrant API optimized for this > particular case > 2. browsers cannot optimize it otherwise > 3. returned code will be ambiguous > > In client-side scripts untrusted content comes from the network, which > means that parsing time is going to be miniscule compared to time > required to fetch the content (and to render it). My guess is that > parsing itself is not a bottleneck. > > Second, it _is_ possible to avoid reparsing without special API for > this. toStaticHTML() may return subclass of String that contains > reference to parsed DOM. Roughly something like this: > > function toStaticHTML(html) { > var cleanDOM = clean(parse(html)) > return { > toString:function(){return unparse(cleanDOM)}, > node:cleanDOM > } > } > > which should make common case: > > innerHTML = toStaticHTML(html) just as fast as innerStaticHTML = html; > > toStaticHTML() enables other optimisations, e.g. filtered HTML can be > saved for future use (in local storage) or string filtered once used in > multiple places. > > Alternatively there could be toStaticDOM() method that returns > DOMDocumentFragment, avoiding reparsing issue entirely. > > > 2) The API is difficult to future-proof because future versions of > > HTML are likely to add new tags with active content (e.g., like the > > <video> tag's event handlers). > > When support for new tag is added to a browser, it would also be added > to its toStaticHTML()/innerStaticHTML, so evolution of HTML shouldn't be > a problem either way. Browser doesn't need to worry about dangerous > constructs it does not support. > > Methods are easier to patch than properties in JavaScript, so if > implementation of existing toStaticHTML() turned out to be insecure, the > method could be easily replaced/patched on cilent-side, or applications > could post-process output of toStaticHTML(). It's not that easy with a > property. > > I dislike APIs based on magic properties. Properties cannot take > arguments and we'd have to create new property for every combination of > arguments. If innerHTML was a method, instead of creating new property > we could extend it to be innerHTML(html, static=true). > > If more sophisticated filtering becomes needed in the future, we could > have toStaticHTML(html, {preserve:['svg','rdf'], remove:'marquee'}), but > it would be silly to create another > innerStaticHTMLwithSVGandRDFbutWithoutMarquee property. -- Ian Hickson U+1047E )\._.,--....,'``. fL http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Received on Friday, 5 June 2009 18:09:37 UTC