- From: Kornel Lesiński <kornel@geekhood.net>
- Date: Tue, 12 May 2009 01:02:05 +0100
On 06.05.2009, at 17:31, Adam Barth wrote:
>
> WHY NOT toStaticHTML?
>
> toStaticHTML addresses the same use cause by translating an untrusted
> string to another string that lacks active HTML content. This API has
> two issues:
>
> 1) The untrusted string -> static string -> HTML parser workflow
> requires the browser to parse the string twice, introducing a
> performance penalty and a security issue if the two parsing aren't
> identical.

That is based on the assumptions that:

1. parsing is expensive enough to warrant an API optimized for this
   particular case,
2. browsers cannot optimize it otherwise,
3. the returned code will be ambiguous.

In client-side scripts untrusted content comes from the network, which
means that parsing time is going to be minuscule compared to the time
required to fetch the content (and to render it). My guess is that
parsing itself is not a bottleneck.

Second, it _is_ possible to avoid reparsing without a special API for
this. toStaticHTML() may return a subclass of String that contains a
reference to the parsed DOM. Roughly something like this:

    function toStaticHTML(html) {
        var cleanDOM = clean(parse(html));
        return {
            toString: function () { return unparse(cleanDOM); },
            node: cleanDOM
        };
    }

which should make the common case:

    innerHTML = toStaticHTML(html)

just as fast as

    innerStaticHTML = html;

toStaticHTML() enables other optimisations, e.g. filtered HTML can be
saved for future use (in local storage), or a string filtered once can
be used in multiple places.

Alternatively, there could be a toStaticDOM() method that returns a
DOMDocumentFragment, avoiding the reparsing issue entirely.

> 2) The API is difficult to future-proof because future versions of
> HTML are likely to add new tags with active content (e.g., like the
> <video> tag's event handlers).

When support for a new tag is added to a browser, it would also be added
to its toStaticHTML()/innerStaticHTML, so the evolution of HTML shouldn't
be a problem either way. A browser doesn't need to worry about dangerous
constructs it does not support.
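To make the wrapper idea from earlier in this message concrete, here is a
minimal runnable sketch. Everything in it is an assumption made for
illustration: parse(), clean(), unparse() and setInnerHTML() are
hypothetical stand-ins for the browser's internal parser, filter,
serializer and innerHTML setter, and the "tree" is only an array of
tokens rather than real DOM nodes. The regex-based clean() is a toy and
must not be mistaken for an actual sanitizer. The only point is to show
that the setter can reuse the already-parsed tree instead of parsing the
string a second time.

```javascript
// Counts how often the (hypothetical) parser runs.
let parseCount = 0;

// Stand-in parser: splits markup into tag tokens and text runs.
function parse(html) {
  parseCount++;
  return html.match(/<[^>]*>|[^<]+/g) || [];
}

// Stand-in filter: drops <script> elements and on* attributes.
// Toy logic for illustration only, NOT a real sanitizer.
function clean(tokens) {
  const out = [];
  let inScript = false;
  for (const t of tokens) {
    if (/^<script\b/i.test(t)) { inScript = true; continue; }
    if (/^<\/script\b/i.test(t)) { inScript = false; continue; }
    if (inScript) continue;
    out.push(t.startsWith("<")
      ? t.replace(/\s+on\w+\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)/gi, "")
      : t);
  }
  return out;
}

// Stand-in serializer.
function unparse(tokens) {
  return tokens.join("");
}

// Returns an object that behaves like a string but also carries the
// already-parsed, already-filtered tree.
function toStaticHTML(html) {
  const cleanTree = clean(parse(html));
  return {
    toString() { return unparse(cleanTree); },
    node: cleanTree,
  };
}

// Hypothetical innerHTML setter: reuses the tree when one is attached,
// and falls back to parsing for ordinary strings.
function setInnerHTML(element, value) {
  element.tree = (value && value.node) ? value.node : parse(String(value));
}

const safe = toStaticHTML('<b onclick="steal()">hi</b><script>evil()</script>');
const el = {};
setInnerHTML(el, safe);

console.log(String(safe)); // "<b>hi</b>"
console.log(parseCount);   // 1 (the setter did not parse again)
```

A real implementation would also have to make sure the attached tree
cannot be forged by page script, for example by keeping it in internal
state rather than in a public property as this sketch does.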

Methods are easier to patch than properties in JavaScript, so if the
implementation of an existing toStaticHTML() turned out to be insecure,
the method could easily be replaced/patched on the client side, or
applications could post-process the output of toStaticHTML(). It's not
that easy with a property.

I dislike APIs based on magic properties. Properties cannot take
arguments, and we'd have to create a new property for every combination
of arguments. If innerHTML were a method, instead of creating a new
property we could extend it to be innerHTML(html, static=true).

If more sophisticated filtering becomes needed in the future, we could
have toStaticHTML(html, {preserve:['svg','rdf'], remove:'marquee'}), but
it would be silly to create another
innerStaticHTMLwithSVGandRDFbutWithoutMarquee property.

-- 
regards, Kornel
Received on Monday, 11 May 2009 17:02:05 UTC