[whatwg] XSS safe templating

On Mar 10, 2010, at 10:15 AM, Mike Samuel wrote:

> Hmm.  It occurs to me that many libraries -- at least jQuery and
> Prototype -- have their own layers in between their users and the DOM.
> When I cooked up this scheme, I didn't know how likely proxies and
> ephemeron tables were to make it into ES Harmony, but I think Andreas
> Gal just implemented (both? or just ephemerons) in a TraceMonkey
> nightly.  Those are all you need to do a really efficient
> interposition layer, so libraries are probably not going to stop doing
> that.
> So I think the non-document.write portion can be implemented entirely
> in the library interposition layer.
>
> document.write can be wrapped.  But the wrapper would still need to
> know the insertion-mode.
> If the insertion mode were exposed, or at least some context were
> given -- enough to know whether the next char, if it did not itself
> change the current token, would be interpreted as
>  * inside a tag
>  * outside a tag in a comment / application instruction / doctype
>  * outside a tag in a PCDATA/RCDATA/CDATA context and ideally in what
> kind of containing tag
>  * outside a tag in a CDATA section.
>
> So maybe some kind of
>  (DOMString|null) document.getInsertionMode().


Thoughts:

1) I'm not enthusiastic about exposing internal details of the HTML  
parser to script.

2) Given the way document.write works, the information you ask for may  
not even be available at the time of the document.write call. There  
may be considerable other parsing and/or script execution to do before  
the parser reaches the insertion point. Consider the following example:

<div>
<script>
document.write("<script src='external.js'></scr" + "ipt>");
document.write(untrustedString);
</script>
</div>

At the time of the second document.write, the *current* insertion mode
is outside a tag in normal text content, but it's impossible to tell
what it will be by the time untrustedString actually gets parsed. (In
case it's not clear why: untrustedString is inserted into the
character stream after the <script> tag loading external.js, but
external.js is not executed until after the inline script completes.
external.js can itself call document.write, and whatever it writes,
say an unclosed "<!--" or "<textarea>", lands ahead of untrustedString.
So it's not just inconvenient but impossible even in principle to
determine what the parsing mode will be.) This is one of the many
reasons document.write is a terrible API.
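
To make the ordering problem concrete, here is a toy model of the
character stream in the example above. It is only a sketch, not the
real HTML parsing algorithm: simulateStream and externalScripts are
hypothetical names made up for illustration, and the "parser" is just
string concatenation. It shows the same inline script producing two
different contexts for untrustedString depending on what external.js
writes.

```javascript
// Toy model only: simulateStream and externalScripts are hypothetical,
// and real parsing is far more involved than string concatenation.
function simulateStream(inlineWrites, externalScripts) {
  // By the time we get here the inline script has finished; everything it
  // wrote is queued in order and none of it has been parsed yet.
  let stream = "";
  for (const chunk of inlineWrites) {
    const match = /^<script src='([^']+)'><\/script>$/.exec(chunk);
    if (match) {
      // The external script runs only now, and whatever it writes lands
      // ahead of the text that is still waiting in the queue.
      stream += chunk + externalScripts[match[1]]();
    } else {
      stream += chunk;
    }
  }
  return stream;
}

const untrustedString = "hello";
const writes = ["<script src='external.js'></script>", untrustedString];

// external.js writes nothing: untrustedString is parsed as ordinary text.
const asText = simulateStream(writes, { "external.js": () => "" });

// external.js opens a comment: untrustedString is now inside that comment.
const asComment = simulateStream(writes, { "external.js": () => "<!--" });

console.log(asText);
console.log(asComment);
```

The inline script is identical in both runs, so nothing it can inspect
at the time of the second document.write distinguishes the two cases.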

3) document.write and innerHTML are pretty hacky interfaces. Rather
than trying to shore them up, we should instead recommend JavaScript
libraries that work at a higher level and end up building content
through DOM APIs rather than by parsing strings. That's likely to be a
lot sounder.
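
As a rough illustration of what "using DOM APIs" could look like at
the bottom of such a library, here is a minimal sketch.
appendUntrustedText is a hypothetical helper, not part of any existing
library; it relies only on the standard createTextNode and appendChild
calls. The document object is passed in as a parameter purely so the
function can be exercised outside a browser.

```javascript
// Hypothetical helper (not from any library): append untrusted text to a
// parent node through DOM calls, so the HTML parser never sees the string.
function appendUntrustedText(doc, parent, untrustedString) {
  // createTextNode stores the string as character data; any markup in it
  // stays inert because it is never tokenized as HTML.
  const textNode = doc.createTextNode(untrustedString);
  parent.appendChild(textNode);
  return textNode;
}

// Browser-only usage, guarded so the file also loads outside a browser.
if (typeof document !== "undefined" && document.body) {
  appendUntrustedText(document, document.body, "<img src=x onerror=alert(1)>");
}
```

Because the string goes straight into a text node, there is no parsing
context to guess at, which is the property document.write cannot offer.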

Regards,
Maciej

Received on Thursday, 11 March 2010 00:50:11 UTC