[whatwg] XSS safe templating

2010/3/11 Maciej Stachowiak <mjs at apple.com>:
>
> On Mar 10, 2010, at 10:15 AM, Mike Samuel wrote:
>
>> Hmm.  It occurs to me that many libraries -- at least jQuery and
>> prototype have their own layers in between their users and the DOM.
>> When I cooked up this scheme, I didn't know how likely proxies and
>> ephemeron tables were to make it into ES Harmony, but I think Andreas
>> Gal just implemented (both? or just ephemerons) in a tracemonkey
>> nightly.  Those are all you need to do a really efficient
>> interposition layer, so libraries are probably not going to stop doing
>> that.
>> So I think the non-document.write portion can be implemented entirely
>> in the library interposition layer.
>>
>> document.write can be wrapped.  But the wrapper would still need to
>> know the insertion-mode.
>> If the insertion mode were exposed, or at least some context were
>> given -- enough to know whether the next char, if it did not
>> change the current token, would be interpreted as
>>  * inside a tag
>>  * outside a tag in a comment / application instruction / doctype
>>  * outside a tag in a PCDATA/RCDATA/CDATA context and ideally in what
>>    kind of containing tag
>>  * outside a tag in a CDATA section.
>>
>> So maybe some kind of
>>   (DOMString|null) document.getInsertionMode().
>
>
> Thoughts:
>
> 1) I'm not enthusiastic about exposing internal details of the HTML parser
> to script.
>
> 2) Given the way document.write works, the information you ask for may not
> even be available at the time of the document.write call. There may be
> considerable other parsing and/or script execution to do before the parser
> reaches the insertion point. Consider the following example:
>
> <div>
> <script>
> document.write("<script src='external.js'></scr" + "ipt>");
> document.write(untrustedString);
> </script>
> </div>

Ah, thanks for the excellent example.

> At the time of the second document.write, the *current* insertion mode is
> outside a tag in normal text content, but it's impossible to tell what it
> will be by the time untrustedString actually gets parsed. (In case it's not
> clear why: untrustedString is inserted into the character stream after the
> <script> tag loading external.js, but external.js is not executed until
> after the inline script completes. So it's not just inconvenient but
> impossible even in principle to determine what the parsing mode will be.)
> This is one of the many reasons document.write is a terrible API.
>
> 3) document.write and innerHTML are pretty hacky interfaces. Rather than
> trying to shore them up, we should instead recommend JavaScript libraries
> that work at a higher level and end up using DOM APIs. That's likely to be a
> lot sounder.

I agree that document.write{,ln} is a bad interface.
I'm trying to come up with ways to let developers migrate piecemeal to
APIs that are at least as performant and where bugs don't have horrible
security consequences.

Fair enough.  I suppose if a system wants to preserve the security property that

    literal portions in structured interpolations passed to
    document.{write,writeln} are always interpreted as they would be
    if all substitutions contained only whitespace.

then I can preserve that by wrapping document.write{,ln} to throw an
Error if passed something that ends inside a tag, comment, or ends
with an HTML special character, or that contains a CDATA or RCDATA
element that is not closed in the same chunk.

That check fails safe, and the grammar I would need to check
should be regular as long as I can strip out noscript and friends,
NULs, and other questionable bits, since CDATA and RCDATA elements
don't nest.

Thanks for answering all the questions.  I think I can probably do
without new HTML5 stuff.

> Regards,
> Maciej
>
>

Received on Thursday, 11 March 2010 10:18:15 UTC