
Re: E4H and constructing DOMs

From: Adam Barth <w3c@adambarth.com>
Date: Thu, 7 Mar 2013 18:15:26 -0800
Message-ID: <CAJE5ia_6LQ3WrU7bsa-5g-9sS9X-Ece8mHOgNK4-YEN278WEGA@mail.gmail.com>
To: mikesamuel@gmail.com
Cc: "public-script-coord@w3.org" <public-script-coord@w3.org>
On Thu, Mar 7, 2013 at 5:54 PM, Mike Samuel <mikesamuel@gmail.com> wrote:
> [Resending as I dropped CC]
>
> 2013/3/7 Adam Barth <w3c@adambarth.com>:
>> I don't think I fully understood your message because it was quite
>> long and contained many complex external references.  What I've
>> understood you to say is that you've managed to work around the
>> limitations of the current string-based template design by building a
>> complex mechanism for automatically escaping untrusted data.
>
> I designed the current string-based template design to interface well
> with a simple grammar driven approach.

It's still a thousand lines of JavaScript and includes subtle regular
expressions.  If we're asking authors to write something that complex
to use ECMAScript templates safely, we've failed.

>> Rather than forcing authors to layer complex (and therefore
>> error-prone) systems on top of a string-based template system, we
>> should instead provide authors with an AST-based template system that
>> avoids these security pitfalls.
>
> Did you read my critique of AST-based template systems?

Yes.

> The DOM approach suffers several drawbacks:
> 1. It's resistant to XSS but not robust since it doesn't deal with
> embedded languages.  It trivially fails when substitutions appear
> inside URI attributes, or text nodes inside a script or style
> attribute.

Using string-based templates doesn't solve that problem.  It just
means you also need to deal with those sorts of issues for the
top-level language.
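Both messages agree that a substitution inside a URI attribute needs more than HTML entity escaping, whichever template style produces the markup. The following is a minimal sketch in modern JavaScript. It assumes an environment with the WHATWG `URL` class, such as Node.js or a browser. `escapeHtml` and `filterUrl` are hypothetical helpers, and the http/https/mailto allowlist is an assumed policy rather than anything proposed in this thread.

```javascript
// Hypothetical helper: entity-escape a string for HTML text or a quoted attribute.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper: keep a URL only if its scheme is on an assumed allowlist.
// The WHATWG URL parser strips tabs/newlines and lowercases the scheme,
// so "java\tscript:" and "JavaScript:" are both caught.
function filterUrl(untrusted) {
  const raw = String(untrusted);
  let parsed;
  try {
    // A placeholder base lets relative URLs such as "/profile" parse.
    parsed = new URL(raw, "https://placeholder.invalid/");
  } catch (e) {
    return "about:invalid";
  }
  return ["http:", "https:", "mailto:"].includes(parsed.protocol)
    ? raw
    : "about:invalid";
}

const evil = "javascript:alert(document.cookie)";
// Entity escaping leaves the payload untouched: nothing in it needs escaping.
console.log(`<a href="${escapeHtml(evil)}">`);
// Scheme filtering is a separate, URL-specific step.
console.log(`<a href="${escapeHtml(filterUrl(evil))}">`);
```

The same payload is just as live when assigned through `element.setAttribute("href", evil)`, so a URL-specific check is needed under either design.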

> 2. It's tied to a particular language.  If we wouldn't introduce new
> syntax specifically for SQL prepared statements, we shouldn't do it
> for the HTML equivalent and instead come up with a single syntactic
> construct that allows safe composition in any language.

There's no reason it needs to be tied to a particular language.  The
"A" in AST stands for abstract after all.

> 3. It fails the ubiquitous <header><body><footer> pattern as described
> at https://js-quasis-libraries-and-repl.googlecode.com/svn/trunk/safetemplate.html

I didn't read all 23 pages of safetemplate.html, but many, many people
happily use AST-based templating systems.  Haml alone has a massive
following.
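For contrast, the AST-based style Adam argues for builds markup structure in code and lets data become only text nodes or attribute values. The following is a minimal sketch under that assumption. `el`, `render`, and `escapeHtml` are hypothetical names and are not Haml or any particular library. Void elements are omitted for brevity.

```javascript
// Hypothetical helper: entity-escape a string for HTML text or a quoted attribute.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// An element node. Tag and attribute names are assumed to come from
// trusted code; only text children and attribute values carry data.
function el(tag, attrs, ...children) {
  if (!/^[a-z][a-z0-9-]*$/i.test(tag)) throw new Error(`bad tag name: ${tag}`);
  return { tag, attrs: attrs || {}, children };
}

// Serialize a node; plain strings are treated as text nodes and escaped.
function render(node) {
  if (typeof node === "string") return escapeHtml(node);
  const attrs = Object.entries(node.attrs)
    .map(([name, value]) => ` ${name}="${escapeHtml(value)}"`)
    .join("");
  const body = node.children.map(render).join("");
  return `<${node.tag}${attrs}>${body}</${node.tag}>`;
}

const userName = "<img src=x onerror=alert(1)>";
console.log(render(el("p", { class: "greeting" }, "Hello, ", userName)));
// <p class="greeting">Hello, &lt;img src=x onerror=alert(1)&gt;</p>
```

This escaping alone does not stop a `javascript:` URL placed in an `href` value. That is the embedded-language case raised in point 1 above.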

On Thu, Mar 7, 2013 at 5:55 PM, Mike Samuel <mikesamuel@gmail.com> wrote:
> 2013/3/7 Adam Barth <w3c@adambarth.com>:
>> On Thu, Mar 7, 2013 at 5:18 PM, Adam Barth <w3c@adambarth.com> wrote:
>>> I don't think I fully understood your message because it was quite
>>> long and contained many complex external references.  What I've
>>> understood you to say is that you've managed to work around the
>>> limitations of the current string-based template design by building a
>>> complex mechanism for automatically escaping untrusted data.
>>
>> As an example, in browsing the source code of the autoescaping code
>> you referenced, I found the following line:
>>
>> var HTML_TAG_REGEX_ = /<(?:!|\/?[a-z])(?:[^>'"]|"[^"]*"|'[^']*')*>/gi;
>>
>> As famously written on Stack Overflow [1], "Regex is not a tool that
>> can be used to correctly parse HTML."
>
> That doesn't apply since this is not parsing, it is lexing, and
> regular expressions can be used to lex HTML.

My point is just that if we're expecting authors to write regular
expressions to lex HTML, we've failed.
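The regular expression quoted above makes Mike's distinction between lexing and parsing concrete. It splits markup into a flat sequence of tag and text tokens without building a tree or checking nesting. The following is a minimal sketch. `lexHtml` is a hypothetical wrapper and is not a function from the referenced library. The sketch also does not track raw-text elements such as `<script>`.

```javascript
// The tag regex quoted from the autoescaping library (copied verbatim).
var HTML_TAG_REGEX_ = /<(?:!|\/?[a-z])(?:[^>'"]|"[^"]*"|'[^']*')*>/gi;

// Hypothetical wrapper: split markup into alternating text and tag tokens.
// This is lexing only: no tree is built and tag nesting is never checked.
function lexHtml(markup) {
  const tokens = [];
  let last = 0;
  let m;
  HTML_TAG_REGEX_.lastIndex = 0; // the regex is global, so reset its state
  while ((m = HTML_TAG_REGEX_.exec(markup)) !== null) {
    if (m.index > last) {
      tokens.push({ type: "text", value: markup.slice(last, m.index) });
    }
    tokens.push({ type: "tag", value: m[0] });
    last = HTML_TAG_REGEX_.lastIndex;
  }
  if (last < markup.length) {
    tokens.push({ type: "text", value: markup.slice(last) });
  }
  return tokens;
}

// tokens: tag '<a title="x>y">', text 'hi', tag '</a>'
console.log(lexHtml('<a title="x>y">hi</a>'));
```

The quoted `>` inside the attribute value stays within the tag token. A naive `<[^>]*>` pattern gets this case wrong, because it would stop at `<a title="x>`.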

>> In any case, we shouldn't require folks to write a thousand lines of
>> JavaScript to use ECMAScript templates to safely produce HTML.  That's
>> a clear signal that we should revisit the design of the template
>> system.
>
> I'm not proposing that, so it's not a reason to revisit the design of
> the template system.
>
> I'm proposing a design that allows library authors (eventually grammar
> maintainers) to write contextual auto-escaping systems instead of
> requiring template system authors to write thousands of lines of AST
> code that doesn't solve the problem because the DOM is wedded to
> DOMString for attribute values.

Linking to a thousand-line JavaScript library as evidence that string
templates can be used securely pretty much proves my point: it's hard
to use string templates securely.  That means that most authors won't
use them securely and will write code that's full of XSS.
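Mike describes a design in which a library supplies the function that receives the literal parts and the substitutions separately. ES2015 later standardized this as tagged template literals. Below is a minimal sketch that handles only the HTML text context. `html` and `escapeHtml` are hypothetical names. The literal parts are assumed to be trusted because the template author wrote them, so they pass through unchanged. The contextual part is deliberately left out: deciding whether each hole falls in an attribute, URL, script, or style.

```javascript
// Hypothetical helper: entity-escape a string for HTML text or a quoted attribute.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical tag function: literal parts (strings) are emitted as-is;
// every substitution (values) is escaped for an HTML text context only.
function html(strings, ...values) {
  let out = strings[0];
  for (let i = 0; i < values.length; i++) {
    out += escapeHtml(values[i]) + strings[i + 1];
  }
  return out;
}

const comment = "<script>steal()</script>";
console.log(html`<p>${comment}</p>`);
// <p>&lt;script&gt;steal()&lt;/script&gt;</p>
```

A contextual version would lex `strings`, for example with a tag regex like the one quoted earlier in the thread. It could then choose per hole whether to HTML-escape, URL-filter, or JS-encode. That context tracking is where the complexity Adam objects to would live.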

Adam
Received on Friday, 8 March 2013 02:16:26 UTC
