
Re: E4H and constructing DOMs

From: Mike Samuel <mikesamuel@gmail.com>
Date: Thu, 7 Mar 2013 23:36:13 -0500
Message-ID: <CACod6GsYd1GYuV5dHSZEdo0aF5xp8hQg+CyNnPTHso0UpKam0Q@mail.gmail.com>
To: Adam Barth <w3c@adambarth.com>
Cc: "public-script-coord@w3.org" <public-script-coord@w3.org>

I think we're talking past each other, so maybe we should step back a bit.

I talk about different kinds of developers (library authors,
application authors) writing code and you say things that suggest to
me that you think the bulk of web developers are going to be writing
large amounts of security-critical code.
In your view, who is writing what code with the string templates approach?

Under the AST model, who is writing what code?  What portion of an AST
approach needs to involve spec-producing committees?

What is your exemplar of the AST model (if not Yesod) and what is your
plan to cause the bulk of web programmers to do things using an AST
approach instead of using ad-hoc string approaches?



2013/3/7 Adam Barth <w3c@adambarth.com>:
> On Thu, Mar 7, 2013 at 5:54 PM, Mike Samuel <mikesamuel@gmail.com> wrote:
>> [Resending as I dropped CC]
>>
>> 2013/3/7 Adam Barth <w3c@adambarth.com>:
>>> I don't think I fully understood your message because it was quite
>>> long and contained many complex external references.  What I've
>>> understood you to say is that you've managed to work around the
>>> limitations of the current string-based template design by building a
>>> complex mechanism for automatically escaping untrusted data.
>>
>> I designed the current string-based template design to interface well
>> with a simple grammar driven approach.
>
> It's still a thousand lines of JavaScript and includes subtle regular
> expressions.  If we're asking authors to write something that complex
> to use ECMAScript templates safely, we've failed.
>
>>> Rather than forcing authors to layer complex (and therefore
>>> error-prone) systems on top of a string-based template system, we
>>> should instead provide authors with an AST-based template system that
>>> avoids these security pitfalls.
>>
>> Did you read my critique of AST-based template systems?
>
> Yes.
>
>> The DOM approach suffers several drawbacks:
>> 1. It's resistant to XSS but not robust since it doesn't deal with
>> embedded languages.  It trivially fails when substitutions appear
>> inside URI attributes, or text nodes inside a script or style
>> attribute.
>
> Using string-based templates doesn't solve that problem.  It just
> means you also need to deal with those sorts of issues for the
> top-level language.
>
>> 2. It's tied to a particular language.  If we wouldn't introduce new
>> syntax specifically for SQL prepared statements, we shouldn't do it
>> for the HTML equivalent and instead come up with a single syntactic
>> construct that allows safe composition in any language.
>
> There's no reason it needs to be tied to a particular language.  The
> "A" in AST stands for abstract after all.
>
>> 3. It fails the ubiquitous <header><body><footer> pattern as described
>> at https://js-quasis-libraries-and-repl.googlecode.com/svn/trunk/safetemplate.html
>
> I didn't read all 23 pages of safetemplate.html, but many, many people
> happily use AST-based templating systems.  Haml alone has a massive
> following.
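
To make drawback 1 concrete, here is a minimal sketch of why HTML-level
escaping alone does not cover an embedded language such as URIs. The helper
names (escapeHtml, filterUrl, link) and the scheme allowlist are assumptions
made up for illustration; they are not taken from the library discussed in
this thread, and the scheme check is deliberately conservative rather than
complete.

```javascript
// Hypothetical helpers for illustration; not code from the thread.
var ESCAPES = {'&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'};

// Escapes characters that are special in HTML text and quoted attribute values.
function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, function (ch) { return ESCAPES[ch]; });
}

// Assumed allowlist; a real policy would be chosen per application.
var SAFE_SCHEME = /^(?:https?|mailto):/i;
// Matches anything with a colon before the first '/', '?' or '#',
// i.e. anything that could be read as having a scheme.
var HAS_SCHEME = /^[^\/?#]*:/;

// Passes through allowlisted schemes and scheme-less (relative) URLs;
// replaces everything else with a harmless placeholder.
function filterUrl(url) {
  url = String(url);
  if (SAFE_SCHEME.test(url) || !HAS_SCHEME.test(url)) return url;
  return '#';
}

// The href value needs two layers: a URL-level filter, then HTML escaping.
function link(href, label) {
  return '<a href="' + escapeHtml(filterUrl(href)) + '">' + escapeHtml(label) + '</a>';
}
```

escapeHtml('javascript:alert(1)') returns its input unchanged, because the
string contains no HTML-special characters; only the URL-level filter in
link() rejects it. The same layering question arises whether the surrounding
markup is built from strings or from DOM nodes.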
>
> On Thu, Mar 7, 2013 at 5:55 PM, Mike Samuel <mikesamuel@gmail.com> wrote:
>> 2013/3/7 Adam Barth <w3c@adambarth.com>:
>>> On Thu, Mar 7, 2013 at 5:18 PM, Adam Barth <w3c@adambarth.com> wrote:
>>>> I don't think I fully understood your message because it was quite
>>>> long and contained many complex external references.  What I've
>>>> understood you to say is that you've managed to work around the
>>>> limitations of the current string-based template design by building a
>>>> complex mechanism for automatically escaping untrusted data.
>>>
>>> As an example, in browsing the source code of the autoescaping code
>>> you referenced, I found the following line:
>>>
>>> var HTML_TAG_REGEX_ = /<(?:!|\/?[a-z])(?:[^>'"]|"[^"]*"|'[^']*')*>/gi;
>>>
>>> As famously written on Stack Overflow [1], "Regex is not a tool that
>>> can be used to correctly parse HTML."
>>
>> That doesn't apply since this is not parsing, it is lexing, and
>> regular expressions can be used to lex HTML.
>
> My point is just that if we're expecting authors to write regular
> expressions to lex HTML, we've failed.
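
The lexing-versus-parsing distinction above can be sketched with the quoted
regular expression. Only HTML_TAG_REGEX_ comes from the thread; the lexHtml
function and its token shape are hypothetical, written here to show that the
regex splits input into a flat run of tag and text tokens without building
any tree.

```javascript
// Regex as quoted in the thread; everything else here is a hypothetical sketch.
var HTML_TAG_REGEX_ = /<(?:!|\/?[a-z])(?:[^>'"]|"[^"]*"|'[^']*')*>/gi;

// Splits html into a flat list of {type, value} tokens.
// No nesting is tracked: that would be the parser's job, not the lexer's.
function lexHtml(html) {
  var tokens = [];
  var last = 0;
  // Fresh RegExp per call so lastIndex state is not shared between calls.
  var re = new RegExp(HTML_TAG_REGEX_.source, 'gi');
  var m;
  while ((m = re.exec(html)) !== null) {
    if (m.index > last) {
      tokens.push({type: 'text', value: html.slice(last, m.index)});
    }
    tokens.push({type: 'tag', value: m[0]});
    last = re.lastIndex;
  }
  if (last < html.length) {
    tokens.push({type: 'text', value: html.slice(last)});
  }
  return tokens;
}
```

For the input '<a title="x>y">hi</a>' this yields three tokens: the opening
tag (the '>' inside the quoted attribute value does not end it), the text
'hi', and the closing tag. Nested input such as '<b><i>x</i></b>' still comes
back as a flat list of five tokens.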
>
>>> In any case, we shouldn't require folks to write a thousand lines of
>>> JavaScript to use ECMAScript templates to safely produce HTML.  That's
>>> a clear signal that we should revisit the design of the template
>>> system.
>>
>> I'm not proposing that, so it's not a reason to revisit the design of
>> the template system.
>>
>> I'm proposing a design that allows library authors (eventually grammar
>> maintainers) to write contextual auto-escaping systems instead of
>> requiring template system authors to write thousands of lines of AST
>> code that doesn't solve the problem because the DOM is wedded to
>> DOMstring for attribute values.
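
The division of labour described above (a library author writes the escaping
once, application authors just write templates) can be sketched with the
tagged template syntax later standardized in ECMAScript 2015. The safeHtml
tag and escapeHtml helper are hypothetical names, and this sketch escapes
every substitution for a single context (HTML text or a quoted attribute
value). It is not the contextual auto-escaper discussed in the thread, which
picks an escaper based on where each substitution lands.

```javascript
// Hypothetical library code; written once by a library author.
var ESCAPES = {'&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'};

function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, function (ch) { return ESCAPES[ch]; });
}

// Tag function: literal chunks are trusted (they come from the template
// author), every substitution is escaped before being interleaved.
function safeHtml(strings) {
  var out = strings[0];
  for (var i = 1; i < arguments.length; i++) {
    out += escapeHtml(arguments[i]) + strings[i];
  }
  return out;
}

// Application code: the author writes a template and never calls an escaper.
var name = '<script>alert(1)</script>';
var greeting = safeHtml`<b>Hello, ${name}!</b>`;
// greeting is '<b>Hello, &lt;script&gt;alert(1)&lt;/script&gt;!</b>'
```

Because the tag function receives the literal chunks separately from the
substituted values, a more elaborate tag could inspect the chunks to decide
which escaper each value needs.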
>
> Linking to a thousand-line JavaScript library as evidence that string
> templates can be used securely pretty much proves my point: it's hard
> to use string templates securely.  That means that most authors won't
> use them securely and will write code that's full of XSS.
>
> Adam

Received on Friday, 8 March 2013 04:36:40 UTC
