Re: E4H and constructing DOMs from Adam Barth on 2013-03-08 (public-script-coord@w3.org from January to March 2013)

From: Adam Barth <w3c@adambarth.com>
Date: Fri, 8 Mar 2013 09:57:43 -0800
To: Jonas Sicking <jonas@sicking.cc>
Cc: Anne van Kesteren <annevk@annevk.nl>, Rick Waldron <waldron.rick@gmail.com>, Adam Klein <adamk@chromium.org>, Ojan Vafai <ojan@chromium.org>, Brendan Eich <brendan@secure.meer.net>, Ian Hickson <ian@hixie.ch>, "rafaelw@chromium.org" <rafaelw@chromium.org>, Alex Russell <slightlyoff@chromium.org>, "public-script-coord@w3.org" <public-script-coord@w3.org>, "Mark S. Miller" <erights@google.com>
Message-ID: <CAJE5ia87Po5k8cRF5cN1P_9zMKOb7hRBJSZ6VXU-PFrKom45EQ@mail.gmail.com>

tl;dr: No one is disputing that string templates as currently designed
are insecure by default and will lead authors to write code filled
with XSS vulnerabilities.  I recommend removing string templates from
the spec until these security issues are resolved.

(Consolidating replies---responses inline.)

On Thu, Mar 7, 2013 at 6:36 PM, Rick Waldron <waldron.rick@gmail.com> wrote:
> On Thu, Mar 7, 2013 at 9:15 PM, Adam Barth <w3c@adambarth.com> wrote:
>> Linking to a thousand-line JavaScript library as evidence that string
>> template can be used securely pretty much proves my point: it's hard
>> to use string templates securely.  That means that most authors won't
>> use them securely and will write code that's full of XSS.
>
> I'd like to kindly ask that you stop approaching this conversation as though
> browsers and the web are the only client of the EcmaScript specification.
> The language serves to provide primitives that can be used to compose higher
> level abstractions, e.g., DOM APIs with whatever level of security the domain
> problem requires.

That's a nice strawman, but I'm not approaching this conversation as
though browsers were the only clients of ECMAScript.  What I'm saying
is that the current design is insecure when used in browsers, and
because browsers are a major user of ECMAScript, we shouldn't include
a language feature that gives web authors a giant security footgun.

On Thu, Mar 7, 2013 at 7:40 PM, Mark S. Miller <erights@google.com> wrote:
> Hi Ian, this seems a misunderstanding or non-sequitur. Mike and Rick's point
> is not to compromise, it is to do something solid and general purpose, to
> avoid injection bugs in a variety of DSL scenarios, not just HTML. Even in
> the browser, JS is sometimes used to compose SQL that is sent to the server.
> It isn't the browser's business to understand SQL, but we can provide a
> mechanism that is as useful for SQL, again, without compromise.

String templates, as currently designed, are bad for constructing SQL
statements too.  When used in their default mode (which is the most
common way that authors will use them), they lead to SQL injection
vulnerabilities.  Instead, we should use an approach analogous to
prepared statements, which are much less likely to lead to SQL
injection.
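A minimal sketch of the difference, assuming a hypothetical `sql` tag
(not part of the ES6 draft or of any database library) that keeps the
literal SQL text and the substituted values apart, the way a prepared
statement keeps the query and its parameters apart:

```javascript
// Default (untagged) mode: substitutions are spliced into the SQL text as-is.
function untaggedQuery(userName) {
  return `SELECT * FROM users WHERE name = '${userName}'`;
}

// Hypothetical tag: each substitution becomes a positional "?" placeholder,
// and the values travel separately, so they are never parsed as SQL.
function sql(strings, ...values) {
  return { text: strings.join("?"), params: values };
}

const userName = "x' OR '1'='1";

console.log(untaggedQuery(userName));
// SELECT * FROM users WHERE name = 'x' OR '1'='1'

const query = sql`SELECT * FROM users WHERE name = ${userName}`;
console.log(query.text);
// SELECT * FROM users WHERE name = ?
console.log(query.params.length); // 1
```

The protection comes entirely from the tag: dropping `sql` from the same
literal silently produces the injectable string instead.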

> Adam, I think you miss the point of Mike's message rather completely. This
> thousand line JS library has to be done for HTML once, not once per usage.
> It is complicated because HTML is complicated. And the amount of code
> compares quite favorably to the browser's HTML implementation, which is much
> more security critical than this. In any case, if the HTML quasi-parser is
> provided by the browser platform as standard equipment, it can probably
> reuse some of the browser's existing mechanisms, to help keep these two HTML
> systems in sync.

It doesn't matter how many times the library needs to be authored or
by whom.  If we need a thousand lines of JavaScript to compensate for
the by-design insecurity of string templates, then we've failed as
language designers.  Instead, ECMAScript should have a templating
system that is secure by design and by default rather than
insecure-by-design-and-default-but-can-be-patched-with-a-thousand-line-library.

> As for whether the output of the HTML quasi-parser is an AST or an encoded
> string, that is up to the quasi-parser designer. The quasi-literals in E
> generally generated ASTs. Mike convinced me he can generate encoded strings
> directly as safely and faster, if the point is to eventually produce an
> encoded string. I'm happy either way. Both decisions are perfectly
> compatible with the design of quasis, er, template strings, as specced in
> draft ES6.

That's nice, but the default mode for string templates works for HTML
but is insecure.  That means authors will write code filled with XSS
because they'll just use the default mode.

What you've written in this paragraph is even more scary.  You're
saying that string templates are so poorly designed that they guided
you, a world-renowned security expert, into using an extremely complex
(and therefore unlikely to be secure) design.  Surely authors who are
not world-renowned security experts will fare even worse.

On Thu, Mar 7, 2013 at 7:57 PM, Jonas Sicking <jonas@sicking.cc> wrote:
> On Thu, Mar 7, 2013 at 5:55 PM, Mike Samuel <mikesamuel@gmail.com> wrote:
>> That doesn't apply since this is not parsing, it is lexing, and
>> regular expressions can be used to lex HTML.
>
> Actually, no you can't. For example the lexing of contents of <script>
> elements is quite complex.

It's mathematically impossible.  You need a stack to keep track of the
foreign content mode (i.e., whether we're tokenizing HTML, SVG, or
MathML).  Without that information, you can't tell how the tokenizer
will parse apparent CDATA sections.
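A deliberately simplified sketch of why this is beyond a regular
expression (not a real tokenizer; `classifyCdata` is a hypothetical
name): the same `<![CDATA[` text is a CDATA section inside SVG or MathML
but a bogus comment in HTML content, so the lexer has to remember which
foreign elements are open.  The sketch tracks only `<svg>` and `<math>`
and ignores integration points such as `<foreignObject>`.

```javascript
// Simplified model, not a real HTML tokenizer: tracks only <svg> and <math>
// and ignores integration points such as <foreignObject>.
function classifyCdata(markup) {
  const stack = []; // open foreign elements; empty means HTML content
  const re = /<!\[CDATA\[|<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>/g;
  const results = [];
  let m;
  while ((m = re.exec(markup)) !== null) {
    if (m[0] === "<![CDATA[") {
      const inForeign = stack.length > 0;
      // A CDATA section in SVG/MathML content; a bogus comment in HTML content.
      results.push(inForeign ? "cdata" : "bogus-comment");
      // Skip the body so markup-looking text inside it is not tokenized.
      const close = inForeign ? "]]>" : ">";
      const end = markup.indexOf(close, re.lastIndex);
      re.lastIndex = end === -1 ? markup.length : end + close.length;
      continue;
    }
    const tag = m[2].toLowerCase();
    if (tag !== "svg" && tag !== "math") continue;
    if (m[1] === "/") {
      // Close the most recent matching element and anything opened after it.
      const i = stack.lastIndexOf(tag);
      if (i !== -1) stack.length = i;
    } else if (!m[0].endsWith("/>")) {
      // Self-closing start tags open nothing.
      stack.push(tag);
    }
  }
  return results;
}

console.log(classifyCdata("<p><![CDATA[x]]></p>"));          // [ 'bogus-comment' ]
console.log(classifyCdata("<svg><![CDATA[x]]></svg>"));      // [ 'cdata' ]
console.log(classifyCdata("<svg><svg></svg><![CDATA[x]]>")); // [ 'cdata' ]
```

The last call is the interesting one: only the unbounded stack knows that
one `<svg>` is still open after `</svg>`.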

On Thu, Mar 7, 2013 at 8:36 PM, Mike Samuel <mikesamuel@gmail.com> wrote:
> I talk about different kinds of developers (library authors,
> application authors) writing code and you say things that suggest to
> me that you think the bulk of web developers are going to be writing
> large amounts of security-critical code.
> In your view, who is writing what code with the string templates approach?

It doesn't matter who writes the thousand-line library.  The fact that
you need a thousand-line library to use string templates securely
(even assuming that the library is correct!) demonstrates that the
design itself is insecure and should not be part of ECMAScript.
Instead, we should design a templating system that doesn't need a
thousand-line library to be used securely.

> Under the AST model, who is writing what code?  What portion of an AST
> approach needs to involve spec-producing committees?

I'm not advocating E4H, but as an example, in E4H no one needs to
write a thousand-line library.  The spec itself is two printed pages:

http://www.hixie.ch/specs/e4h/strawman

I'm not claiming that E4H is secure in all cases.  I'm just claiming
that the "hello, world" template is secure by default.  For string
templates, the "hello, world" template is XSS.
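To illustrate the difference in defaults, here is a sketch contrasting
the untagged "hello, world" string template with a hypothetical
AST-style builder (`el` and `serialize` are illustrative names, not E4H
syntax or any real API) in which a string can only ever become a text
node:

```javascript
// Default (untagged) template: the substitution is spliced into the markup,
// so whatever the user supplied becomes markup.
function helloTemplate(name) {
  return `<h1>Hello, ${name}!</h1>`;
}

// Hypothetical AST-style builder: children stay as nodes, and strings are
// always text nodes, escaped on serialization instead of parsed as markup.
function el(tag, ...children) {
  return { tag, children };
}

function escapeText(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function serialize(node) {
  if (typeof node === "string") return escapeText(node);
  return `<${node.tag}>${node.children.map(serialize).join("")}</${node.tag}>`;
}

const evil = "<img src=x onerror=alert(1)>";
console.log(helloTemplate(evil));
// <h1>Hello, <img src=x onerror=alert(1)>!</h1>
console.log(serialize(el("h1", "Hello, ", evil, "!")));
// <h1>Hello, &lt;img src=x onerror=alert(1)&gt;!</h1>
```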

> What is your exemplar of the AST model (if not Yesod) and what is your
> plan to cause the bulk of web programmers to do things using an AST
> approach instead of using ad-hoc string approaches?

Personally, my favorite AST-style template system is Haml because the
templates themselves are beautiful.  I don't think we should include
Haml in ECMAScript as such because Haml has a bunch of Ruby-isms (e.g.,
self-quoting strings for attribute names).  I believe we could come up
with something similar to Haml that felt like a natural extension of
ECMAScript.

The above paragraph is somewhat off topic.  At the moment, I'm arguing
that we should remove string templates from the spec because they are
insecure.  Once we do that, we can have a discussion about what to
replace them with.  (I imagine that discussion will take a fair bit of
time since there are many details that people will want to bikeshed.)

On Thu, Mar 7, 2013 at 9:59 PM, Jonas Sicking <jonas@sicking.cc> wrote:
> I thought that one of the points with quasis was that they would allow the
> above to be interpreted such that firstName and lastName were inserted as
> text content. I.e. the quasi handler could avoid parsing the contents of
> those values as HTML and instead just insert them as text content.

In my example, I used string templates (aka quasis) in their default
mode, which is insecure, hence my claim that string templates are
insecure by default.  Mike Samuel claims that he has written a
safeHTML quasi handler, which takes about a thousand lines of
JavaScript, hence my claim that string templates are difficult to use
securely.

> This would mean that the HTML quasi would by default be resilient against
> HTML-injection.

Even if we had a secure HTML quasi handler, the HTML quasi handler
would not be the default handler.  That means the templating system is
insecure by default.
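A minimal sketch of the text-content approach Jonas describes, assuming
a hypothetical `html` tag that HTML-escapes every substitution.  It also
shows two limits: the escaping applies only when the author writes the
tag, and plain escaping is not context-aware (a `javascript:` URL has
nothing to escape).

```javascript
// Escape the characters that are significant in HTML text and quoted attributes.
function escapeHTML(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical tag: literal parts pass through, substitutions are escaped.
function html(strings, ...values) {
  let out = strings[0];
  for (let i = 0; i < values.length; i++) {
    out += escapeHTML(values[i]) + strings[i + 1];
  }
  return out;
}

const firstName = "<script>alert(1)</script>";
const lastName = "O'Brien";

// Protected only when the tag is written...
console.log(html`<h1>Welcome ${firstName} ${lastName}!</h1>`);
// <h1>Welcome &lt;script&gt;alert(1)&lt;/script&gt; O&#39;Brien!</h1>

// ...while the untagged default splices the raw strings in.
console.log(`<h1>Welcome ${firstName} ${lastName}!</h1>`);
// <h1>Welcome <script>alert(1)</script> O'Brien!</h1>

// Escaping alone is not context-aware: this URL survives intact.
console.log(html`<a href="${"javascript:alert(1)"}">link</a>`);
// <a href="javascript:alert(1)">link</a>
```

Handling that last case correctly means knowing the context of each
substitution (text, attribute, URL, script, and so on), which is the
kind of work a context-aware handler has to take on.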

> To supplement this behavior we could allow the quasi to take special values
> which would be passed to the HTML parser "like normal" and thus be parsed.
> I.e. something like
>
> HTML`<h1>Welcome ${ asUnsafeHTML(firstName) } ${ lastName }!</h1>`
>
> In this case the asUnsafeHTML function would return an object which was
> recognized by the HTML quasi as "should be parsed" and would contain a
> property which holds the string value passed in the first argument.
>
> Since no parsing would take effect at the asUnsafeHTML callsite, and instead
> would happen while the rest of the quasi was parsed, all of the normal
> contextual parsing rules would apply.
>
> This way the quasi should by default be as safe as an AST template system,
> while allowing the page to opt in to more full-featured, less safe
> templating.
>
> We could even provide functions like asSafeHTML which would trigger the
> quasi to parse that piece of content using rules that permit only "safe"
> elements.

None of the above solves the problem that string templates as
currently designed are insecure by default and will lead authors to
write code filled with XSS vulnerabilities.
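For reference, a minimal sketch of the `HTML` / `asUnsafeHTML` scheme
quoted above.  The two names come from Jonas's message; the `UnsafeHTML`
marker class and the string-concatenating implementation are
assumptions.  Unlike the proposal, it does not parse the combined markup
into DOM, so it does not apply the contextual parsing rules Jonas
describes.

```javascript
// Hypothetical marker type: values wrapped in it are inserted unescaped.
class UnsafeHTML {
  constructor(value) {
    this.value = String(value);
  }
}

function asUnsafeHTML(value) {
  return new UnsafeHTML(value);
}

function escapeHTML(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// The tag recognizes the marker object; every other value is treated as text.
function HTML(strings, ...values) {
  let out = strings[0];
  values.forEach((value, i) => {
    out += (value instanceof UnsafeHTML ? value.value : escapeHTML(value)) + strings[i + 1];
  });
  return out;
}

const firstName = "<em>Ada</em>";
const lastName = "<b>Lovelace</b>";
console.log(HTML`<h1>Welcome ${asUnsafeHTML(firstName)} ${lastName}!</h1>`);
// <h1>Welcome <em>Ada</em> &lt;b&gt;Lovelace&lt;/b&gt;!</h1>
```

Even in this form, the protection applies only to literals written with
the `HTML` tag; an untagged literal still splices `lastName` in
unescaped.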

Adam
Received on Friday, 8 March 2013 17:58:47 UTC