Re: [webcomponents] More backward-compatible templates

From: Adam Barth
Date: Thu, 1 Nov 2012 07:14:22 -0700
Message-ID: <CAJE5ia_oFdMZu5NZb=P12tU2S0PuKJyShgW_nOyuH+xDQp7Xiw@mail.gmail.com>
To: Maciej Stachowiak
Cc: Anne van Kesteren, "public-webapps@w3.org WG"
On Thu, Nov 1, 2012 at 6:33 AM, Maciej Stachowiak <mjs@apple.com> wrote:

> On Nov 1, 2012, at 1:57 PM, Adam Barth <w3c@adambarth.com> wrote:
> (5) The nested template fragment parser operates like the template
>> fragment parser, but with the following additional difference:
>>      (a) When a close tag named "+script" is encountered which does not
>> match any currently open script tag:
> Let me try to understand what you've written here concretely:
> 1) We need to change the "end tag open" state to somehow recognize
> "</+script>" as an end tag rather than as a bogus comment.
> 2) When the tree builder encounters such an end tag in the ???? state(s),
> we execute the substeps you've outlined below.
> The problem with this approach is that nested templates parse differently
> than top-level templates.  Consider the following example:
> <script type=template>
>  <b
> </script>
> In this case, none of the nested template parser modifications apply and
> we'll parse this as normal for HTML.  That means the contents of the
> template will be "<b" (let's ignore whitespace for simplicity).
> <script type=template>
>   <h1>Inbox</h1>
>   <script type=template>
>     <b
>   </+script>
>  </script>
> Unfortunately, the nested template in this example parses differently than
> it did when it was a top-level template.  The problem is that the
> characters "</+script>" are not recognized by the tokenizer as an end tag
> because they are encountered by the nested template fragment parser in the
> "before attribute name" state.  That means they get treated as some sort of
> bogus attributes of the <b> tag rather than as an end tag.
> OK. Do you believe this to be a serious problem? I feel like inconsistency
> in the case of a malformed tag is not a very important problem, but perhaps
> there are cases that would be more obviously problematic, or reasons not
> obvious to me to be very concerned about cases exactly like this one.

It's going to lead to subtle parsing bugs in web sites, which usually means
security vulnerabilities.  :(

> Also: can you think of a way to fix this problem? Or alternately, do you
> believe it's fundamentally not fixable? I've only spent a short amount of
> time thinking about this approach, and I am not nearly as much an expert on
> HTML parsing as you are.

I definitely see the appeal of trying to re-use <script> for templates.
Unfortunately, I couldn't figure out how to make it work sensibly with
nested templates, which is why I ended up recommending that we use the
<template> element.

Another approach we considered was to separate out the "hide from legacy
user agents" and the "define a template" operations.  That approach pushes
you towards a design like


You could do the same thing with <script type=something>, but <xmp> is
shorter (and currently unused).  This approach has a bunch of
disadvantages, including being verbose and having some unexpected parsing:

    <div data-foo="<xmp>bar</xmp>">
      This text is actually outside the template!

The <script type=template> has similar problems, of course:

<script type=template>
  <div data-foo="<script>bar</script>">
    This text is actually outside the template!

Perhaps developers have a clearer understanding of such problems from
having to escape </script> in JavaScript?
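
To make the early-termination problem concrete, here is a rough sketch of
how a legacy user agent decides where the <script> element ends.  The
helper name legacyScriptEnd is hypothetical, the closing </div> and
</script> lines in the sample page are added only to make the input
complete, and the sketch assumes no "<!--" appears in the content (it
ignores the escaped script data states entirely).

```javascript
// Hypothetical helper: approximates where a legacy tokenizer ends the raw
// text of a <script> element. Assumption: the "script data escaped" states
// are ignored, so the content must not contain "<!--".
function legacyScriptEnd(source, from) {
  // An end tag is "</script" followed by whitespace, "/" or ">".
  const endTag = /<\/script(?=[\t\n\f \/>])/gi;
  endTag.lastIndex = from;
  const match = endTag.exec(source);
  return match ? match.index : -1;
}

// Sample page; the last two lines are illustrative additions.
const page = [
  '<script type=template>',
  '  <div data-foo="<script>bar</script>">',
  '    This text is actually outside the template!',
  '  </div>',
  '</script>',
].join('\n');

const start = page.indexOf('>') + 1; // just past the opening <script ...> tag
const end = legacyScriptEnd(page, start);
const templateText = page.slice(start, end);

// The template stops inside the attribute value, not at the final </script>.
console.log(JSON.stringify(templateText));
```

The scan knows nothing about attribute quoting, so the first "</script>"
wins even though it sits inside the data-foo value.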

All this goofiness eventually convinced me that if we want to support
nested templates, we ought to use the usual nesting mechanics of HTML,
which leads to a design like <template> that nests like a normal tag.

>>          (a.i) Consume the token for the close tag named "+script".
>>          (a.ii) Create a DocumentFragment containing the parsed contents
>> of the fragment.
>>          (a.iii) [return to the parent template fragment parser] with the
>> result of step (a.ii) with the parent parser to resume after the "+script"
>> close tag.
>> This is pretty rough and I'm sure I got some details wrong. But I believe
>> it demonstrates the following properties:
>> (B) Allows for perfect fidelity polyfills, because it will manifestly end
>> the template in the same place that an unaware browser would close the
>> <script> element.
>> (C) Does not require multiple levels of escaping.
>> (A) Can be implemented without changes to the core HTML parser (though
>> you'd need to introduce a new fragment parsing mode).
> I suspect we're quibbling over "no true Scotsman" semantics here, but you
> obviously need to modify both the HTML tokenizer and tree builder for this
> approach to work.
> In principle you could create a whole separate tokenizer and tree builder.
> But obviously that would probably be a poor choice for a native
> implementation compared to adding some flags and variable behavior. I'm not
> even necessarily claiming that all the above properties are advantages, I
> just wanted to show that there need not be a multi-escaping problem nor
> necessarily scary complicated changes to the tokenizer states for <script>.
> I think the biggest advantage to this kind of approach is that it can be
> polyfilled with full fidelity. But I am in no way wedded to this solution
> and I am intrigued at the mention of other approaches with this property.
> The others I know of (external source only, srcdoc like on iframe) seem
> clearly worse, but there might be other bigger ones.

The xmp-like wrapper also can be polyfilled, as can approaches based on
HTML comments or attributes.  There's a trade-off, however.  In the long
view, it's not clear to me how important polyfillability is for a feature.
It certainly makes adoption easier in the short term, but if we constrain
ourselves to designing only features that can be polyfilled at each step,
we'll end up with a contorted platform.

>  (D) Can be implemented with near-identical behavior for XHTML, except
>> that you'd need an XML fragment parser.
> The downside is that nested templates don't parse the same as top-level
> templates.
> Indeed. That is in addition to the previously conceded downsides that the
> syntax is somewhat less congenial.
> Another issue is that you've also introduced the following security risk:
> Today, the following line of JavaScript is safe to include in an inline
> script tag:
> var x = "</+script><img onerror=alert(1)>";
> Because that line does not contain "</script>", the string "alert(1)" will
> be treated as the contents of a string.  However, if that line is included
> in an inline script inside of a template, the modifications to the
> parser above will mean that alert(1) will execute as JavaScript rather than
> being treated as a string, introducing an XSS vector.
> I don't follow. Can you give a full example of how this would be included
> in a template and therefore be executed?

x = "</+script><img onerror=alert(1)>"; // This is safe, there is no script execution.

<script type=template id=a>
    x = "</+script><img onerror=alert(1)>"; // This is not safe, the alert(1) executes as script.


You should imagine, of course, the string not being written literally by
the developer but instead generated on the server side by some code that
knows how to escape strings for use in inline script tags (e.g., by
escaping "\" and "</script").  It's certainly possible to defend against
this XSS vector on the server, but it's one more XSS vector to worry about.
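
As a minimal sketch of the kind of server-side code I have in mind: the
function name escapeForInlineScript is hypothetical, and escaping the
double quote is my own assumption on top of the "\" and "</script" rules
mentioned above.  It is not a complete escaper (it does not handle
newlines, for example); it only shows that an escaper written against
today's parsing rules passes "</+script>" through untouched.

```javascript
// Hypothetical server-side escaper for a value placed inside a
// double-quoted string literal in an inline <script>.
function escapeForInlineScript(value) {
  return value
    .replace(/\\/g, '\\\\')                 // "\"  becomes "\\"
    .replace(/"/g, '\\"')                   // '"'  becomes '\"' (assumption)
    .replace(/<\/script/gi, '<\\/script');  // "</script" becomes "<\/script"
}

const untrusted = '</+script><img onerror=alert(1)>';
const line = 'x = "' + escapeForInlineScript(untrusted) + '";';

// The escaper has no rule for "</+script", so it survives verbatim.
const survives = line.includes('</+script>');
console.log(line);
console.log('contains </+script>:', survives);
```

Under the proposed parser changes, that surviving "</+script>" would close
the nested template, which is why every such escaper would need a new rule.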

>> I hope this clarifies the proposal.
>> Notes:
>> - Just because it's described this way doesn't mean it has to be
>> implemented this way - implementations could do template parsing in a
>> single pass with HTML parsing if desired. I wrote it this way mainly to
>> demonstrate the desired properties.
> I'm not sure how we'd be able to do that without running multiple copies of
> the tokenizer state machine in parallel.  The tokenizer states for the
> template fragment parser aren't going to line up in any meaningful way with
> the top-level tokenizer's search for an appropriate end tag to escape from
> the script data states.
> I'm pretty confident it's *possible* to do a one-pass version of this
> algorithm, but I am not sure if it is easy, or if it is desirable.
> What I actually imagined (knowing much less about HTML parsing than you)
> was that you'd enter a different tokenizer state after encountering a
> <script template> than a <script>. But defining that state would be
> challenging.

Sure, but you're going to need a 17x bigger tokenizer state machine because
you'll need to track all 17 script data states for the top-level tokenizer
at the same time as you're tracking all the states for the template
fragment tokenizer.

