Re: [webcomponents] More backward-compatible templates from Jonas Sicking on 2012-11-01 (public-webapps@w3.org from October to December 2012)

From: Jonas Sicking <jonas@sicking.cc>
Date: Thu, 1 Nov 2012 16:44:01 +0100
To: Adam Barth <w3c@adambarth.com>
Cc: Maciej Stachowiak <mjs@apple.com>, Anne van Kesteren <annevk@annevk.nl>, "public-webapps@w3.org WG" <public-webapps@w3.org>
Message-ID: <CA+c2ei-qUnO=pdjktpUmk7H2TtXFKK9RxBZMY9pXZHZGjTupWg@mail.gmail.com>

On Thu, Nov 1, 2012 at 3:14 PM, Adam Barth <w3c@adambarth.com> wrote:
>
>
>
> On Thu, Nov 1, 2012 at 6:33 AM, Maciej Stachowiak <mjs@apple.com> wrote:
>>
>>
>> On Nov 1, 2012, at 1:57 PM, Adam Barth <w3c@adambarth.com> wrote:
>>
>>>
>>
>>> (5) The nested template fragment parser operates like the template
>>> fragment parser, but with the following additional difference:
>>>      (a) When a close tag named "+script" is encountered which does not
>>> match any currently open script tag:
>>
>>
>> Let me try to understand what you've written here concretely:
>>
>> 1) We need to change the "end tag open" state to somehow recognize
>> "</+script>" as an end tag rather than as a bogus comment.
>> 2) When the tree builder encounters such an end tag in the ???? state(s),
>> we execute the substeps you've outlined below.
>>
>> The problem with this approach is that nested templates parse differently
>> than top-level templates.  Consider the following example:
>>
>> <script type=template>
>>  <b
>> </script>
>>
>> In this case, none of the nested template parser modifications apply and
>> we'll parse this as normal for HTML.  That means the contents of the
>> template will be "<b" (let's ignore whitespace for simplicity).
>>
>> <script type=template>
>>   <h1>Inbox</h1>
>>   <script type=template>
>>     <b
>>   </+script>
>> </script>
>>
>> Unfortunately, the nested template in this example parses differently than
>> it did when it was a top-level template.  The problem is that the characters
>> "</+script>" are not recognized by the tokenizer as an end tag because they
>> are encountered by the nested template fragment parser in the "before
>> attribute name" state.  That means they get treated as some sort of bogus
>> attributes of the <b> tag rather than as an end tag.
>>
>>
>> OK. Do you believe this to be a serious problem? I feel like inconsistency
>> in the case of a malformed tag is not a very important problem, but perhaps
>> there are cases that would be more obviously problematic, or reasons not
>> obvious to me to be very concerned about cases exactly like this one.
>
>
> It's going to lead to subtle parsing bugs in web sites, which usually means
> security vulnerabilities.  :(
>
>> Also: can you think of a way to fix this problem? Or alternately, do you
>> believe it's fundamentally not fixable? I've only spent a short amount of
>> time thinking about this approach, and I am not nearly as much an expert on
>> HTML parsing as you are.
>
>
> I definitely see the appeal of trying to re-use <script> for templates.
> Unfortunately, I couldn't figure out how to make it work sensibly with
> nested templates, which is why I ended up recommending that we use the
> <template> element.
>
> Another approach we considered was to separate out the "hide from legacy
> user agents" and the "define a template" operations.  That approach pushes
> you towards a design like
>
> <xmp>
>   <template>
>     <h1>Inbox</h1>
>     <template>
>       <h2>Folder</h2>
>     </template>
>   </template>
> </xmp>
>
> You could do the same thing with <script type=something>, but <xmp> is
> shorter (and currently unused).  This approach has a bunch of disadvantages,
> including being verbose and having some unexpected parsing:
>
> <xmp>
>   <template>
>     <div data-foo="<xmp>bar</xmp>">
>       This text is actually outside the template!
>     </div>
>   </template>
> </xmp>

Given how rarely <xmp> is used on the web, especially in comparison
with <script>, this seems like it could be an acceptable way to deal
with legacy UAs.
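
To make the "unexpected parsing" case quoted above concrete, here is a
rough sketch (not anything from a spec or a real UA) of how a raw-text
element like <xmp> gets terminated: the first thing that looks like an
</xmp> end tag closes it, even inside an attribute value. The helper
name split_xmp and the EXAMPLE string are made up for illustration, and
the sketch assumes the start tag is written exactly as "<xmp>".

```python
import re

# In the HTML parsing algorithm, <xmp> content is raw text. The end tag is
# "</xmp" followed by whitespace, "/" or ">", matched case-insensitively.
XMP_END = re.compile(r"</xmp[\t\n\f\r />]", re.IGNORECASE)


def split_xmp(markup: str) -> tuple[str, str]:
    """Split markup into (raw text of the first <xmp>, text after its end tag).

    Hypothetical helper for illustration; it assumes the start tag is
    written exactly as "<xmp>" with no attributes.
    """
    start = markup.index("<xmp>") + len("<xmp>")
    end = XMP_END.search(markup, start)
    if end is None:
        # No end tag: the rest of the input stays inside the element.
        return markup[start:], ""
    # Skip to the ">" that finishes the end tag.
    close = markup.find(">", end.end() - 1)
    rest = markup[close + 1:] if close != -1 else ""
    return markup[start:end.start()], rest


# Adam's example from the quoted message.
EXAMPLE = """<xmp>
  <template>
    <div data-foo="<xmp>bar</xmp>">
      This text is actually outside the template!
    </div>
  </template>
</xmp>
"""

hidden, leaked = split_xmp(EXAMPLE)
print(repr(hidden))   # what stays hidden inside the <xmp>
print(repr(leaked))   # what ends up outside of it
```

The second value starts right after the </xmp> inside data-foo, so the
rest of the <div> and the closing </template> fall outside the hidden
region, which is the surprise Adam describes.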

/ Jonas

Received on Thursday, 1 November 2012 15:45:03 UTC