- From: Jonas Sicking <jonas@sicking.cc>
- Date: Thu, 1 Nov 2012 16:44:01 +0100
- To: Adam Barth <w3c@adambarth.com>
- Cc: Maciej Stachowiak <mjs@apple.com>, Anne van Kesteren <annevk@annevk.nl>, "public-webapps@w3.org WG" <public-webapps@w3.org>
On Thu, Nov 1, 2012 at 3:14 PM, Adam Barth <w3c@adambarth.com> wrote:
>
> On Thu, Nov 1, 2012 at 6:33 AM, Maciej Stachowiak <mjs@apple.com> wrote:
>>
>> On Nov 1, 2012, at 1:57 PM, Adam Barth <w3c@adambarth.com> wrote:
>>
>>> (5) The nested template fragment parser operates like the template
>>> fragment parser, but with the following additional difference:
>>> (a) When a close tag named "+script" is encountered which does not
>>> match any currently open script tag:
>>
>> Let me try to understand what you've written here concretely:
>>
>> 1) We need to change the "end tag open" state to somehow recognize
>> "</+script>" as an end tag rather than as a bogus comment.
>> 2) When the tree builder encounter such an end tag in the ???? state(s),
>> we execute the substeps you've outlined below.
>>
>> The problem with this approach is that nested templates parse differently
>> than top-level templates. Consider the following example:
>>
>> <script type=template>
>> <b
>> </script>
>>
>> In this case, none of the nested template parser modifications apply and
>> we'll parse this as normal for HTML. That means the contents of the
>> template will be "<b" (let's ignore whitespace for simplicity).
>>
>> <script type=template>
>> <h1>Inbox</h1>
>> <script type=template>
>> <b
>> </+script>
>> </script>
>>
>> Unfortunately, the nested template in this example parses differently than
>> it did when it was a top-level template. The problem is that the characters
>> "</+script>" are not recognized by the tokenizer as an end tag because they
>> are encountered by the nested template fragment parser in the "before
>> attribute name" state. That means they get treated as some sort of bogus
>> attributes of the <b> tag rather than as an end tag.
>>
>>
>> OK. Do you believe this to be a serious problem? I feel like inconsistency
>> in the case of a malformed tag is not a very important problem, but perhaps
>> there are cases that would be more obviously problematic, or reasons not
>> obvious to me to be very concerned about cases exactly like this one.
>
> It's going to lead to subtle parsing bugs in web sites, which usually means
> security vulnerabilities. :(
>
>> Also: can you think of a way to fix this problem? Or alternately, do you
>> believe it's fundamentally not fixable? I've only spent a short amount of
>> time thinking about this approach, and I am not nearly as much an expert on
>> HTML parsing as you are.
>
> I definitely see the appeal of trying to re-use <script> for templates.
> Unfortunately, I couldn't figure out how to make it work sensibly with
> nested templates, which is why I ended up recommending that we use the
> <template> element.
>
> Another approach we considered was to separate out the "hide from legacy
> user agents" and the "define a template" operations. That approach pushes
> you towards a design like
>
> <xmp>
> <template>
> <h1>Inbox</h1>
> <template>
> <h2>Folder</h2>
> </template>
> </template>
> </xmp>
>
> You could do the same thing with <script type=something>, but <xmp> is
> shorter (and currently unused). This approach has a bunch of disadvantages,
> including being verbose and having some unexpected parsing:
>
> <xmp>
> <template>
> <div data-foo="<xmp>bar</xmp>">
> This text is actually outside the template!
> </div>
> </template>
> </xmp>

Given how rarely <xmp> is used on the web, especially in comparison
with <script>, this seems like it could be an acceptable way to deal
with legacy UAs.

/ Jonas

Received on Thursday, 1 November 2012 15:45:03 UTC
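
The "unexpected parsing" in the second <xmp> example quoted above can be illustrated with a small sketch. It assumes that <xmp> content is scanned as raw text, so the element ends at the first "</xmp" followed by whitespace, "/" or ">", whether or not that sequence sits inside an attribute value. The function name split_xmp_content is hypothetical, and the regular expression only approximates that end-tag rule; it is not a full HTML tokenizer.

```python
import re
from typing import Tuple

# Assumption: raw-text scanning ends the element at the first "</xmp"
# that is followed by whitespace, "/" or ">". Case is ignored.
_XMP_END = re.compile(r"</xmp(?=[\t\n\f\r />])", re.IGNORECASE)


def split_xmp_content(after_open_tag: str) -> Tuple[str, str]:
    """Split the text following an <xmp> start tag into
    (content of the xmp element, remaining markup).

    Hypothetical helper for illustration only.
    """
    match = _XMP_END.search(after_open_tag)
    if match is None:
        # No end tag found: everything is swallowed by the element.
        return after_open_tag, ""
    return after_open_tag[:match.start()], after_open_tag[match.start():]


# The markup from the quoted example, minus the opening <xmp> tag.
markup = (
    '<template>\n'
    '<div data-foo="<xmp>bar</xmp>">\n'
    'This text is actually outside the template!\n'
    '</div>\n'
    '</template>\n'
    '</xmp>'
)

inside, rest = split_xmp_content(markup)
print(repr(inside))  # what the xmp element actually contains
print(repr(rest))    # what is left for the ordinary HTML parser
```

Under this assumption the element closes at the "</xmp>" inside the data-foo attribute value, so the text that was meant to be template content ends up in the remaining markup, which is the surprise the quoted example points at.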