Re: [webcomponents] HTML Parsing and the <template> element from Yehuda Katz on 2012-04-04 (public-webapps@w3.org from April to June 2012)

From: Yehuda Katz <wycats@gmail.com>
Date: Wed, 4 Apr 2012 16:10:49 -0400
To: Dimitri Glazkov <dglazkov@chromium.org>
Cc: public-webapps <public-webapps@w3.org>, Henri Sivonen <hsivonen@iki.fi>, Adam Barth <w3c@adambarth.com>, Ian Hickson <ian@hixie.ch>, Rafael Weinstein <rafaelw@google.com>
Message-ID: <CAMFeDTX2aRPfqWn5KfbQw7xd+DyszqNyH5WiooP1Uu3Y4yii4A@mail.gmail.com>

I just wanted to weigh in in favor of an inert-only <template> tag.

Today, a lot of libraries use <script> tags with an inert MIME type to
store templated information. In many cases, this content represents an HTML
subtree that would not be valid in the parent context (for example, it
may contain <tr> elements at the root).

In general, people use something like jQuery's regex-based algorithm to
insert the string in a container ("String starting with <tr> should be
inserted inside a <table><tbody>...</tbody></table>") and then extract the
nodes back out, but as you are starting to see, this entire process is
pretty contorted.
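
To make the contortion concrete, here is a rough TypeScript sketch of that
kind of wrapping step. It is not jQuery's actual code: the names WRAPPERS
and wrapForParsing, and the entries in the table, are made up for
illustration, and it only builds the wrapped string without touching the DOM.

```typescript
// Hypothetical wrapper table: which ancestors a fragment needs so that an
// HTML parser will accept it, and how many levels deep the result ends up.
interface Wrapper {
  depth: number;
  open: string;
  close: string;
}

const WRAPPERS = new Map<string, Wrapper>([
  ["tbody", { depth: 1, open: "<table>", close: "</table>" }],
  ["tr", { depth: 2, open: "<table><tbody>", close: "</tbody></table>" }],
  ["td", { depth: 3, open: "<table><tbody><tr>", close: "</tr></tbody></table>" }],
]);

// Matches the name of the first tag in the string, e.g. "tr" in "<tr>...".
const FIRST_TAG = /^\s*<([a-zA-Z][\w-]*)/;

// Wrap a markup string in whatever container its first tag requires.
// Fragments with no known requirement are returned unchanged at depth 0.
function wrapForParsing(html: string): { markup: string; depth: number } {
  const match = FIRST_TAG.exec(html);
  const wrapper = match ? WRAPPERS.get(match[1].toLowerCase()) : undefined;
  if (wrapper === undefined) {
    return { markup: html, depth: 0 };
  }
  return { markup: wrapper.open + html + wrapper.close, depth: wrapper.depth };
}
```

In a browser you would then assign the markup to a scratch element's
innerHTML and walk down `depth` levels (via firstChild) to reach the element
whose children are the nodes you wanted in the first place.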

I am in favor of a <template> tag that would do away with these contortions
for many cases, even if it is limited to inert contents.

Yehuda Katz
(ph) 718.877.1325


On Wed, Feb 8, 2012 at 5:00 PM, Dimitri Glazkov <dglazkov@chromium.org> wrote:

> Hello folks!
>
> You may be familiar with the work around the <template> element, or a
> way to declare document fragments in HTML (see
>
> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-November/033868.html
> for some background).
>
> In trying to understand how this newfangled beast would work, I
> started researching HTML parsing, and--oh boy was I ever sorry! Err..
> I mean.. --and investigating how the contents of the <template>
> element could be parsed.
>
> So far, I have two ideas. Both introduce changes to the HTML parsing
> algorithm. Both have flaws, and I thought the best thing to do would
> be to share the data with the experts and seek their opinions. Those
> of you cc'd -- you're the designated experts :)
>
> == IDEA 1: Keep template contents parsing in the tokenizer ==
>
> PRO: if we could come up with a way to perceive the stuff between
> <template> and </template> as a character stream, we enable a set of
> use cases where the template contents do not need to be a complete
> HTML subtree. For example, I could define a template that sets up a
> start of a table, then a few that provide repetition patterns for
> rows/cells, and then one to close out a table:
>
> <template id="head"><table><caption>Nyan-nyan</caption><thead> ...
> <tbody></template>
> <template id="row"><tr><template><td> ... </td></template></tr></template>
> <template id="foot"></tbody></table></template>
>
> Then I could slam these templates together with some API and produce
> an arbitrary set of tables.
>
> PRO: Since the template contents are parsed as a string, we create
> opportunities for performance optimizations at the UA level. If a
> bunch of templates is declared, but only a handful is used, we could
> parse template contents on demand, thus reducing the churn of DOM
> elements.
>
> CON: Tokenizer needs to be really smart and will start looking a lot
> like a specialized parser. At first glance, <template> behaves much
> like a <textarea> -- any tags inside will just be treated as
> characters. It works until you realize that templates sometimes need
> to be nested. Any use case that involves building a
> larger-than-one-dimensional data representation (like tables) will
> involve nested templates. This makes things rather tricky. I made an
> attempt at sketching this out here:
>
> http://dvcs.w3.org/hg/webcomponents/raw-file/a28e16cc4167/spec/templates/index.html#parsing
> .
> As you can see, this adds a largish set of new states to the tokenizer.
> And it is still incomplete, breaking in cases like
> <template><script>alert('<template> is
> awesome!');</script></template>.
>
> It could be argued that--while pursuing the tokenizer algorithm
> perfection--we could just stop at some point of complexity and issue a
> stern warning for developers to not get too crazy, because stuff will
> break -- akin to including "</script>" string in your JavaScript code.
>
> == IDEA 2: Just tweak insertion modes ==
>
> PRO: It's a lot less intrusive to the parser -- just adjust insertion
> modes to allow <template> tags in places where they would ordinarily be
> ignored or foster-parented, and add a new insertion mode for template
> contents to let all tags in. I made a quick sketch here:
>
> http://dvcs.w3.org/hg/webcomponents/raw-file/c96f051ca008/spec/templates/index.html#parsing
> (Note: more massaging is needed to make it really work)
>
> CON: You can't address fun partial-tree scenarios.
>
> Which idea appeals to you? Perhaps you have better ideas? Please share.
>
> :DG<
>
>
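
For what it's worth, the nesting problem from IDEA 1 in the quoted message
is easy to reproduce with a naive depth-counting scan. The TypeScript below
is only an illustration under that assumption, not the tokenizer states from
the linked draft, and findTemplateEnd is a hypothetical name.

```typescript
// Treat everything after an opening <template> as raw characters and look
// for the </template> that balances it, counting nested templates.
// `start` is the index just past the opening tag. Returns the index of the
// matching close tag, or -1 if the scan never gets back to depth zero.
function findTemplateEnd(source: string, start: number): number {
  const tagPattern = /<(\/?)template\b[^>]*>/gi;
  tagPattern.lastIndex = start;
  let depth = 1;
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(source)) !== null) {
    // A close tag captures "/" in group 1; an open tag captures "".
    depth += match[1] === "/" ? -1 : 1;
    if (depth === 0) {
      return match.index;
    }
  }
  return -1;
}

// Convenience helper: the raw contents of the first template in `source`,
// or null if no balanced template is found.
function firstTemplateContents(source: string): string | null {
  const open = /<template\b[^>]*>/i.exec(source);
  if (open === null) {
    return null;
  }
  const start = open.index + open[0].length;
  const end = findTemplateEnd(source, start);
  return end === -1 ? null : source.slice(start, end);
}
```

On the nested "row" template from the quoted message this returns the full
inner markup, including the inner <template>. On the <script> example it
returns null, because the "<template>" inside the string literal is counted
as an opening tag and the depth never returns to zero, which is exactly the
kind of breakage described above.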

Received on Wednesday, 4 April 2012 20:11:38 UTC