Re: [webcomponents] HTML Parsing and the <template> element from Rafael Weinstein on 2012-02-08 (public-webapps@w3.org from January to March 2012)

From: Rafael Weinstein <rafaelw@google.com>
Date: Wed, 8 Feb 2012 14:50:15 -0800
To: Adam Barth <w3c@adambarth.com>
Cc: Dimitri Glazkov <dglazkov@chromium.org>, public-webapps <public-webapps@w3.org>, Henri Sivonen <hsivonen@iki.fi>, Ian Hickson <ian@hixie.ch>
Message-ID: <CABMdHiTRRN8M0YPi0=3vRiKD2V36Zsmh6rwHe4DdNt40B7sVsw@mail.gmail.com>
[This time from the right email]

On Wed, Feb 8, 2012 at 2:10 PM, Adam Barth <w3c@adambarth.com> wrote:
> Re-using the generic raw text element parsing algorithm would be the
> simplest change to the parser.  Do you have a concrete example of
> where nested <template> declarations are required?  For example,
> rather than including nested templates, you might instead consider
> referencing other template elements by id.

Referencing templates rather than including sub-templates inline is
certainly a solution. In fact, it's a common feature of templating
systems. It's useful when a single component is used in multiple
unrelated places throughout the page.

However, it's worth backing up here and thinking about what templating is.

Templating is about convenience and maintainability of pages, not
about any core capability. Templating is useful and near-ubiquitous
because it makes it easy to think about authoring your page.

Web pages are highly complex and often deeply nested repeating tree
structures. You can certainly de-construct the page into some sort of
templating-4th-normal-form and dump each "component" at the top level
of the document.

However, doing this abandons the largely coherent structure of a
template, where (for example) table rows are defined in the context of
the table in which they are used. That is sort of the point of
templating: you get to describe your page in more or less the way
that it will be rendered.
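
To make the trade-off concrete, here is a minimal sketch of the
flattened, reference-by-id style. Everything in it is an assumption
made for illustration: the `{{> id}}` reference marker, the `TEMPLATES`
table and the `expand` helper are hypothetical and are not part of any
proposal in this thread.

```python
import re

# Hypothetical flattened form: every "component" sits at the top level and
# is pulled in by id. The {{> id}} marker is invented for this sketch.
TEMPLATES = {
    "table": "<table><tbody>{{> row}}</tbody></table>",
    "row": "<tr>{{> cell}}</tr>",
    "cell": "<td>...</td>",
}

REFERENCE = re.compile(r"\{\{>\s*(\w+)\s*\}\}")


def expand(template_id, templates, depth=0):
    """Recursively replace {{> id}} markers with the referenced template."""
    if depth > 10:
        # Guard against templates that reference each other in a cycle.
        raise RecursionError("template references nest too deeply")

    def substitute(match):
        return expand(match.group(1), templates, depth + 1)

    return REFERENCE.sub(substitute, templates[template_id])


print(expand("table", TEMPLATES))
```

The expansion works, but the row and cell definitions no longer sit
inside the table that uses them, which is the loss of structure
described above.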

>
> Adam
>
>
> On Wed, Feb 8, 2012 at 2:00 PM, Dimitri Glazkov <dglazkov@chromium.org> wrote:
>> Hello folks!
>>
>> You may be familiar with the work around the <template> element, or a
>> way to declare document fragments in HTML (see
>> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-November/033868.html
>> for some background).
>>
>> In trying to understand how this newfangled beast would work, I
>> started researching HTML parsing, and--oh boy was I ever sorry! Err..
>> I mean.. --and investigating how the contents of the <template>
>> element could be parsed.
>>
>> So far, I have two ideas. Both introduce changes to the HTML parsing
>> algorithm. Both have flaws, and I thought the best thing to do would
>> be to share the data with the experts and seek their opinions. Those
>> of you cc'd -- you're the designated experts :)
>>
>> == IDEA 1: Keep template contents parsing in the tokenizer ==
>>
>> PRO: if we could come up with a way to perceive the stuff between
>> <template> and </template> as a character stream, we enable a set of
>> use cases where the template contents do not need to be a complete
>> HTML subtree. For example, I could define a template that sets up a
>> start of a table, then a few that provide repetition patterns for
>> rows/cells, and then one to close out a table:
>>
>> <template id="head"><table><caption>Nyan-nyan</caption><thead> ...
>> <tbody></template>
>> <template id="row"><tr><template><td> ... </td></template></tr></template>
>> <template id="foot"></tbody></table></template>
>>
>> Then I could slam these templates together with some API and produce
>> an arbitrary set of tables.
>>
>> PRO: Since the template contents are parsed as a string, we create
>> opportunities for performance optimizations at the UA level. If a
>> bunch of templates are declared, but only a handful are used, we could
>> parse template contents on demand, thus reducing the churn of DOM
>> elements.
>>
>> CON: The tokenizer needs to be really smart and will start looking a lot
>> like a specialized parser. At first glance, <template> behaves much
>> like a <textarea> -- any tags inside will just be treated as
>> characters. It works until you realize that templates sometimes need
>> to be nested. Any use case that involves building a
>> larger-than-one-dimensional data representation (like tables) will
>> involve nested templates. This makes things rather tricky. I
>> attempted to sketch this out here:
>> http://dvcs.w3.org/hg/webcomponents/raw-file/a28e16cc4167/spec/templates/index.html#parsing.
>> As you can see, this adds a largish set of new states to the tokenizer.
>> And it is still incomplete, breaking in cases like
>> <template><script>alert('<template> is
>> awesome!');</script></template>.
>>
>> It could be argued that--while pursuing the tokenizer algorithm
>> perfection--we could just stop at some point of complexity and issue a
>> stern warning for developers to not get too crazy, because stuff will
>> break -- akin to including a "</script>" string in your JavaScript code.
>>
>> == IDEA 2: Just tweak insertion modes ==
>>
>> PRO: It's a lot less intrusive to the parser -- just adjust insertion
>> modes to allow <template> tags in places where they would ordinarily be
>> ignored or foster-parented, and add a new insertion mode for template
>> contents to let all tags in. I made a quick sketch here:
>> http://dvcs.w3.org/hg/webcomponents/raw-file/c96f051ca008/spec/templates/index.html#parsing
>> (Note: more massaging is needed to make it really work)
>>
>> CON: You can't address fun partial-tree scenarios.
>>
>> Which idea appeals to you? Perhaps you have better ideas? Please share.
>>
>> :DG<
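
As a rough illustration of the nesting problem described under IDEA 1
in the quoted message, here is a minimal sketch of a scanner that
treats template contents as raw text and finds the matching
</template> by counting depth. It is not the tokenizer from the linked
drafts; the `TAG` pattern and the `template_contents` helper are
hypothetical names, and a regular expression stands in for real
tokenizer states.

```python
import re

# Matches <template ...> and </template>; group 1 is "/" for a closing tag.
TAG = re.compile(r"<(/?)template\b[^>]*>", re.IGNORECASE)


def template_contents(source):
    """Return the raw text between the first <template> and its match."""
    first = TAG.search(source)
    if first is None or first.group(1) == "/":
        raise ValueError("no opening <template> tag found")

    depth = 1
    # Scan only the text after the opening tag, adjusting depth per tag.
    for tag in TAG.finditer(source, first.end()):
        depth += -1 if tag.group(1) == "/" else 1
        if depth == 0:
            return source[first.end():tag.start()]
    raise ValueError("unbalanced <template> tags")


nested = ('<template id="row"><tr><template><td> ... </td>'
          '</template></tr></template>')
print(template_contents(nested))

# The <script> case from the quoted message: the "<template>" inside the
# string literal is counted as an opening tag, so the depth never
# returns to zero.
tricky = "<template><script>alert('<template> is awesome!');</script></template>"
try:
    template_contents(tricky)
except ValueError as error:
    print("scanner gave up:", error)
```

Depth counting handles the nested row template, but it has no notion
of script or attribute content, so it miscounts on the script example.
Handling that properly means teaching the scanner about more and more
of HTML, which is the "specialized parser" concern raised in the CON.
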
Received on Wednesday, 8 February 2012 22:50:46 UTC