Re: [webcomponents] HTML Parsing and the <template> element

Re-using the generic raw text element parsing algorithm would be the
simplest change to the parser.  Do you have a concrete example of
where nested <template> declarations are required?  For example,
rather than including nested templates, you might instead consider
referencing other template elements by id.
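
To make the trade-off concrete, here is a rough Python sketch (an
illustration only, not how any real tokenizer is implemented).  The
first function imitates a raw-text style scan that stops at the first
</template> it sees; the other two show one way templates could
reference each other by id.  The @{id} placeholder syntax and all of
the function names are made up for this example.

```python
import re
from typing import Dict

# Simplified stand-ins for the start and end tags; a real tokenizer
# handles many more cases (attributes with ">", case folding rules, etc.).
START_TAG = re.compile(r"<template\b[^>]*>", re.IGNORECASE)
END_TAG = re.compile(r"</template\s*>", re.IGNORECASE)


def raw_text_template_body(html: str) -> str:
    """Return what a raw-text style scan captures for the first template.

    The scan knows nothing about nesting: it ends at the first end tag.
    """
    start = START_TAG.search(html)
    if start is None:
        raise ValueError("no <template> start tag found")
    rest = html[start.end():]
    end = END_TAG.search(rest)
    return rest if end is None else rest[:end.start()]


def collect_templates(html: str) -> Dict[str, str]:
    """Map id -> raw body for each flat (non-nested) <template id="...">."""
    pattern = re.compile(
        r'<template\s+id="([^"]+)"\s*>(.*?)</template\s*>',
        re.IGNORECASE | re.DOTALL,
    )
    return {m.group(1): m.group(2) for m in pattern.finditer(html)}


def expand(templates: Dict[str, str], template_id: str, depth: int = 0) -> str:
    """Replace each hypothetical @{id} placeholder with that template's body."""
    if depth > 16:
        # Guard against templates that reference each other in a cycle.
        raise ValueError("template references nested too deeply")
    return re.sub(
        r"@\{([^}]+)\}",
        lambda m: expand(templates, m.group(1), depth + 1),
        templates[template_id],
    )
```

Fed the nested "row" template from the quoted mail, the raw-text scan
returns "<tr><template><td> ... </td>", i.e. it is cut off at the inner
end tag.  With flat templates such as
<template id="row"><tr>@{cell}</tr></template> and
<template id="cell"><td>x</td></template>, expand(templates, "row")
yields "<tr><td>x</td></tr>" without any nesting.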

Adam


On Wed, Feb 8, 2012 at 2:00 PM, Dimitri Glazkov <dglazkov@chromium.org> wrote:
> Hello folks!
>
> You may be familiar with the work around the <template> element, or a
> way to declare document fragments in HTML (see
> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-November/033868.html
> for some background).
>
> In trying to understand how this newfangled beast would work, I
> started researching HTML parsing, and--oh boy was I ever sorry! Err..
> I mean.. --and investigating how the contents of the <template>
> element could be parsed.
>
> So far, I have two ideas. Both introduce changes to the HTML parsing
> algorithm. Both have flaws, and I thought the best thing to do would
> be to share the data with the experts and seek their opinions. Those
> of you cc'd -- you're the designated experts :)
>
> == IDEA 1: Keep template contents parsing in the tokenizer ==
>
> PRO: if we could come up with a way to perceive the stuff between
> <template> and </template> as a character stream, we enable a set of
> use cases where the template contents do not need to be a complete
> HTML subtree. For example, I could define a template that sets up a
> start of a table, then a few that provide repetition patterns for
> rows/cells, and then one to close out a table:
>
> <template id="head"><table><caption>Nyan-nyan</caption><thead> ...
> <tbody></template>
> <template id="row"><tr><template><td> ... </td></template></tr></template>
> <template id="foot"></tbody></table></template>
>
> Then I could slam these templates together with some API and produce
> an arbitrary set of tables.
>
> PRO: Since the template contents are parsed as a string, we create
> opportunities for performance optimizations at the UA level. If a
> bunch of templates is declared, but only a handful is used, we could
> parse template contents on demand, thus reducing the churn of DOM
> elements.
>
> CON: Tokenizer needs to be really smart and will start looking a lot
> like a specialized parser. At first glance, <template> behaves much
> like a <textarea> -- any tags inside will just be treated as
> characters. It works until you realize that templates sometimes need
> to be nested. Any use case that involves building a
> larger-than-one-dimensional data representation (like tables) will
> involve nested templates. This makes things rather tricky. I made an
> attempt at sketching this out here:
> http://dvcs.w3.org/hg/webcomponents/raw-file/a28e16cc4167/spec/templates/index.html#parsing.
> As you can see, this adds a largish set of new states to the tokenizer.
> And it is still incomplete, breaking in cases like
> <template><script>alert('<template> is awesome!');</script></template>.
>
> It could be argued that--while pursuing the tokenizer algorithm
> perfection--we could just stop at some point of complexity and issue a
> stern warning for developers to not get too crazy, because stuff will
> break -- akin to including a "</script>" string in your JavaScript code.
>
> == IDEA 2: Just tweak insertion modes ==
>
> PRO: It's a lot less intrusive to the parser -- just adjust insertion
> modes to allow <template> tags in places where they would ordinarily be
> ignored or foster-parented, and add a new insertion mode for template
> contents to let all tags in. I made a quick sketch here:
> http://dvcs.w3.org/hg/webcomponents/raw-file/c96f051ca008/spec/templates/index.html#parsing
> (Note: more massaging is needed to make it really work)
>
> CON: You can't address fun partial-tree scenarios.
>
> Which idea appeals to you? Perhaps you have better ideas? Please share.
>
> :DG<

Received on Wednesday, 8 February 2012 22:11:05 UTC