Re: [webcomponents] HTML Parsing and the <template> element

On Wed, Feb 8, 2012 at 2:41 PM, Ryosuke Niwa <rniwa@webkit.org> wrote:
> On Wed, Feb 8, 2012 at 2:00 PM, Dimitri Glazkov <dglazkov@chromium.org>
> wrote:
>>
>> == IDEA 1: Keep template contents parsing in the tokenizer ==
>>
>> PRO: if we could come up with a way to perceive the stuff between
>> <template> and </template> as a character stream, we enable a set of
>> use cases where the template contents do not need to be a complete
>> HTML subtree. For example, I could define a template that sets up a
>> start of a table, then a few that provide repetition patterns for
>> rows/cells, and then one to close out a table:
>>
>> <template id="head"><table><caption>Nyan-nyan</caption><thead> ...
>> <tbody></template>
>> <template id="row"><tr><template><td> ... </td></template></tr></template>
>> <template id="foot"></tbody></table></template>
>>
>> Then I could slam these templates together with some API and produce
>> an arbitrary set of tables.
>
>
> But that could be done in the second approach as well, right? All you need
> to do is replace "..." by <span class="placeholder"></span> and you can
> replace that element later by some API.

I am not sure I understand what you're saying here, so I'll try to
clarify the example. The first and last templates contain incomplete
tag structures (or partial trees as I call them later--not sure what
the term is): the first only contains the opening <table> and <tbody>
tags, and the last one closes them. Unless you treat template contents
as a string, you can't create a corresponding DOM tree for just the
first or just the last template.
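
To make that concrete, here is a rough Python sketch (mine, not part of
any proposal). It uses the standard library's html.parser only to check
whether a fragment's tags balance. HEAD, ROW and FOOT are simplified
stand-ins for the three templates above, with the elided "..." parts
replaced by placeholder content, and BalanceChecker / is_complete are
hypothetical helper names. It deliberately ignores HTML's implied end
tags, so treat it as an illustration rather than a real parser.

```python
from html.parser import HTMLParser


class BalanceChecker(HTMLParser):
    """Tracks open tags to see whether a fragment forms a complete subtree."""

    def __init__(self):
        super().__init__()
        self.open_tags = []      # start tags still waiting for an end tag
        self.stray_closes = []   # end tags with no matching open start tag

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.stray_closes.append(tag)


def is_complete(fragment):
    """True if every start tag is closed and no end tag is left dangling."""
    checker = BalanceChecker()
    checker.feed(fragment)
    checker.close()
    return not checker.open_tags and not checker.stray_closes


# Simplified stand-ins for the "head", "row" and "foot" templates.
HEAD = "<table><caption>Nyan-nyan</caption><thead></thead><tbody>"
ROW = "<tr><td>cell</td></tr>"
FOOT = "</tbody></table>"

print(is_complete(HEAD))                     # opens <table>/<tbody>, never closes them
print(is_complete(FOOT))                     # closes tags it never opened
print(is_complete(HEAD + ROW + ROW + FOOT))  # balanced only once concatenated
```

The "head" and "foot" pieces only balance after they are concatenated as
strings, which is why a tree-based representation cannot hold either of
them on its own.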

>
>>
>> CON: Tokenizer needs to be really smart and will start looking a lot
>> like a specialized parser. At first glance, <template> behaves much
>> like a <textarea> -- any tags inside will just be treated as
>> characters. It works until you realize that templates sometimes need
>> to be nested. Any use case that involves building a
>> larger-than-one-dimensional data representation (like tables) will
>> involve nested templates.
>
>
> I think we should first discuss and agree on whether we want nested template
> elements or not, and how it should behave.

Ok, sounds good. Rafael and Erik have done a lot of research in this
area. They are more than qualified to answer this question.
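
In the meantime, to illustrate the nesting problem from the CON above,
here is a rough Python sketch of my own. It compares a <textarea>-style
scan, which stops at the first "</template>", with a scan that counts
nesting depth. The function names are hypothetical, the "x" cell content
stands in for the elided "..." in the row template, and the regular
expressions are a simplification of what a real tokenizer would do.

```python
import re

# Matches both <template ...> and </template>; group 1 is "/" for end tags.
TEMPLATE_TAG = re.compile(r"<(/?)template\b[^>]*>", re.IGNORECASE)
END_TAG = re.compile(r"</template\s*>", re.IGNORECASE)


def textarea_style_end(source, start):
    """Index of the first </template> at or after start, ignoring nesting."""
    match = END_TAG.search(source, start)
    return match.start() if match else -1


def depth_aware_end(source, start):
    """Index of the </template> that closes the template opened before start."""
    depth = 1
    for match in TEMPLATE_TAG.finditer(source, start):
        if match.group(1):       # an end tag
            depth -= 1
            if depth == 0:
                return match.start()
        else:                    # a nested start tag
            depth += 1
    return -1


ROW = '<template id="row"><tr><template><td>x</td></template></tr></template>'
body_start = ROW.index(">") + 1  # just past the outer start tag

naive = ROW[body_start:textarea_style_end(ROW, body_start)]
nested = ROW[body_start:depth_aware_end(ROW, body_start)]

print(naive)   # cut short at the inner </template>
print(nested)  # full contents, inner template included
```

The depth counter is the first step toward the "specialized parser" I
was worried about: once the tokenizer has to track nesting, it is no
longer behaving like <textarea>.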

>
>> It could be argued that--while pursuing the tokenizer algorithm
>> perfection--we could just stop at some point of complexity and issue a
>> stern warning for developers to not get too crazy, because stuff will
>> break -- akin to including a "</script>" string in your JavaScript code.
>
>
> I don't think we want to introduce a new variant of </script>. It's way too
> complicated as is.
>
>> PRO: It's a lot less intrusive to the parser -- just adjust insertion
>> modes to allow <template> tags in places where they would ordinarily be
>> ignored or foster-parented, and add a new insertion mode for template
>> contents to let all tags in. I made a quick sketch here:
>>
>> http://dvcs.w3.org/hg/webcomponents/raw-file/c96f051ca008/spec/templates/index.html#parsing
>> (Note: more massaging is needed to make it really work)
>>
>> CON: You can't address fun partial-tree scenarios.
>
>
> Could you elaborate on this point? This approach seems much more manageable
> to implement and will have much less surprising behaviors.

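That refers to the partial-tree case I explained above: templates such
as "head" and "foot" whose contents are unbalanced tag sequences rather
than complete subtrees.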

>
> - Ryosuke
>

Received on Wednesday, 8 February 2012 22:54:36 UTC