Re: [webcomponents] HTML Parsing and the <template> element

From: Adam Barth <w3c@adambarth.com>
Date: Thu, 5 Apr 2012 00:27:57 -0700
Message-ID: <CAJE5ia_cs54PQbzsVV7VZk_n49++45BOW3AnmCY3H0SqfHubBA@mail.gmail.com>
To: Rafael Weinstein <rafaelw@google.com>
Cc: Dimitri Glazkov <dglazkov@chromium.org>, Henri Sivonen <hsivonen@iki.fi>, public-webapps <public-webapps@w3.org>, Ian Hickson <ian@hixie.ch>, Erik Arvidsson <arv@google.com>, Yehuda Katz <wycats@gmail.com>
On Wed, Apr 4, 2012 at 12:12 PM, Rafael Weinstein <rafaelw@google.com> wrote:
> On Mon, Apr 2, 2012 at 3:21 PM, Dimitri Glazkov <dglazkov@chromium.org> wrote:
>> On Wed, Feb 8, 2012 at 11:25 PM, Henri Sivonen <hsivonen@iki.fi> wrote:
>>> On Thu, Feb 9, 2012 at 12:00 AM, Dimitri Glazkov <dglazkov@chromium.org> wrote:
>>>> == IDEA 1: Keep template contents parsing in the tokenizer ==
>>> Not this!
>>> Here's why:
>>> Making something look like markup but then not tokenizing it as markup
>>> is confusing. The confusion leads to authors not having a clear mental
>>> model of what's going on and where stuff ends. Trying to make things
>>> just work for authors leads to even more confusing "here be dragons"
>>> solutions. Check out
>>> http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#script-data-double-escaped-dash-dash-state
>>> Making something that looks like markup but isn't tokenized as markup
>>> also makes the delta between HTML and XHTML greater. Some people may
>>> be ready to throw XHTML under the bus completely at this point, but
>>> this also goes back to the confusion point. Apart from namespaces, the
>>> mental model you can teach for XML is remarkably sane. Whenever HTML
>>> deviates from it, it's a complication in the understandability of
>>> HTML.
>>> Also, multi-level parsing is in principle bad for perf. (How bad
>>> really? Dunno.) I *really* don't want to end up writing a single-pass
>>> parser that has to be black-box indistinguishable from something
>>> that's defined as a multi-pass parser.
>>> (There might be a longer essay about how this sucks in the public-html
>>> archives, since the SVG WG proposed something like this at one point,
>>> too.)
>>>> == IDEA 2: Just tweak insertion modes ==
>>> I think a DWIM insertion mode that switches to another mode and
>>> reprocesses the token upon the first start tag token *without* trying
>>> to return to the DWIM insertion mode when the matching end tag is seen
>>> for the start tag that switched away from the DWIM mode is something
>>> that might be worth pursuing. If we do it, I think we should make it
>>> work for a fragment parsing API that doesn't require context beyond
>>> assuming HTML, too. (I think we shouldn't try to take the DWIM so far
>>> that a contextless API would try to guess HTML vs. SVG vs. MathML.)
>> Just to connect the threads. A few weeks back, I posted an update
>> about the HTML Templates spec:
>> http://lists.w3.org/Archives/Public/public-webapps/2012JanMar/1171.html
>> Perhaps lost among other updates was the fact that I've gotten the
>> first draft of HTML Templates spec out:
>> http://dvcs.w3.org/hg/webcomponents/raw-file/tip/spec/templates/index.html
>> The draft is roughly two parts: motivation for the spec and deltas to
>> HTML specification to allow serialization and parsing of the
>> <template> element. To be honest, after finishing the draft, I
>> wondered if we should just merge the whole thing into the HTML
>> specification.
>> As a warm-up exercise for the draft, I first implemented the changes
>> to the tree construction algorithm here in WebKit
>> (https://bugs.webkit.org/show_bug.cgi?id=78734). The patch
>> (https://bugs.webkit.org/attachment.cgi?id=128579&action=review)
>> includes new parsing tests, and should be fairly intuitive to read to
>> those familiar with the test format.
>> The interesting bit here is that all parser changes are additive: we
>> are only adding what effectively are extension points -- well, that
>> and a new contextless parsing mode for when inside of the <template>
>> tag.
> I think the task previously was to show how dramatic the changes to
> the parser would need to be. Talking to Dimitri, it sounds to me like
> they turned out to be less "open-heart-surgery" and more "quick
> outpatient procedure". Adam, Hixie, Henri, how do you guys feel about
> the invasiveness of the parser changes that Dimitri has turned out
> here?

If you're going to change the parser when adding the <template>
element, what's in that spec looks fairly reasonable to me.  Hixie and
Henri have spent more time designing the algorithm than I have (Eric
and I just implemented it), so they might have a different


>>> The violation of the Degrade Gracefully principle and tearing the
>>> parser spec open right when everybody converged on the spec worry me,
>>> though. I'm still hoping for a design that doesn't require parser
>>> changes at all and that doesn't blow up in legacy browsers (even
>>> better if the results in legacy browsers were sane enough to serve as
>>> input for a polyfill).
>> I agree with your concern. It's bugging me too -- that's why I am not
>> being an arrogant jerk yelling at people and trying to shove this
>> through. In general, it's difficult to justify making changes to
>> anything that's stable -- especially considering how long and painful
>> the road to getting stable was. However, folks like Yehuda, Erik, and
>> Rafael spent years tackling this problem, and I tend to trust their
>> steady hand... hands?
> I don't think there's an option to degrade gracefully here. My
> personal feeling is that even if it's years before browsers reliably
> support this and developers can use it without needing to "be careful"
> until then, there's a long-term view here, which is the sooner we put
> this into the spec, the sooner that day will arrive.
> Also, I like this approach because it addresses the inert DOM use case
> as well as the context-free parsing use case.
>> :DG<
