
Re: [webcomponents] HTML Parsing and the <template> element

From: Dimitri Glazkov <dglazkov@chromium.org>
Date: Mon, 2 Apr 2012 15:21:08 -0700
Message-ID: <CADh5Ky2_C0ZBtQxOE0jmGsr6yAKKV5LMFyXqnmdF=78hDqZ-9w@mail.gmail.com>
To: Henri Sivonen <hsivonen@iki.fi>
Cc: public-webapps <public-webapps@w3.org>, Adam Barth <w3c@adambarth.com>, Ian Hickson <ian@hixie.ch>, Rafael Weinstein <rafaelw@google.com>, Erik Arvidsson <arv@google.com>, Yehuda Katz <wycats@gmail.com>
On Wed, Feb 8, 2012 at 11:25 PM, Henri Sivonen <hsivonen@iki.fi> wrote:
> On Thu, Feb 9, 2012 at 12:00 AM, Dimitri Glazkov <dglazkov@chromium.org> wrote:
>> == IDEA 1: Keep template contents parsing in the tokenizer ==
>
> Not this!
>
> Here's why:
> Making something look like markup but then not tokenizing it as markup
> is confusing. The confusion leads to authors not having a clear mental
> model of what's going on and where stuff ends. Trying to make things
> just work for authors leads to even more confusing "here be dragons"
> solutions. Check out
> http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#script-data-double-escaped-dash-dash-state
>
> Making something that looks like markup but isn't tokenized as markup
> also makes the delta between HTML and XHTML greater. Some people may
> be ready to throw XHTML under the bus completely at this point, but
> this also goes back to the confusion point. Apart from namespaces, the
> mental model you can teach for XML is remarkably sane. Whenever HTML
> deviates from it, it's a complication in the understandability of
> HTML.
>
> Also, multi-level parsing is in principle bad for perf. (How bad
> really? Dunno.) I *really* don't want to end up writing a single-pass
> parser that has to be black-box indistinguishable from something
> that's defined as a multi-pass parser.
>
> (There might be a longer essay about how this sucks in the public-html
> archives, since the SVG WG proposed something like this at one point,
> too.)
>
>> == IDEA 2: Just tweak insertion modes ==
>
> I think a DWIM insertion mode that switches to another mode and
> reprocesses the token upon the first start tag token *without* trying
> to return to the DWIM insertion mode when the matching end tag is seen
> for the start tag that switched away from the DWIM mode is something
> that might be worth pursuing. If we do it, I think we should make it
> work for a fragment parsing API that doesn't require context beyond
> assuming HTML, too. (I think we shouldn't try to take the DWIM so far
> that a contextless API would try to guess HTML vs. SVG vs. MathML.)

Just to connect the threads. A few weeks back, I posted an update
about the HTML Templates spec:
http://lists.w3.org/Archives/Public/public-webapps/2012JanMar/1171.html

Perhaps lost among other updates was the fact that I've gotten the
first draft of the HTML Templates spec out:

http://dvcs.w3.org/hg/webcomponents/raw-file/tip/spec/templates/index.html

The draft has roughly two parts: motivation for the spec and deltas to
the HTML specification to allow serialization and parsing of the
<template> element. To be honest, after finishing the draft, I
wondered if we should just merge the whole thing into the HTML
specification.

As a warm-up exercise for the draft, I first implemented the changes
to the tree construction algorithm in WebKit
(https://bugs.webkit.org/show_bug.cgi?id=78734). The patch
(https://bugs.webkit.org/attachment.cgi?id=128579&action=review)
includes new parsing tests, and should be fairly intuitive to read to
those familiar with the test format.

The interesting bit here is that all parser changes are additive: we
are only adding what are effectively extension points -- well, that
and a new contextless parsing mode for content inside the <template>
element.
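
To illustrate the contextless mode, here is a toy Python sketch. It is
not the draft's algorithm and not the WebKit patch. The class name, the
token shape, and the tag-to-mode table are all assumptions made up for
this example. It only shows the shape of the idea discussed above: the
first start tag picks an insertion mode, the token is reprocessed in
that mode, and the parser does not return to the contextless mode
afterwards.

```python
# Hypothetical mapping from the first start tag to an insertion mode.
# The mode names echo HTML parser terminology, but this table is an
# assumption for illustration, not text from the draft.
MODE_FOR_FIRST_TAG = {
    "caption": "in table",
    "colgroup": "in table",
    "tbody": "in table",
    "tfoot": "in table",
    "thead": "in table",
    "col": "in column group",
    "tr": "in table body",
    "td": "in row",
    "th": "in row",
}


class ContextlessSketch:
    """Toy model of a contextless ("DWIM") insertion mode."""

    def __init__(self):
        self.mode = "contextless"
        self.handled = []  # (mode, token) pairs, in processing order

    def process(self, token):
        kind, name = token  # kind is "start", "end", or "text"
        if self.mode == "contextless" and kind == "start":
            # Switch away from the contextless mode based on the tag,
            # then reprocess the same token in the new mode.
            self.mode = MODE_FOR_FIRST_TAG.get(name, "in body")
            return self.process(token)
        # Any other case: handle the token in the current mode. Note
        # that nothing here ever switches back to "contextless".
        self.handled.append((self.mode, token))


parser = ContextlessSketch()
for tok in [("text", " "), ("start", "tr"), ("start", "td"),
            ("end", "td"), ("end", "tr"), ("start", "div")]:
    parser.process(tok)
```

In this sketch the leading text token is handled while still in the
contextless mode, the <tr> start tag selects a mode, and the later
<div> is handled in that same mode rather than triggering a new guess.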

> The violation of the Degrade Gracefully principle and tearing the
> parser spec open right when everybody converged on the spec worry me,
> though. I'm still hoping for a design that doesn't require parser
> changes at all and that doesn't blow up in legacy browsers (even
> better if the results in legacy browsers were sane enough to serve as
> input for a polyfill).

I agree with your concern. It's bugging me too -- that's why I am not
being an arrogant jerk yelling at people and trying to shove this
through. In general, it's difficult to justify making changes to
anything that's stable -- especially considering how long and painful
the road to getting stable was. However, folks like Yehuda, Erik, and
Rafael spent years tackling this problem, and I tend to trust their
steady hand... hands?

:DG<
Received on Monday, 2 April 2012 22:21:37 GMT
