
Re: [webcomponents] HTML Parsing and the <template> element

From: Henri Sivonen <hsivonen@iki.fi>
Date: Thu, 9 Feb 2012 09:25:06 +0200
Message-ID: <CAJQvAueoVgv3cJcG2=K1=fQx+rVi6L-hHw36=vX7LeP3NCDYUw@mail.gmail.com>
To: Dimitri Glazkov <dglazkov@chromium.org>
Cc: public-webapps <public-webapps@w3.org>, Adam Barth <w3c@adambarth.com>, Ian Hickson <ian@hixie.ch>, Rafael Weinstein <rafaelw@google.com>
On Thu, Feb 9, 2012 at 12:00 AM, Dimitri Glazkov <dglazkov@chromium.org> wrote:
> == IDEA 1: Keep template contents parsing in the tokenizer ==

Not this!

Here's why:
Making something look like markup but then not tokenizing it as markup
is confusing. The confusion leads to authors not having a clear mental
model of what's going on and where stuff ends. Trying to make things
just work for authors leads to even more confusing "here be dragons"
solutions. Check out
http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#script-data-double-escaped-dash-dash-state
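To illustrate the "looks like markup but isn't tokenized as markup" point with something that already exists: script content is the classic case. Below is a minimal sketch using Python's standard html.parser, which is only a stand-in here and not the tokenizer from the spec. The Recorder class and the tokenize helper are names made up for this example.

```python
from html.parser import HTMLParser


class Recorder(HTMLParser):
    """Records which start tags the parser reports and the text it sees."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def tokenize(markup):
    # Hypothetical helper: feed the markup and return (start tags, joined text).
    rec = Recorder()
    rec.feed(markup)
    rec.close()
    return rec.start_tags, "".join(rec.text)


# Inside an ordinary element, <b> is reported as a start tag.
print(tokenize("<div><b>x</b></div>"))

# Inside <script>, the same characters come back as plain text.
print(tokenize("<script><b>x</b></script>"))
```

The same bytes produce a tag in one place and text in the other, which is the kind of thing authors then have to keep in their heads.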

Making something that looks like markup but isn't tokenized as markup
also makes the delta between HTML and XHTML greater. Some people may
be ready to throw XHTML under the bus completely at this point, but
this also goes back to the confusion point. Apart from namespaces, the
mental model you can teach for XML is remarkably sane. Whenever HTML
deviates from it, it's a complication in the understandability of
HTML.

Also, multi-level parsing is in principle bad for perf. (How bad
really? Dunno.) I *really* don't want to end up writing a single-pass
parser that has to be black-box indistinguishable from something
that's defined as a multi-pass parser.

(There might be a longer essay about how this sucks in the public-html
archives, since the SVG WG proposed something like this at one point,
too.)

> == IDEA 2: Just tweak insertion modes ==

I think a DWIM insertion mode might be worth pursuing: one that, upon
the first start tag token, switches to another mode and reprocesses
the token, *without* trying to return to the DWIM insertion mode when
the end tag matching that start tag is seen. If we do it, I think we
should make it
work for a fragment parsing API that doesn't require context beyond
assuming HTML, too. (I think we shouldn't try to take the DWIM so far
that a contextless API would try to guess HTML vs. SVG vs. MathML.)
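
A rough sketch of that switch-once behavior, as a toy state machine rather than a real tree builder. The token tuples, the run_dwim function, and the tag-to-mode table are all assumptions made for illustration; only the mode names ("in row", "in table body", "in select", "in body") are taken from the existing parsing algorithm, and the mapping itself is not something proposed in this thread.

```python
# Hypothetical table: which insertion mode the first start tag selects.
# Anything not listed falls back to "in body".
FIRST_TAG_TO_MODE = {
    "tr": "in table body",
    "td": "in row",
    "th": "in row",
    "option": "in select",
}


def run_dwim(tokens):
    """Return a log of (mode, kind, name) showing which mode handled each token.

    tokens is a list of (kind, name) pairs, where kind is "start", "end",
    or "text". This models only the mode switch, not tree construction.
    """
    mode = "dwim"
    log = []
    for kind, name in tokens:
        if mode == "dwim" and kind == "start":
            # Switch away on the first start tag. The same token is then
            # handled below in the new mode, which stands in for
            # "reprocess the token".
            mode = FIRST_TAG_TO_MODE.get(name, "in body")
        # Deliberately no code path that sets mode back to "dwim",
        # not even when the matching end tag arrives.
        log.append((mode, kind, name))
    return log


tokens = [
    ("text", " "),
    ("start", "td"),
    ("text", "x"),
    ("end", "td"),
    ("start", "p"),
]
for entry in run_dwim(tokens):
    print(entry)
```

Here the leading whitespace is seen in the DWIM mode, the td start tag picks "in row", and everything after that, including the p that follows the td end tag, stays in "in row".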

The violation of the Degrade Gracefully principle and tearing the
parser spec open right when everybody converged on the spec worry me,
though. I'm still hoping for a design that doesn't require parser
changes at all and that doesn't blow up in legacy browsers (even
better if the results in legacy browsers were sane enough to serve as
input for a polyfill).

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Thursday, 9 February 2012 07:25:37 GMT
