Re: [webcomponents] HTML Parsing and the <template> element from Rafael Weinstein on 2012-04-04 (public-webapps@w3.org from April to June 2012)

From: Rafael Weinstein <rafaelw@google.com>
Date: Wed, 4 Apr 2012 12:12:23 -0700
To: Dimitri Glazkov <dglazkov@chromium.org>
Cc: Henri Sivonen <hsivonen@iki.fi>, public-webapps <public-webapps@w3.org>, Adam Barth <w3c@adambarth.com>, Ian Hickson <ian@hixie.ch>, Erik Arvidsson <arv@google.com>, Yehuda Katz <wycats@gmail.com>
Message-ID: <CABMdHiS=x5YCC9RA=hB3P5xma+E27HP0U+_g7xrNKRrvgyOotw@mail.gmail.com>

On Mon, Apr 2, 2012 at 3:21 PM, Dimitri Glazkov <dglazkov@chromium.org> wrote:
> On Wed, Feb 8, 2012 at 11:25 PM, Henri Sivonen <hsivonen@iki.fi> wrote:
>> On Thu, Feb 9, 2012 at 12:00 AM, Dimitri Glazkov <dglazkov@chromium.org> wrote:
>>> == IDEA 1: Keep template contents parsing in the tokenizer ==
>>
>> Not this!
>>
>> Here's why:
>> Making something look like markup but then not tokenizing it as markup
>> is confusing. The confusion leads to authors not having a clear mental
>> model of what's going on and where stuff ends. Trying to make things
>> just work for authors leads to even more confusing "here be dragons"
>> solutions. Check out
>> http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#script-data-double-escaped-dash-dash-state
>>
>> Making something that looks like markup but isn't tokenized as markup
>> also makes the delta between HTML and XHTML greater. Some people may
>> be ready to throw XHTML under the bus completely at this point, but
>> this also goes back to the confusion point. Apart from namespaces, the
>> mental model you can teach for XML is remarkably sane. Whenever HTML
>> deviates from it, it's a complication in the understandability of
>> HTML.
>>
>> Also, multi-level parsing is in principle bad for perf. (How bad
>> really? Dunno.) I *really* don't want to end up writing a single-pass
>> parser that has to be black-box indistinguishable from something
>> that's defined as a multi-pass parser.
>>
>> (There might be a longer essay about how this sucks in the public-html
>> archives, since the SVG WG proposed something like this at one point,
>> too.)
>>
>>> == IDEA 2: Just tweak insertion modes ==
>>
>> I think a DWIM insertion mode that switches to another mode and
>> reprocesses the token upon the first start tag token *without* trying
>> to return to the DWIM insertion mode when the matching end tag is seen
>> for the start tag that switched away from the DWIM mode is something
>> that might be worth pursuing. If we do it, I think we should make it
>> work for a fragment parsing API that doesn't require context beyond
>> assuming HTML, too. (I think we shouldn't try to take the DWIM so far
>> that a contextless API would try to guess HTML vs. SVG vs. MathML.)
>
> Just to connect the threads. A few weeks back, I posted an update
> about the HTML Templates spec:
> http://lists.w3.org/Archives/Public/public-webapps/2012JanMar/1171.html
>
> Perhaps lost among other updates was the fact that I've gotten the
> first draft of the HTML Templates spec out:
>
> http://dvcs.w3.org/hg/webcomponents/raw-file/tip/spec/templates/index.html
>
> The draft is roughly two parts: motivation for the spec and deltas to
> HTML specification to allow serialization and parsing of the
> <template> element. To be honest, after finishing the draft, I
> wondered if we should just merge the whole thing into the HTML
> specification.
>
> As a warm-up exercise for the draft, I first implemented the changes
> to the tree construction algorithm here in WebKit
> (https://bugs.webkit.org/show_bug.cgi?id=78734). The patch
> (https://bugs.webkit.org/attachment.cgi?id=128579&action=review)
> includes new parsing tests, and should be fairly intuitive to read to
> those familiar with the test format.
>
> The interesting bit here is that all parser changes are additive: we
> are only adding what are effectively extension points -- well, that
> and a new contextless parsing mode for when inside of the <template>
> tag.

I think the task previously was to show how dramatic the changes to
the parser would need to be. Talking to Dimitri, it sounds to me like
they turned out to be less "open-heart-surgery" and more "quick
outpatient procedure". Adam, Hixie, Henri, how do you guys feel about
the invasiveness of the parser changes that Dimitri has turned out
here?

>
>> The violation of the Degrade Gracefully principle and tearing the
>> parser spec open right when everybody converged on the spec worry me,
>> though. I'm still hoping for a design that doesn't require parser
>> changes at all and that doesn't blow up in legacy browsers (even
>> better if the results in legacy browsers were sane enough to serve as
>> input for a polyfill).
>
> I agree with your concern. It's bugging me too -- that's why I am not
> being an arrogant jerk yelling at people and trying to shove this
> through. In general, it's difficult to justify making changes to
> anything that's stable -- especially considering how long and painful
> the road to getting stable was. However, folks like Yehuda, Erik, and
> Rafael spent years tackling this problem, and I tend to trust their
> steady hand... hands?

I don't think there's an option to degrade gracefully here. My
personal feeling is that even if it's years before browsers reliably
support this and developers can use it without needing to "be careful",
there's a long-term view here: the sooner we put this into the spec,
the sooner that day will arrive.

Also, I like this approach because it addresses the inert DOM use case
as well as the context-free parsing use case.

>
> :DG<

Received on Wednesday, 4 April 2012 19:12:52 UTC