Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out from Rafael Weinstein on 2012-05-16 (public-webapps@w3.org from April to June 2012)

From: Rafael Weinstein <rafaelw@google.com>
Date: Wed, 16 May 2012 16:29:28 -0700
To: Yehuda Katz <wycats@gmail.com>
Cc: Henri Sivonen <hsivonen@iki.fi>, Webapps WG <public-webapps@w3.org>, "Tab Atkins Jr." <jackalmage@gmail.com>, Scott González <scott.gonzalez@gmail.com>
Message-ID: <CABMdHiSnBcoHWcJ008f51HRHzEQCV4Ju2M7C3h=GZy=e17Jn_g@mail.gmail.com>

Ok. I think I'm convinced on all points.

I've uploaded a webkit patch which implements what we've agreed on here:

https://bugs.webkit.org/show_bug.cgi?id=84646

I'm happy to report that this patch is nicer than the queued-token
approach. Good call, Henri.

On Tue, May 15, 2012 at 9:39 PM, Yehuda Katz <wycats@gmail.com> wrote:
>
> Yehuda Katz
> (ph) 718.877.1325
>
>
> On Tue, May 15, 2012 at 6:46 AM, Henri Sivonen <hsivonen@iki.fi> wrote:
>>
>> On Fri, May 11, 2012 at 10:04 PM, Rafael Weinstein <rafaelw@google.com>
>> wrote:
>> > Issue 1: How to handle tokens which precede the first start tag
>> >
>> > Options:
>> > a) Queue them, and then later run them through tree construction once
>> > the implied context element has been picked
>> >
>> > b) Create a new insertion mode like "waiting for context element", which
>> > probably ignores end tags and doctype and inserts character tokens and
>> > comments. Once the implied context element is picked, reset the
>> > insertion mode appropriately, and proceed normally.
>>
>> I prefer b).
>
>
> I like b as well. I assume it means that the "waiting for context element"
> insertion mode would keep scanning until the ambiguity was resolved, and
> then enter the appropriate insertion mode. Am I misunderstanding?

I think what Yehuda is getting at here is that there are a handful of
tags which are allowed to appear anywhere, so it doesn't make sense to
"resolve the ambiguity" based on their identity.

I talked with Tab about this, and happily, that set seems to be
<style>, <script>, <meta>, & <link>. Happily, because this means that
the new "ImpliedContext" insertion mode can handle start tags as
follows (code from the above patch):

if (token.name() == styleTag
    || token.name() == scriptTag
    || token.name() == metaTag
    || token.name() == linkTag) {
    // "process following the rules for the "in head" insertion mode"
    processStartTagForInHead(token);
    return;
}

// "set the context element"
m_fragmentContext.setContextTag(getImpliedContextTag(token.name()));
// "reset the insertion mode appropriately"
resetInsertionModeAppropriately();
// "reprocess the token"
processStartTag(token);
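
For readers without the patch at hand, here is a minimal, self-contained
sketch of how option b) and the start-tag handling above could fit together.
It is not the WebKit code: Token, Mode, FragmentParserSketch and
impliedContextFor() are hypothetical names, and the tag-to-context mapping is
an assumed, illustrative subset rather than what getImpliedContextTag()
actually returns.

```cpp
#include <string>
#include <vector>

// Hypothetical token model for illustration; not WebKit's actual classes.
enum class TokenType { StartTag, EndTag, Character, Comment, Doctype };

struct Token {
    TokenType type;
    std::string data; // tag name, or text for character/comment tokens
};

enum class Mode { ImpliedContext, ContextResolved };

// Assumed mapping from the first start tag to a context element.
// Illustrative subset only; anything unlisted falls back to "body".
inline std::string impliedContextFor(const std::string& tag)
{
    if (tag == "td" || tag == "th")
        return "tr";
    if (tag == "tr")
        return "tbody";
    if (tag == "tbody" || tag == "thead" || tag == "tfoot"
        || tag == "caption" || tag == "colgroup")
        return "table";
    if (tag == "col")
        return "colgroup";
    if (tag == "option" || tag == "optgroup")
        return "select";
    return "body";
}

struct FragmentParserSketch {
    Mode mode = Mode::ImpliedContext;
    std::string contextTag;       // empty until the context is picked
    std::vector<std::string> log; // records what happened to each token

    void process(const Token& token)
    {
        if (mode == Mode::ContextResolved) {
            // Stand-in for normal tree construction.
            log.push_back("normal:" + token.data);
            return;
        }
        switch (token.type) {
        case TokenType::EndTag:
        case TokenType::Doctype:
            log.push_back("ignored:" + token.data);
            return;
        case TokenType::Character:
        case TokenType::Comment:
            log.push_back("inserted:" + token.data);
            return;
        case TokenType::StartTag:
            break;
        }
        // Tags allowed anywhere do not resolve the ambiguity.
        if (token.data == "style" || token.data == "script"
            || token.data == "meta" || token.data == "link") {
            log.push_back("in-head:" + token.data);
            return;
        }
        contextTag = impliedContextFor(token.data); // set the context element
        mode = Mode::ContextResolved;               // reset the insertion mode
        process(token);                             // reprocess the token
    }
};
```

Feeding this sketch a comment, a stray end tag, a <style> start tag and then
a <td> start tag leaves it in the implied-context mode until the <td>
arrives, at which point the context becomes "tr" under the assumed mapping.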

>
>>
>>
>> I'm assuming the use case for this stuff isn't that authors throw
>> random stuff at the API and then insert the result somewhere. I expect
>> authors to pass string literals or somewhat cooked string literals to
>> the API knowing where they're going to insert the result but not
>> telling the insertion point to the API as a matter of convenience.
>>
>> If you know you are planning to insert stuff as a child of tbody,
>> don't start your string literal with stuff that would tokenize as
>> characters!
>>
>> (Firefox currently does not have the capability to queue tokens.
>> Speculative parsing in Firefox is not based on queuing tokens. See
>> https://developer.mozilla.org/en/Gecko/HTML_parser_threading for the
>> details.)
>>
>> > Issue 2: How to infer a non-HTML implied context element
>> >
>> > Options:
>> > a) By tagName alone. When multiple namespaces match, prefer HTML, and
>> > then either SVG or MathML (possibly on a per-tagName basis)
>> >
>> > b) Also inspect attributes for tagNames which may be in multiple
>> > namespaces
>>
>> AFAICT, the case where this really matters (if my assumptions about
>> use cases are right) is <a>. (Fragment parsing makes scripts useless
>> anyway by setting their "already started" flag, authors probably
>> shouldn't be adding styles by parsing <style>, both HTML and SVG
>> <font> are considered harmful, and cross-browser support for Content
>> MathML is far off on the horizon.)
>>
>> So I prefer a) possibly with <a>-specific elaborations if we can come
>> up with some. Generic solutions seem to involve more complexity. For
>> example, if we supported a generic attribute for forcing SVG
>> interpretation, would it put us on a slippery slope to support it when
>> it appears on tokens that aren't the first start tag token in a
>> contextless fragment parse?
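
Option a) for Issue 2 can be sketched roughly as below. This is only an
illustration under stated assumptions: the name sets are small hand-picked
subsets rather than complete element lists, inferNamespaceByTagName() is a
hypothetical helper, and the SVG-before-MathML order is one of the choices
option a) leaves open.

```cpp
#include <set>
#include <string>

enum class Namespace { HTML, SVG, MathML };

// Infer the namespace of the implied context from the first start tag's
// name alone. The sets below are illustrative subsets, not full lists.
inline Namespace inferNamespaceByTagName(const std::string& tag)
{
    static const std::set<std::string> htmlNames = {
        "a", "div", "font", "p", "script", "style", "td", "tr"
    };
    static const std::set<std::string> svgNames = {
        "a", "circle", "font", "g", "path", "script", "style"
    };
    static const std::set<std::string> mathmlNames = {
        "mfrac", "mi", "mrow"
    };

    // When multiple namespaces match, prefer HTML.
    if (htmlNames.count(tag))
        return Namespace::HTML;
    // Assumed order for the remaining namespaces: SVG, then MathML.
    if (svgNames.count(tag))
        return Namespace::SVG;
    if (mathmlNames.count(tag))
        return Namespace::MathML;
    // Unknown names are treated as HTML in this sketch.
    return Namespace::HTML;
}
```

With these assumed sets, "a" resolves to HTML even though SVG also has an
<a> element, which is exactly the case that might need an <a>-specific
elaboration.
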
>>
>> > Issue 3: What form does the API take
>> >
>> > a) Document.innerHTML
>> >
>> > b) document.parse()
>> >
>> > c) document.createDocumentFragment()
>>
>> I prefer b) because:
>>  * It doesn't involve creating the fragment as a separate step.
>>  * It doesn't need to be foolishly consistent with the HTML vs. XML
>> design errors of innerHTML.
>>  * It's shorter than document.createDocumentFragment().
>>  * Unlike innerHTML, it is a method, so we can add more arguments
>> later (or right away) to refine its behavior.
>>
>> --
>> Henri Sivonen
>> hsivonen@iki.fi
>> http://hsivonen.iki.fi/
>
>

Received on Wednesday, 16 May 2012 23:29:58 UTC