Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out from Rafael Weinstein on 2012-05-16 (public-webapps@w3.org from April to June 2012)

From: Rafael Weinstein <rafaelw@google.com>
Date: Wed, 16 May 2012 16:52:33 -0700
To: Jonas Sicking <jonas@sicking.cc>
Cc: Yehuda Katz <wycats@gmail.com>, Henri Sivonen <hsivonen@iki.fi>, Webapps WG <public-webapps@w3.org>, "Tab Atkins Jr." <jackalmage@gmail.com>, Scott González <scott.gonzalez@gmail.com>
Message-ID: <CABMdHiRCtev+GFOcD4Wz32NQkmWiN3DBdMVzDfZU_RqjDAZh1Q@mail.gmail.com>

On Wed, May 16, 2012 at 4:49 PM, Jonas Sicking <jonas@sicking.cc> wrote:
> On Wed, May 16, 2012 at 4:29 PM, Rafael Weinstein <rafaelw@google.com> wrote:
>> Ok. I think I'm convinced on all points.
>>
>> I've uploaded a webkit patch which implements what we've agreed on here:
>>
>> https://bugs.webkit.org/show_bug.cgi?id=84646
>>
>> I'm happy to report that this patch is nicer than the queued-token
>> approach. Good call, Henri.
>>
>> On Tue, May 15, 2012 at 9:39 PM, Yehuda Katz <wycats@gmail.com> wrote:
>>>
>>> Yehuda Katz
>>> (ph) 718.877.1325
>>>
>>>
>>> On Tue, May 15, 2012 at 6:46 AM, Henri Sivonen <hsivonen@iki.fi> wrote:
>>>>
>>>> On Fri, May 11, 2012 at 10:04 PM, Rafael Weinstein <rafaelw@google.com>
>>>> wrote:
>>>> > Issue 1: How to handle tokens which precede the first start tag
>>>> >
>>>> > Options:
>>>> > a) Queue them, and then later run them through tree construction once
>>>> > the implied context element has been picked
>>>> >
>>>> > b) Create a new insertion mode like "waiting for context element", which
>>>> > probably ignores end tags and doctype and inserts character tokens and
>>>> > comments. Once the implied context element is picked, reset the
>>>> > insertion mode appropriately, and proceed normally.
>>>>
>>>> I prefer b).
>>>
>>>
>>> I like b as well. I assume it means that the "waiting for context element"
>>> insertion mode would keep scanning until the ambiguity was resolved, and
>>> then enter the appropriate insertion mode. Am I misunderstanding?
>>
>> I think what Yehuda is getting at here is that there are a handful of
>> tags which are allowed to appear anywhere, so it doesn't make sense to
>> "resolve the ambiguity" based on their identity.
>>
>> I talked with Tab about this, and happily, that set seems to be
>> <style>, <script>, <meta>, & <link>. Happily, because this means that
>> the new "ImpliedContext" insertion mode can handle start tags as
>> follows (code from the above patch)
>>
>> if (token.name() == styleTag
>>     || token.name() == scriptTag
>>     || token.name() == metaTag
>>     || token.name() == linkTag) {
>>     // "process following the rules for the 'in head' insertion mode"
>>     processStartTagForInHead(token);
>>     return;
>> }
>>
>> // "set the context element"
>> m_fragmentContext.setContextTag(getImpliedContextTag(token.name()));
>> // "reset the insertion mode appropriately"
>> resetInsertionModeAppropriately();
>> // "reprocess the token"
>> processStartTag(token);
>
> So if I understand things correctly, that would mean that:
>
> document.parse("parsed as text<script>parsed as script content</script><tr><td>table content</td></tr>");
>
> would return a fragment like:
> #fragment
>  #text "parsed as text"
>  script
>    #text "parsed as script content"
>  tr
>    td
>      #text "table content"
>
> Is this correct? The important part here is that the contents of the
> <script> element are parsed according to the rules which normally apply
> when parsing scripts?
>
> (That of course leaves the terrible situation that <script> parsing is
> vastly different in HTML and SVG, but that's a bad problem that
> already exists)

Yes. Exactly.

>
> / Jonas
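
To make the flow above concrete, here is a minimal standalone sketch of the "ImpliedContext" handling. It is not the code from the patch. The `Token` type, `resolveImpliedContext`, and the tag-to-context table in `impliedContextTagFor` are assumptions made up for illustration (in particular, the mapping of `tr` to `tbody` and the fallback to `body` are guesses, not something settled in this thread). The sketch only looks at leading character tokens and start tags; end tags, comments, and the script's own text are left out.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified token: only the two kinds this sketch cares about.
enum class TokenKind { Character, StartTag };

struct Token {
    TokenKind kind;
    std::string data; // text for Character, tag name for StartTag
};

// Tags that may appear anywhere, so they never decide the context.
bool isContextNeutralTag(const std::string& name) {
    return name == "style" || name == "script"
        || name == "meta" || name == "link";
}

// Hypothetical stand-in for getImpliedContextTag(); the table is an assumption.
std::string impliedContextTagFor(const std::string& name) {
    assert(!name.empty());
    static const std::unordered_map<std::string, std::string> contexts = {
        {"tr", "tbody"},     {"td", "tr"},        {"th", "tr"},
        {"tbody", "table"},  {"thead", "table"},  {"tfoot", "table"},
        {"caption", "table"}, {"colgroup", "table"}, {"col", "colgroup"},
        {"option", "select"}, {"optgroup", "select"},
    };
    auto it = contexts.find(name);
    if (it == contexts.end())
        return "body"; // assumed default context
    return it->second;
}

struct ImpliedContextResult {
    std::string contextTag;       // empty if no deciding start tag was seen
    std::vector<std::string> log; // one entry per token handled
};

// Walk tokens until the first start tag that resolves the context.
ImpliedContextResult resolveImpliedContext(const std::vector<Token>& tokens) {
    ImpliedContextResult result;
    for (const Token& token : tokens) {
        if (token.kind == TokenKind::Character) {
            // Character tokens are inserted directly into the fragment.
            result.log.push_back("insert text: " + token.data);
            continue;
        }
        if (isContextNeutralTag(token.data)) {
            // "process following the rules for the 'in head' insertion mode"
            result.log.push_back("in head: " + token.data);
            continue;
        }
        // "set the context element"; a real parser would then reset the
        // insertion mode and reprocess this token.
        result.contextTag = impliedContextTagFor(token.data);
        result.log.push_back("context: " + result.contextTag);
        break;
    }
    return result;
}
```

Feeding it the leading text, a `script` start tag, and a `tr` start tag from Jonas's example records the text as inserted, routes `script` through the "in head" rules, and only picks a context when it reaches `tr`.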

Received on Wednesday, 16 May 2012 23:53:02 UTC