Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out from Jonas Sicking on 2012-05-16 (public-webapps@w3.org from April to June 2012)

From: Jonas Sicking <jonas@sicking.cc>
Date: Wed, 16 May 2012 16:49:36 -0700
To: Rafael Weinstein <rafaelw@google.com>
Cc: Yehuda Katz <wycats@gmail.com>, Henri Sivonen <hsivonen@iki.fi>, Webapps WG <public-webapps@w3.org>, "Tab Atkins Jr." <jackalmage@gmail.com>, Scott González <scott.gonzalez@gmail.com>
Message-ID: <CA+c2ei_MKPMWpmWynF4yC8n6SSYspMufetOM_+vntgC8iqe6hQ@mail.gmail.com>

On Wed, May 16, 2012 at 4:29 PM, Rafael Weinstein <rafaelw@google.com> wrote:
> Ok. I think I'm convinced on all points.
>
> I've uploaded a webkit patch which implements what we've agreed on here:
>
> https://bugs.webkit.org/show_bug.cgi?id=84646
>
> I'm happy to report that this patch is nicer than the queued-token
> approach. Good call, Henri.
>
> On Tue, May 15, 2012 at 9:39 PM, Yehuda Katz <wycats@gmail.com> wrote:
>>
>> Yehuda Katz
>> (ph) 718.877.1325
>>
>>
>> On Tue, May 15, 2012 at 6:46 AM, Henri Sivonen <hsivonen@iki.fi> wrote:
>>>
>>> On Fri, May 11, 2012 at 10:04 PM, Rafael Weinstein <rafaelw@google.com>
>>> wrote:
>>> > Issue 1: How to handle tokens which precede the first start tag
>>> >
>>> > Options:
>>> > a) Queue them, and then later run them through tree construction once
>>> > the implied context element has been picked
>>> >
>>> > b) Create a new insertion mode like "waiting for context element", which
>>> > probably ignores end tags and doctype and inserts character tokens and
>>> > comments. Once the implied context element is picked, reset the
>>> > insertion mode appropriately, and proceed normally.
>>>
>>> I prefer b).
>>
>>
>> I like b as well. I assume it means that the "waiting for context element"
>> insertion mode would keep scanning until the ambiguity was resolved, and
>> then enter the appropriate insertion mode. Am I misunderstanding?
>
> I think what Yehuda is getting at here is that there are a handful of
> tags which are allowed to appear anywhere, so it doesn't make sense to
> "resolve the ambiguity" based on their identity.
>
> I talked with Tab about this, and happily, that set seems to be
> <style>, <script>, <meta>, & <link>. Happily, because this means that
> the new "ImpliedContext" insertion mode can handle start tags as
> follows (code from the above patch):
>
> if (token.name() == styleTag
>     || token.name() == scriptTag
>     || token.name() == metaTag
>     || token.name() == linkTag) {
>     // "process following the rules for the "in head" insertion mode"
>     processStartTagForInHead(token);
>     return;
> }
>
> // "set the context element"
> m_fragmentContext.setContextTag(getImpliedContextTag(token.name()));
> // "reset the insertion mode appropriately"
> resetInsertionModeAppropriately();
> // "reprocess the token"
> processStartTag(token);

So if I understand things correctly, that would mean that:

document.parse("parsed as text<script>parsed as script content</script>" +
               "<tr><td>table content</td></tr>");

would return a fragment like:

#fragment
  #text "parsed as text"
  script
    #text "parsed as script content"
  tr
    td
      #text "table content"

Is this correct? In particular, are the contents of the <script>
element parsed according to the rules which normally apply when
parsing scripts?

(That of course leaves the terrible situation that <script> parsing is
vastly different in HTML and SVG, but that's a bad problem that
already exists.)
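
To make the question concrete, here is a rough sketch of how I imagine
the new mode replaying the tokens from my example. This is not code from
the patch: the Token struct and pickImpliedContext are made up for
illustration, and the entries in getImpliedContextTag are my guess at a
plausible mapping, not something settled in this thread. Only the four
"allowed anywhere" tag names are taken from the snippet quoted above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical token model, just enough to replay the example.
enum class TokenType { Character, StartTag, EndTag };

struct Token {
    TokenType type;
    std::string data;  // tag name, or the text of a Character token
};

// The four tags that do not resolve the ambiguity (from the quoted snippet).
bool isAllowedAnywhere(const std::string& name) {
    return name == "style" || name == "script"
        || name == "meta" || name == "link";
}

// ASSUMPTION: an illustrative mapping from the first "real" start tag to an
// implied context element. The actual table is not given in this message.
std::string getImpliedContextTag(const std::string& name) {
    if (name == "tr") return "tbody";
    if (name == "td" || name == "th") return "tr";
    if (name == "tbody" || name == "thead" || name == "tfoot"
        || name == "caption" || name == "colgroup") return "table";
    if (name == "col") return "colgroup";
    if (name == "option" || name == "optgroup") return "select";
    return "body";
}

// Replays tokens until a start tag picks the context element. Returns the
// chosen context tag, or "" if the input never resolves the ambiguity.
// Everything handled before that point is recorded in `log`.
std::string pickImpliedContext(const std::vector<Token>& tokens,
                               std::vector<std::string>& log) {
    std::string openRawText;  // "script" or "style" while inside one
    for (const Token& token : tokens) {
        if (!openRawText.empty()) {
            // Inside <script>/<style>: content belongs to that element
            // until its matching end tag.
            if (token.type == TokenType::EndTag && token.data == openRawText) {
                log.push_back("close " + openRawText);
                openRawText.clear();
            } else {
                log.push_back(openRawText + " content: " + token.data);
            }
            continue;
        }
        switch (token.type) {
        case TokenType::Character:
            log.push_back("text: " + token.data);  // inserted as-is
            break;
        case TokenType::EndTag:
            break;  // stray end tags are ignored in this mode
        case TokenType::StartTag:
            if (!isAllowedAnywhere(token.data))
                return getImpliedContextTag(token.data);  // ambiguity resolved
            log.push_back("in head: " + token.data);
            if (token.data == "script" || token.data == "style")
                openRawText = token.data;
            break;
        }
    }
    return "";
}
```

Under those assumptions, feeding in the tokens for my example (text,
<script>, script text, </script>, <tr>) would insert the leading text,
handle the script via the "in head" rules, and only pick a context
element once <tr> shows up.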

/ Jonas

Received on Wednesday, 16 May 2012 23:50:37 UTC