Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out from Yehuda Katz on 2012-05-16 (public-webapps@w3.org from April to June 2012)

From: Yehuda Katz <wycats@gmail.com>
Date: Wed, 16 May 2012 00:39:25 -0400
To: Henri Sivonen <hsivonen@iki.fi>
Cc: Rafael Weinstein <rafaelw@google.com>, Webapps WG <public-webapps@w3.org>, "Tab Atkins Jr." <jackalmage@gmail.com>, Scott González <scott.gonzalez@gmail.com>
Message-ID: <CAMFeDTU6H8q_vET5CwQqOuThafGOV1pRFB30G3f0S6eCcynWSQ@mail.gmail.com>



On Tue, May 15, 2012 at 6:46 AM, Henri Sivonen <hsivonen@iki.fi> wrote:

> On Fri, May 11, 2012 at 10:04 PM, Rafael Weinstein <rafaelw@google.com>
> wrote:
> > Issue 1: How to handle tokens which precede the first start tag
> >
> > Options:
> > a) Queue them, and then later run them through tree construction once
> > the implied context element has been picked
> >
> > b) Create a new insertion mode like "waiting for context element", which
> > probably ignores end tags and doctype and inserts character tokens and
> > comments. Once the implied context element is picked, reset the
> > insertion mode appropriately, and proceed normally.
>
> I prefer b).
>

I like b as well. I assume it means that the "waiting for context element"
insertion mode would keep scanning until the ambiguity was resolved, and
then enter the appropriate insertion mode. Am I misunderstanding?
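
To make my reading of b) concrete, here is a rough Python sketch of how I picture the mode behaving. It is only an illustration under my own assumptions, not proposed spec text: the Token and FragmentParser classes, the CONTEXT_FOR_TAG table, and the fallback to "body" are all hypothetical names and choices of mine, and the real tree builder would do far more than append to a list.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Token:
    kind: str       # "start", "end", "doctype", "comment", or "chars"
    data: str = ""  # tag name, comment text, or character data


# Hypothetical lookup from first start tag to implied context element.
# The real mapping is exactly what this thread still has to decide.
CONTEXT_FOR_TAG = {"tr": "tbody", "td": "tr", "tbody": "table", "li": "ul"}

WAITING = "waiting for context element"


@dataclass
class FragmentParser:
    mode: str = WAITING
    context: Optional[str] = None
    inserted: List[Tuple[str, str]] = field(default_factory=list)

    def feed(self, token: Token) -> None:
        if self.mode == WAITING:
            if token.kind in ("end", "doctype"):
                return  # ignored while no context element is known
            if token.kind in ("chars", "comment"):
                # inserted directly, without being queued
                self.inserted.append((token.kind, token.data))
                return
            # First start tag: pick the context element and leave the
            # waiting mode. "body" as the fallback is my assumption.
            self.context = CONTEXT_FOR_TAG.get(token.data, "body")
            self.mode = "in " + self.context
        # Stand-in for normal tree construction in the chosen mode.
        self.inserted.append((token.kind, token.data))


parser = FragmentParser()
for tok in [Token("doctype", "html"), Token("comment", " c "),
            Token("end", "p"), Token("start", "tr"), Token("end", "tr")]:
    parser.feed(tok)
```

In this sketch nothing is queued: the doctype and the stray end tag are dropped, the comment is inserted right away, and the first start tag is what ends the waiting mode.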


>
> I'm assuming the use case for this stuff isn't that authors throw
> random stuff at the API and then insert the result somewhere. I expect
> authors to pass string literals or somewhat cooked string literals to
> the API knowing where they're going to insert the result but, as a
> matter of convenience, not telling the API the insertion point.
>
> If you know you are planning to insert stuff as a child of tbody,
> don't start your string literal with stuff that would tokenize as
> characters!
>
> (Firefox currently does not have the capability to queue tokens.
> Speculative parsing in Firefox is not based on queuing tokens. See
> https://developer.mozilla.org/en/Gecko/HTML_parser_threading for the
> details.)
>
> > Issue 2: How to infer a non-HTML implied context element
> >
> > Options:
> > a) By tagName alone. When multiple namespaces match, prefer HTML, and
> > then either SVG or MathML (possibly on a per-tagName basis)
> >
> > b) Also inspect attributes for tagNames which may be in multiple
> > namespaces
>
> AFAICT, the case where this really matters (if my assumptions about
> use cases are right) is <a>. (Fragment parsing makes scripts useless
> anyway by setting their "already started" flag, authors probably
> shouldn't be adding styles by parsing <style>, both HTML and SVG
> <font> are considered harmful, and cross-browser support for Content
> MathML is far off on the horizon.)
>
> So I prefer a) possibly with <a>-specific elaborations if we can come
> up with some. Generic solutions seem to involve more complexity. For
> example, if we supported a generic attribute for forcing SVG
> interpretation, would it put us on a slippery slope to support it when
> it appears on tokens that aren't the first start tag token in a
> contextless fragment parse?
>
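
For what it's worth, option a) for Issue 2 could be sketched roughly as below. This is a minimal illustration, not a proposal: the tag sets are tiny hypothetical samples rather than the real element lists, the fixed HTML, then SVG, then MathML order is just one of the orderings mentioned above, and treating unknown names as HTML is my own assumption. Only the three namespace URIs are the real ones.

```python
HTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Deliberately tiny, hypothetical samples of each vocabulary.
HTML_TAGS = {"a", "script", "style", "font", "title", "tr", "td", "p", "div"}
SVG_TAGS = {"a", "script", "style", "font", "title", "circle", "path", "g"}
MATHML_TAGS = {"mi", "mo", "mrow", "mfrac"}

# Preference order when a name matches several namespaces: HTML first.
PREFERENCE = (
    (HTML_NS, HTML_TAGS),
    (SVG_NS, SVG_TAGS),
    (MATHML_NS, MATHML_TAGS),
)


def infer_namespace(tag_name: str) -> str:
    """Pick a namespace for the first start tag by tag name alone."""
    # The HTML tokenizer lower-cases tag names before tree construction.
    name = tag_name.lower()
    for namespace, tags in PREFERENCE:
        if name in tags:
            return namespace
    # Assumption: names found in no table are treated as HTML.
    return HTML_NS
```

With these tables, infer_namespace("a") comes back as HTML even though SVG also has <a>, which is exactly the case where an <a>-specific elaboration would have to step in.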
> > Issue 3: What form does the API take
> >
> > a) Document.innerHTML
> >
> > b) document.parse()
> >
> > c) document.createDocumentFragment()
>
> I prefer b) because:
>  * It doesn't involve creating the fragment as a separate step.
>  * It doesn't need to be foolishly consistent with the HTML vs. XML
> design errors of innerHTML.
>  * It's shorter than document.createDocumentFragment().
>  * Unlike innerHTML, it is a method, so we can add more arguments
> later (or right away) to refine its behavior.
>
> --
> Henri Sivonen
> hsivonen@iki.fi
> http://hsivonen.iki.fi/
>

Received on Wednesday, 16 May 2012 04:40:15 UTC