Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out from Henri Sivonen on 2012-05-15 (public-webapps@w3.org from April to June 2012)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Tue, 15 May 2012 13:46:13 +0300
To: Rafael Weinstein <rafaelw@google.com>
Cc: Webapps WG <public-webapps@w3.org>, "Tab Atkins Jr." <jackalmage@gmail.com>, Yehuda Katz <wycats@gmail.com>, Scott González <scott.gonzalez@gmail.com>
Message-ID: <CAJQvAuc8AmafJnEMQC+vFKTv7OwAYe3aPm3b+iJvXyASF4ZHHw@mail.gmail.com>

On Fri, May 11, 2012 at 10:04 PM, Rafael Weinstein <rafaelw@google.com> wrote:
> Issue 1: How to handle tokens which precede the first start tag
>
> Options:
> a) Queue them, and then later run them through tree construction once
> the implied context element has been picked
>
> b) Create a new insertion mode like "waiting for context element",
> which probably ignores end tags and doctype and inserts character
> tokens and comments. Once the implied context element is picked,
> reset the insertion mode appropriately, and proceed normally.

I prefer b).

I'm assuming the use case for this stuff isn't that authors throw
random stuff at the API and then insert the result somewhere. I expect
authors to pass string literals or somewhat cooked string literals to
the API knowing where they're going to insert the result but, as a
matter of convenience, not passing the insertion point to the API.

If you know you are planning to insert stuff as a child of tbody,
don't start your string literal with stuff that would tokenize as
characters!

(Firefox currently does not have the capability to queue tokens.
Speculative parsing in Firefox is not based on queuing tokens. See
https://developer.mozilla.org/en/Gecko/HTML_parser_threading for the
details.)
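
To make option b) concrete, here is a minimal sketch of a "waiting for
context element" mode. It is not Gecko code or spec text. The token
shapes, the FragmentSketch class, the tag-to-context table and the
"body" fallback are all assumptions made up for illustration; the
thread does not define any of them.

```typescript
// Simplified token shapes; a real HTML tokenizer carries more detail.
type SketchToken =
  | { kind: "doctype" }
  | { kind: "startTag"; name: string }
  | { kind: "endTag"; name: string }
  | { kind: "comment"; data: string }
  | { kind: "characters"; data: string };

type InsertionMode = "waitingForContext" | "normal";

// Hypothetical table from first start tag to implied context element.
const impliedContext = new Map<string, string>([
  ["tr", "tbody"],
  ["td", "tr"],
  ["th", "tr"],
  ["option", "select"],
]);

class FragmentSketch {
  mode: InsertionMode = "waitingForContext";
  context: string | null = null;
  output: string[] = [];

  feed(token: SketchToken): void {
    if (this.mode === "waitingForContext") {
      switch (token.kind) {
        case "doctype":
        case "endTag":
          return; // ignored while no context element is known
        case "comment":
          this.output.push(`comment:${token.data}`);
          return; // inserted directly, no queuing
        case "characters":
          this.output.push(`text:${token.data}`);
          return; // inserted directly, no queuing
        case "startTag": {
          // First start tag: pick the context, then switch modes.
          const picked = impliedContext.get(token.name);
          this.context = picked !== undefined ? picked : "body"; // assumed fallback
          this.mode = "normal";
          break;
        }
      }
    }
    this.construct(token);
  }

  // Stand-in for real tree construction: just record what was seen.
  private construct(token: SketchToken): void {
    this.output.push(
      token.kind === "startTag" || token.kind === "endTag"
        ? `${token.kind}:${token.name}`
        : token.kind
    );
  }
}
```

Note that in this sketch a leading character token ends up in the
fragment before any context is chosen, which is exactly why a string
meant for tbody should not start with text.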

> Issue 2: How to infer a non-HTML implied context element
>
> Options:
> a) By tagName alone. When multiple namespaces match, prefer HTML, and
> then either SVG or MathML (possibly on a per-tagName basis)
>
> b) Also inspect attributes for tagNames which may be in multiple namespaces

AFAICT, the case where this really matters (if my assumptions about
use cases are right) is <a>. (Fragment parsing makes scripts useless
anyway by setting their "already started" flag, authors probably
shouldn't be adding styles by parsing <style>, both HTML and SVG
<font> are considered harmful, and cross-browser support for Content
MathML is far off on the horizon.)

So I prefer a) possibly with <a>-specific elaborations if we can come
up with some. Generic solutions seem to involve more complexity. For
example, if we supported a generic attribute for forcing SVG
interpretation, would it put us on a slippery slope to support it when
it appears on tokens that aren't the first start tag token in a
contextless fragment parse?
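
Option a) can be sketched as a plain lookup by tag name that prefers
HTML whenever more than one namespace matches. The name sets below are
small, deliberately incomplete samples, and both inferNamespace and
the "unknown names are HTML" fallback are assumptions of this sketch,
not anything settled in the thread.

```typescript
type ElementNamespace = "html" | "svg" | "mathml";

// Illustrative samples only; real element lists are much longer.
const htmlNames = new Set(["a", "div", "font", "script", "style", "td", "tr"]);
const svgNames = new Set(["a", "circle", "font", "path", "rect", "script", "style"]);
const mathmlNames = new Set(["mi", "mo", "mrow"]);

// Hypothetical helper: infer a namespace from the tag name alone.
function inferNamespace(tagName: string): ElementNamespace {
  // Lowercase first, as an HTML tokenizer does for tag names.
  const name = tagName.toLowerCase();
  if (htmlNames.has(name)) return "html"; // HTML wins any overlap
  if (svgNames.has(name)) return "svg";
  if (mathmlNames.has(name)) return "mathml";
  return "html"; // assumed fallback for unknown names
}
```

With this rule <a> always resolves to HTML, so an SVG <a> would need
one of the <a>-specific elaborations mentioned above.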

> Issue 3: What form does the API take
>
> a) Document.innerHTML
>
> b) document.parse()
>
> c) document.createDocumentFragment()

I prefer b) because:
 * It doesn't involve creating the fragment as a separate step.
 * It doesn't need to be foolishly consistent with the HTML vs. XML
design errors of innerHTML.
 * It's shorter than document.createDocumentFragment().
 * Unlike innerHTML, it is a method, so we can add more arguments
later (or right away) to refine its behavior.
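
The last point can be illustrated with a rough sketch of the call
shape. Everything here is hypothetical: parseSketch stands in for a
document.parse()-style method, and the ParseOptions bag with its
contextTagName field is an invented example of a later refinement,
not a proposed signature. The function does no actual parsing.

```typescript
// Hypothetical options bag; the thread does not fix any of these names.
interface ParseOptions {
  contextTagName?: string; // explicit context element, skipping inference
}

interface ParsedFragmentSketch {
  markup: string;
  contextTagName: string | null; // null means "infer from the markup"
}

// Stand-in for a method-style API: it only records its inputs.
function parseSketch(markup: string, options: ParseOptions = {}): ParsedFragmentSketch {
  const explicit = options.contextTagName;
  return {
    markup,
    contextTagName: explicit !== undefined ? explicit : null,
  };
}

// Existing callers keep working when an optional argument is added.
const inferred = parseSketch("<tr><td>x</td></tr>");
const forced = parseSketch("<tr><td>x</td></tr>", { contextTagName: "tbody" });
```

A property setter such as innerHTML only accepts the string being
assigned, so it offers no comparable place for extra arguments.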

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Tuesday, 15 May 2012 10:46:49 UTC