Re: Proposal: Document.parse() [AKA: Implied Context Parsing] from Simon Pieters on 2012-05-25 (public-webapps@w3.org from April to June 2012)

From: Simon Pieters <simonp@opera.com>
Date: Fri, 25 May 2012 09:32:28 +0200
To: "Webapps WG" <public-webapps@w3.org>, "Rafael Weinstein" <rafaelw@google.com>
Message-ID: <op.weuskew9idj3kv@simons-macbook-pro.local>

On Fri, 25 May 2012 09:01:43 +0200, Rafael Weinstein <rafaelw@google.com>  
wrote:

> Ok, so from consensus on earlier threads, here's the full API &  
> semantics.
>
> Now's the time to raise objections to UA's adding support for this  
> feature.
>
> -----
>
> 1) The Document interface is extended to include a new method:
>
> DocumentFragment parse (DOMString markup);
>
> which:
> -Invokes the fragment parsing algorithm with markup and an empty
> context element,
> -Unmarks all scripts in the returned fragment node as "already started"
> -Returns the fragment node
>
> 2) The fragment parsing algorithm's context element is now optional.
>
> Its behavior is similar to the case of a known context element, but
> the tokenizer is simply set to the data state.
>
> 3) Resetting the insertion mode appropriately now sets the mode to
> "Implied Context" if parsing a fragment and no context element is set,
> and aborts.
>
> 4) A new "Implied Context" insertion mode is defined which
>
> -Ignores doctype, end tag tokens
> -Handles comment & character tokens as if "in body"
> -Handles the following start tags as if "in body" (which is as if "in
> head"): <style>, <script>, <link>, <meta>
> -Handles any other start tag by selecting a context element, resetting
> the insertion mode appropriately and reprocessing the token.
>
> 5) A new "selecting a context element" algorithm is defined which
> takes a start tag as input and outputs an element. The element's
> identity is as follows:
>
> -If start tag is tbody, thead, tfoot, caption or colgroup
>   return <table>
> -if start tag is tr,
>   return <tbody>
> -if start tag is col
>   return <colgroup>
> -if start tag is td or th
>   return <tr>
> -if start tag is head or body
>   return <html>
> -if start tag is rp or rt
>   return <ruby>

I think <ruby> is better handled by always making <rp> and <rt> generate  
implied end tags in the fragment case (maybe even when parsing normally,  
too). Making the context element <ruby> still doesn't make <rt> parse  
right, because the spec currently looks for ruby on the *stack* (and the  
context element isn't on the stack).

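Also, the ruby base is allowed to include markup. In that case the first  
start tag is not <rp> or <rt>, so <ruby> would never be selected as the  
context element, and this would fail: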

ruby.appendChild(document.parse('<span>foo</span><rt>bar<rt>baz'));


> -if start tag is a defined SVG localName (case insensitive)
>   return <svg>

Except those that conflict with HTML?

> -if start tag is a defined MathML localName (case insensitive)
>   return <math>

(Making the context element svg or math doesn't do anything currently:  
https://www.w3.org/Bugs/Public/show_bug.cgi?id=16635 )

> -otherwise, return <body>
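
For reference, the "selecting a context element" steps quoted above could be
sketched roughly as below. This is only an illustration of the proposal as
written, not spec text or an existing API: selectContextElement, the table,
and the two name sets are hypothetical names, the SVG and MathML sets are
small illustrative subsets, and the sketch assumes the second "td" in the
list was meant to be "th". It also does not address the open question of SVG
names that conflict with HTML.

```javascript
// Hypothetical sketch of the proposed "selecting a context element" steps.
// Input: the tag name of a start tag. Output: the context element's name.

// Fixed mappings taken from the quoted list (th is an assumption, see above).
const FIXED_CONTEXTS = new Map([
  ["tbody", "table"], ["thead", "table"], ["tfoot", "table"],
  ["caption", "table"], ["colgroup", "table"],
  ["tr", "tbody"],
  ["col", "colgroup"],
  ["td", "tr"], ["th", "tr"],
  ["head", "html"], ["body", "html"],
  ["rp", "ruby"], ["rt", "ruby"],
]);

// Illustrative subsets only; a real implementation would need the full
// lists of defined SVG and MathML local names.
const SVG_NAMES = new Set(["svg", "g", "circle", "rect", "path"]);
const MATHML_NAMES = new Set(["math", "mi", "mo", "mn", "mfrac"]);

function selectContextElement(startTagName) {
  // The proposal matches SVG and MathML names case-insensitively.
  const name = startTagName.toLowerCase();
  if (FIXED_CONTEXTS.has(name)) return FIXED_CONTEXTS.get(name);
  if (SVG_NAMES.has(name)) return "svg";
  if (MATHML_NAMES.has(name)) return "math";
  return "body"; // otherwise, return <body>
}
```

With this sketch, selectContextElement("span") returns "body", so markup that
starts with <span> and continues with <rt> never gets a <ruby> context.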


-- 
Simon Pieters
Opera Software

Received on Friday, 25 May 2012 07:32:41 UTC