Re: Black-box equivalence of parsing fragments directly into context node from Henri Sivonen on 2008-12-02 (public-html@w3.org from December 2008)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Tue, 2 Dec 2008 11:52:10 +0200
To: Ian Hickson <ian@hixie.ch>
Cc: Jonas Sicking <jonas@sicking.cc>, Boris Zbarsky <bzbarsky@MIT.EDU>, HTML WG <public-html@w3.org>
Message-Id: <030EB9E1-38DB-4617-B89D-2DCE19637155@iki.fi>

On Dec 2, 2008, at 06:45, Ian Hickson wrote:

> On Mon, 17 Nov 2008, Henri Sivonen wrote:
>> Is there a reason why the spec doesn't prescribe this? Why does the spec
>> specify parsing into another document first and then moving the nodes
>> over?
>
> The algorithm is intended to be used for purposes other than innerHTML.

What other purposes? For HTML fragments in Atom and RSS?

>> Is what I described above not black-box equivalent to the steps that the
>> spec prescribes?
>
> I believe it is, though I wouldn't guarantee it.

OK.

> On Wed, 26 Nov 2008, Jonas Sicking wrote:
>>
>> Why couldn't the spec instead say to use the ownerDocument of the
>> context node (like Henri is suggesting) and parse into a
>> documentFragment node? I.e. why do we need the new Document node and the
>> new <html> node?
>
> I guess we could do that, but what would it gain us? Implementations are
> free to optimise this anyway.

In general, when the spec says something that differs significantly  
from what one might implement, there's overhead in trying to figure  
out if the spec says a different thing for a specific reason that  
would foil the optimization.
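
To make the question concrete, here is a toy model of the two strategies discussed above: parsing into a new Document with a new <html> root and then moving the nodes, versus parsing into a fragment owned by the context node's ownerDocument. It is a sketch only. Document, Node, build() and both function names are hypothetical stand-ins, not a real DOM API, and the "parser" just creates one childless element per name. It says nothing about mutation events or other side effects.

```python
# Toy model only: Document and Node are hypothetical stand-ins, not a DOM API.
class Document:
    pass


class Node:
    def __init__(self, name, owner):
        self.name = name
        self.owner = owner      # plays the role of ownerDocument
        self.children = []

    def adopt(self, owner):
        # Re-own this node and its whole subtree.
        self.owner = owner
        for child in self.children:
            child.adopt(owner)

    def append(self, child):
        # Appending adopts the child into this node's document.
        child.adopt(self.owner)
        self.children.append(child)


def build(names, owner):
    # Stand-in for the tree builder: one childless element per name.
    return [Node(name, owner) for name in names]


def move_children(source, target):
    moved, source.children = source.children, []
    for node in moved:
        target.append(node)


def via_scratch_document(context, names):
    # Parse into a new Document under a new <html> root, then move the nodes.
    scratch = Document()
    root = Node("html", scratch)
    for node in build(names, scratch):
        root.append(node)
    move_children(root, context)


def via_fragment(context, names):
    # Parse into a fragment that already belongs to the context's document.
    fragment = Node("#document-fragment", context.owner)
    for node in build(names, context.owner):
        fragment.append(node)
    move_children(fragment, context)


def snapshot(node):
    # Observable shape of a subtree, ignoring object identity.
    return (node.name, [snapshot(child) for child in node.children])
```

In this model the two context nodes end up with the same children and the same owner, which is all "black-box equivalent" means here. A real implementation would also have to compare everything script can observe along the way.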

> On Wed, 26 Nov 2008, Henri Sivonen wrote:
>>
>> Why is there even a need for parsing into a document fragment? Would
>> mutation events or something of that nature go wrong if parsing directly
>> into the context node?
>
> We'd quench those anyway.

The spec is clear that the act of parsing the fragment doesn't cause  
mutation events to fire. However, the spec isn't clear on whether  
mutation events should fire for the act of inserting the fragment into  
the main document.

> On Wed, 26 Nov 2008, Boris Zbarsky wrote:
>>
>> From a spec point of view the only obvious issue I see here is that the
>> mutation event behavior means the parser needs to take pains to produce
>> the same results as would be produced by the currently-specified
>> algorithm even in cases when mutation events rearrange the DOM.
>
> Surely we don't want any mutation events firing during innerHTML.
>
> The spec currently says:
>
> # DOM mutation events must not fire for changes caused by the UA parsing
> # the document. (Conceptually, the parser is not mutating the DOM, it is
> # constructing it.) This includes the parsing of any content inserted
> # using document.write() and document.writeln() calls.

But the act of inserting the fragment into the document is  
conceptually a scripted mutation, isn't it?

> On Thu, 27 Nov 2008, Henri Sivonen wrote:
>> I tested document.write() from a mutation event handler by mutating the
>> tree from script during parse.
[...]
>> Hmm. I didn't mean to take a stance on timeouts here. Only on cases
>> where the control is in the parser and document.write would be a
>> re-entrant call to the parser.
>
> I don't understand why event processing would make this any more complex
> than it already is.

Yeah, as I pointed out in my third message, my thinking was bogus when  
I wrote the part you quoted.

> On Thu, 27 Nov 2008, Henri Sivonen wrote:
>>
>> For efficient buffering, it's important for the parser to know when it
>> needs to drive buffers into a safe point so that document.write() can
>> insert into the stream. So far, at least my assumption has been that
>> scripts can only execute as a side effect of a parser action when a
>> <script> element (either HTML or SVG) is popped off the stack. Now it
>> has turned out that scripts can execute as a side effect of a parser
>> action also when an <svg> element is popped off the stack.
>
> Ew, that seems highly problematic. Where is this defined?

http://www.w3.org/TR/SVG/interact.html#LoadEvent

Now I notice that it's even worse than <svg onload>: It's onload on  
any element in the SVG namespace.

The relevant Gecko code for the XML side is
http://mxr.mozilla.org/mozilla-central/source/content/xml/document/src/nsXMLContentSink.cpp#1179

>> I think the document.write()-safe points need to be enumerated. In the
>> other cases (which hopefully form an empty set), document.write() should
>> be a no-op. That is, I think the spec should either specifically make
>> the load event for <svg> a safe point for document.write() or it should
>> make document.write() a no-op if executed at that point. The fewer these
>> document.write()-safe points are, the better.
>
> I don't understand what you mean by "safe point".

I meant a point where there's at least one parser method on the call  
stack but re-entering the parser with document.write() indeed writes  
(either by immediate tokenization or by inserting to the stream and  
returning without tokenizing immediately).
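
As a rough illustration of that definition, the following toy model treats a "safe point" as a moment when a script runs while the parser is on the call stack and write() is allowed to insert into the stream. Everything here is an assumption made for illustration: ToyParser, its attributes, and the use of "!" as a stand-in for popping a <script> element are hypothetical and are not taken from the spec or from any browser. The no-op branch models the proposal quoted above for calls made outside a safe point.

```python
class ToyParser:
    """Hypothetical model of a re-entrant write(); not the HTML5 algorithm."""

    def __init__(self, text):
        self.pending = list(text)   # input not yet tokenized
        self.output = []            # characters consumed so far
        self.at_safe_point = False

    def write(self, text):
        # Outside a safe point the call is a no-op (the proposal above).
        if not self.at_safe_point:
            return False
        # At a safe point, insert at the front of the stream and return
        # without tokenizing immediately; the main loop picks it up next.
        self.pending[0:0] = list(text)
        return True

    def run(self, on_script):
        while self.pending:
            ch = self.pending.pop(0)
            self.output.append(ch)
            if ch == "!":           # stand-in for popping a <script> element
                self.at_safe_point = True
                try:
                    on_script(self)  # the script may call write() re-entrantly
                finally:
                    self.at_safe_point = False
        return "".join(self.output)
```

For example, ToyParser("a!b").run(lambda p: p.write("XY")) consumes "a!", lets the callback insert "XY" ahead of the remaining input, and returns "a!XYb". Calling write() on a parser that is not inside run() returns False and leaves the stream untouched.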

> If you call document.write() from <svg>, then you'll blow away the
> document, since the insertion point won't have been defined.

That's not a safe point, then. :-)

(On its face, blowing away the document seems more complex than making  
document.write a no-op if called from an SVG onload handler. I need to  
study existing cases of blowing away the document more carefully.)

SVG <script> elements should behave exactly like HTML <script>  
elements as far as the insertion point goes, though, right?

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Tuesday, 2 December 2008 09:52:52 UTC