Re: Black-box equivalence of parsing fragments directly into context node from Ian Hickson on 2008-12-02 (public-html@w3.org from December 2008)

From: Ian Hickson <ian@hixie.ch>
Date: Tue, 2 Dec 2008 04:45:19 +0000 (UTC)
To: Henri Sivonen <hsivonen@iki.fi>, Jonas Sicking <jonas@sicking.cc>, Boris Zbarsky <bzbarsky@MIT.EDU>
Cc: HTML WG <public-html@w3.org>
Message-ID: <Pine.LNX.4.62.0812020410530.17414@hixie.dreamhostps.com>

On Mon, 17 Nov 2008, Henri Sivonen wrote:
> 
> I'm considering implementing HTML5 innerHTML setting in Gecko by using 
> the owner document of the context node as the document seen by the 
> parser and by sticking the context node as the first node on the stack 
> (but masking its name to show "html" to the tree builder in order to 
> avoid breaking the fragment algorithm assertions) and by then running 
> the fragment parsing algorithm without returning to the event loop until 
> done. The context node would be in the tree for the entire time. I'd 
> deflect attempts to add more attributes to the root node upon stray 
> <html> tag.
> 
> Is there a reason why the spec doesn't prescribe this? Why does the spec 
> specify parsing into another document first and then moving the nodes 
> over?

The algorithm is intended to be used for purposes other than innerHTML.


> Is what I described above not black-box equivalent to the steps that the 
> spec prescribes?

I believe it is, though I wouldn't guarantee it.
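
To make the equivalence question concrete, here is a minimal sketch in a toy 
model, not a real DOM and not the spec's algorithm. ToyNode, toyParse and 
both setter functions are hypothetical names invented for illustration, and 
the "parser" just creates one flat child per name. It shows only what 
"black-box equivalent" would mean: both strategies leave the context node 
with the same children, owned by the same document.

```javascript
// Toy node: just enough structure to compare the two strategies.
class ToyNode {
  constructor(name, ownerDocument) {
    this.name = name;
    this.ownerDocument = ownerDocument;
    this.children = [];
  }
  append(child) {
    // Adopt on insertion (children are flat here, so no recursion needed).
    child.ownerDocument = this.ownerDocument;
    this.children.push(child);
  }
}

// Hypothetical stand-in for the tree builder: one child element per name.
function toyParse(names, root) {
  for (const name of names) {
    root.append(new ToyNode(name, root.ownerDocument));
  }
}

// Strategy A: parse under a fresh <html> root in a separate document,
// then move the resulting nodes into the context node.
function setViaSeparateDocument(context, names) {
  const scratchDocument = { label: "scratch" };
  const root = new ToyNode("html", scratchDocument);
  toyParse(names, root);
  context.children = [];
  for (const child of root.children) {
    context.append(child);
  }
}

// Strategy B: parse directly into the context node, which stays in its
// owner document the whole time.
function setDirectly(context, names) {
  context.children = [];
  toyParse(names, context);
}

// Observable result, as a script inspecting the tree afterwards would see it.
function serialize(node) {
  return node.name + "(" + node.children.map(serialize).join(",") + ")";
}
```

The toy leaves out everything that makes the real question hard (<base>, 
the form pointer, script execution, events), which is where any difference 
between the two strategies would show up.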


On Wed, 26 Nov 2008, Jonas Sicking wrote:
> 
> Why couldn't the spec instead say to use the ownerDocument of the 
> context node (like Henri is suggesting) and parse into a 
> documentFragment node? I.e. why do we need the new Document node and the 
> new <html> node?

I guess we could do that, but what would it gain us? Implementations are 
free to optimise this anyway.


On Wed, 26 Nov 2008, Henri Sivonen wrote:
> 
> Why is there even a need for parsing into a document fragment? Would 
> mutation events or something of that nature go wrong if parsing directly 
> into the context node?

We'd quench those anyway.


> I did notice Boris' points about <base> and the form pointer in 
> mozilla.dev.platform. However, wouldn't it be feasible to set the form 
> pointer to the nearest form parent of the context node and not process 
> <base> in the fragment mode? Presumably, XLink autoloads and the load 
> event for SVG fragments would have to be suppressed, but that's not 
> worse than having to mark scripts as already executed.

I think it would be possible; the question is whether the benefit would 
outweigh the cost. I'm not sure it would, from the spec's point of view. 
It's a lot easier to reason about what the spec means if it is clearly a 
separate document -- none of your questions come up, for example.


On Wed, 26 Nov 2008, Boris Zbarsky wrote:
> 
> From a spec point of view the only obvious issue I see here is that the 
> mutation event behavior means the parser needs to take pains to produce 
> the same results as would be produced by the currently-specified 
> algorithm even in cases when mutation events rearrange the DOM.

Surely we don't want any mutation events firing during innerHTML.

The spec currently says:

# DOM mutation events must not fire for changes caused by the UA parsing 
# the document. (Conceptually, the parser is not mutating the DOM, it is 
# constructing it.) This includes the parsing of any content inserted 
# using document.write() and document.writeln() calls.
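
As an illustration only, the rule quoted above can be modelled with a flag 
that is set while the parser is building the tree. ToyTree and its 
parserActive flag are hypothetical, not an API of any browser; the sketch 
assumes a single listener list and synchronous dispatch.

```javascript
class ToyTree {
  constructor() {
    this.nodes = [];
    this.listeners = [];
    this.parserActive = false; // hypothetical "parser is constructing" flag
  }
  onMutation(listener) {
    this.listeners.push(listener);
  }
  insert(name) {
    this.nodes.push(name);
    // Insertions made by the parser count as construction, not mutation,
    // so no listener is notified.
    if (this.parserActive) return;
    for (const listener of this.listeners) {
      listener(name);
    }
  }
  parse(names) {
    this.parserActive = true;
    try {
      for (const name of names) {
        this.insert(name);
      }
    } finally {
      this.parserActive = false; // restore even if insertion throws
    }
  }
}
```

In this model a scripted insert() notifies listeners, while the same 
insertion performed inside parse() does not.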


On Wed, 26 Nov 2008, Jonas Sicking wrote:
> 
> I'm not sure if there are other things that can cause events to fire. 
> For example, do 'change' events fire when parsing a <select> with 
> multiple options selected? If so we'd need to define that such events 
> don't fire until after the innerHTML setting is fully done.

The spec doesn't currently fire 'change' when scripted changes to the 
<select>'s contents occur.


> <form id=outer>
>   <div id=target></div>
> </form>
> 
> and someone setting
> target.innerHTML="<table><tr><td><form id='inner'><input id='c1'>" +
>                  "</table><input id='c2'>"
> 
> Which form should the two <input>s belong to?

The inner one, per spec, I believe.


On Thu, 27 Nov 2008, Henri Sivonen wrote:
> 
> Should document.write() tokenize synchronously when called from an event 
> handler?

Per spec, yes.


> I tested document.write() from a mutation event handler by mutating the 
> tree from script during parse. It looks like Gecko might in some cases 
> tokenize the argument of document.write() from a mutation event handler 
> before the mutation that fired the event is complete. I couldn't figure 
> out when exactly that happens. However, when it did, WebKit and Opera 
> behaved differently, so there's no interop.
> 
> As a preliminary opinion, I'd like to suggest that if at all feasible 
> considering legacy, document.write() from an event handler should write 
> to the stream but not tokenize before the method returns. Otherwise, the 
> ways in which the parser needs to be re-entrant become complicated for 
> very little practical gain. With simple testing, this *seems* to be what 
> WebKit and Opera do.
> 
> More generally, if document.write() occurs for any reason other than the 
> parser kicking off the evaluation of a script element, it would be 
> simpler if document.write() returned without tokenizing.

On Thu, 27 Nov 2008, Henri Sivonen wrote:
> 
> Hmm. I didn't mean to take a stance on timeouts here. Only on cases 
> where the control is in the parser and document.write would be a 
> re-entrant call to the parser.

I don't understand why event processing would make this any more complex 
than it already is.
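
For reference, Henri's suggestion (a re-entrant document.write() appends to 
the stream but does not tokenize before it returns) can be sketched in a toy 
model. ToyParser and its onToken hook are hypothetical, each written chunk 
is treated as one token, and text is appended at the end of the stream 
rather than at an insertion point, so this shows only the re-entrancy guard, 
not any browser's behaviour.

```javascript
class ToyParser {
  constructor() {
    this.pending = [];       // chunks written but not yet tokenized
    this.tokens = [];        // chunks already tokenized, in order
    this.tokenizing = false; // true while control is inside the parser
    this.onToken = () => {}; // hypothetical hook; stands in for an event handler
  }
  write(text) {
    this.pending.push(text);
    // Re-entrant call: control is already in the parser, so only buffer
    // the text and return without tokenizing.
    if (this.tokenizing) return;
    this.tokenizing = true;
    try {
      while (this.pending.length > 0) {
        const chunk = this.pending.shift();
        this.tokens.push(chunk);
        this.onToken(chunk); // may call write() again
      }
    } finally {
      this.tokenizing = false;
    }
  }
}
```

The outer call drains whatever nested calls buffered, so nothing is lost; 
the nested call simply does not see its text tokenized by the time it 
returns.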


On Thu, 27 Nov 2008, Henri Sivonen wrote:
> 
> For efficient buffering, it's important for the parser to know when it 
> needs to drive buffers into a safe point so that document.write() can 
> insert into the stream. So far, at least my assumption has been that 
> scripts can only execute as a side effect of a parser action when a 
> <script> element (either HTML or SVG) is popped off the stack. Now it 
> has turned out that scripts can execute as a side effect of a parser 
> action also when an <svg> element is popped off the stack.

Ew, that seems highly problematic. Where is this defined?


> I think the document.write()-safe points need to be enumerated. In the 
> other cases (which hopefully form an empty set), document.write() should 
> be a no-op. That is, I think the spec should either specifically make 
> the load event for <svg> a safe point for document.write() or it should 
> make document.write() a no-op if executed at that point. The fewer these 
> document.write()-safe points are, the better.

I don't understand what you mean by "safe point". If you call 
document.write() from <svg>, then you'll blow away the document, since the 
insertion point won't have been defined.
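
A minimal sketch of that behaviour, in a toy model rather than a real 
document: ToyDocument is hypothetical, its content string stands in for the 
parser's input stream, and insertionPoint is an index that is defined only 
while the parser is running a script.

```javascript
class ToyDocument {
  constructor(content) {
    this.content = content;
    this.insertionPoint = undefined; // undefined unless a parser-run script is executing
  }
  open() {
    // Opening the document discards whatever was there.
    this.content = "";
    this.insertionPoint = 0;
  }
  write(text) {
    // No insertion point: implicit open(), which blows away the document.
    if (this.insertionPoint === undefined) {
      this.open();
    }
    const at = this.insertionPoint;
    this.content = this.content.slice(0, at) + text + this.content.slice(at);
    this.insertionPoint = at + text.length;
  }
}
```

With an insertion point defined, write() splices the text into the stream; 
without one, the previous content is gone and only the written text remains.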


> document.write() from a mutation event handler is not completely 
> interoperable. This may not even be a document.write issue but something 
> more general related to mutation events.

Mutation events are a mess; the web apps group is apparently redesigning 
them to work async.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Tuesday, 2 December 2008 04:46:03 UTC