Re: Black-box equivalence of parsing fragments directly into context node from Jonas Sicking on 2008-12-04 (public-html@w3.org from December 2008)

From: Jonas Sicking <jonas@sicking.cc>
Date: Wed, 03 Dec 2008 17:11:52 -0800
To: Ian Hickson <ian@hixie.ch>
CC: Henri Sivonen <hsivonen@iki.fi>, Boris Zbarsky <bzbarsky@MIT.EDU>, HTML WG <public-html@w3.org>
Message-ID: <49372E58.6010308@sicking.cc>
Ian Hickson wrote:
>> Is what I described above not black-box equivalent to the steps that the 
>> spec prescribes?
> 
> I believe it is, though I wouldn't guarantee it.
> 
> 
> On Wed, 26 Nov 2008, Jonas Sicking wrote:
>> Why couldn't the spec instead say to use the ownerDocument of the 
>> context node (like Henri is suggesting) and parse into a 
>> documentFragment node? I.e. why do we need the new Document node and the 
>> new <html> node?
> 
> I guess we could do that, but what would it gain us? Implementations are 
> free to optimise this anyway.

See your answer to the previous question :)

I.e. while it is possible to come up with something that is performant, 
ensuring that it is guaranteed to be exactly a black-box equivalent to 
the spec is hard.

It is also quite possible that there are unintended edge cases that would 
need either "unnecessary" extra code, or cost unintended perf hits, just 
to ensure that the result is a black-box equivalent of the spec algorithm.

> On Wed, 26 Nov 2008, Henri Sivonen wrote:
>> Why is there even a need for parsing into a document fragment? Would 
>> mutation events or something of that nature go wrong if parsing directly 
>> into the context node?
> 
> We'd quench those anyway.
> 
>> I did notice Boris' points about <base> and the form pointer in 
>> mozilla.dev.platform. However, wouldn't it be feasible to set the form 
>> pointer to the nearest form parent of the context node and not process 
>> <base> in the fragment mode? Presumably, XLink autoloads and the load 
>> event for SVG fragments would have to be suppressed, but that's not 
>> worse than having to mark scripts as already executed.
> 
> I think it would be possible; the question is would the benefit outweigh 
> the cost. I'm not sure it would, from the spec's point of view. It's a lot 
> easier to reason about what the spec means if it is clearly a separate 
> document -- none of your questions come up, for example.

How so? XLink autoloads could be interpreted as replacing the separate 
doc and then that new doc is what is inserted. And onload events need to 
be defined if/when they parse anyway.

For example, if they are defined to be firing while the new DOM is in a 
separate doc, then we would in fact be forced to parse into a separate 
doc since that is the DOM that such event handlers would see. I.e. if I 
have something like

foo.innerHTML = "<svg onload='alert(document.getElementsByTagName(\"*\"))'/>"



> On Wed, 26 Nov 2008, Boris Zbarsky wrote:
>> From a spec point of view the only obvious issue I see here is that the 
>> mutation event behavior means the parser needs to take pains to produce 
>> the same results as would be produced by the currently-specified 
>> algorithm even in cases when mutation events rearrange the DOM.
> 
> Surely we don't want any mutation events firing during innerHTML.

As others have pointed out, there are currently pages that depend on 
mutation events firing when the new fragment is inserted.

Note that when I said that I'm pondering removing support for mutation 
events entirely, that is going to be a page-breaking change. It's 
something that would have to be rolled out over time, and not until we 
have a good replacement for them.

>> <form id=outer>
>>   <div id=target></div>
>> </form>
>>
>> and someone setting
>> target.innerHTML="<table><tr><td><form id='inner'><input id='c1'>" +
>>                  "</table><input id='c2'>"
>>
>> Which form should the two <input>s belong to?
> 
> The inner one, per spec, I believe.

That is not what the current spec produces though. When the innerHTML is 
first parsed, c2 is associated with the inner form. However, when the 
nodes are then moved out of the temporary document, the form owner on 
c2 is reset to null. When the element is then inserted into the new 
document, the form owner is reset again, this time to the outer form.

This would not be the case if the innerHTML markup is parsed directly 
into the context node.
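
The three steps above can be traced with a toy model. This is only a
sketch: it uses plain objects instead of a real DOM, the helper names
(makeNode, append, resetFormOwner, resetAllControls, snapshot) are made
up for illustration, and the "reset" rule is simplified to "nearest
<form> ancestor, else null". It is not the spec algorithm itself.

```javascript
// Toy model, NOT a real DOM: nodes are plain objects with a parent pointer.
function append(parent, child) {
  child.parent = parent;
  parent.children.push(child);
}

function makeNode(tag, id, parent) {
  const node = { tag, id, parent: null, children: [], formOwner: null };
  if (parent) append(parent, node);
  return node;
}

// Simplified reset (assumption): owner = nearest <form> ancestor, or null.
function resetFormOwner(control) {
  let n = control.parent;
  while (n && n.tag !== 'form') n = n.parent;
  control.formOwner = n || null;
}

function resetAllControls(root) {
  if (root.tag === 'input') resetFormOwner(root);
  root.children.forEach(resetAllControls);
}

// Main document: <form id=outer><div id=target></div></form>
const outer = makeNode('form', 'outer', null);
const target = makeNode('div', 'target', outer);

// Step 1: parse the markup into a temporary root. While parsing, both
// inputs get the parser's form element pointer (the inner form), because
// no </form> end tag has been seen.
const tempRoot = makeNode('html', 'temp', null);
const table = makeNode('table', null, tempRoot);
const tbody = makeNode('tbody', null, table);
const tr = makeNode('tr', null, tbody);
const td = makeNode('td', null, tr);
const inner = makeNode('form', 'inner', td);
const c1 = makeNode('input', 'c1', inner);
const c2 = makeNode('input', 'c2', tempRoot); // sibling of <table>
c1.formOwner = inner;
c2.formOwner = inner;

const snapshot = () => ({
  c1: c1.formOwner ? c1.formOwner.id : null,
  c2: c2.formOwner ? c2.formOwner.id : null,
});
const afterParse = snapshot();

// Step 2: move the nodes out of the temporary document; owners are reset.
const moved = tempRoot.children.splice(0);
moved.forEach((n) => { n.parent = null; resetAllControls(n); });
const afterRemoval = snapshot();

// Step 3: insert under the context node; owners are reset again.
moved.forEach((n) => { append(target, n); resetAllControls(n); });
const afterInsert = snapshot();

console.log(afterParse, afterRemoval, afterInsert);
```

In this model c1 keeps the inner form throughout, because the inner
<form> stays its ancestor, while c2 goes from the inner form to null and
then to the outer form.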


For what it's worth, I tried the above example in a few browsers:

Firefox doesn't create the inner <form> at all. The Firefox parser 
always ignores a <form> tag inside another form, and since we build the 
whole ancestor stack when setting up the context to parse innerHTML, this 
applies here too. So both <input>s are associated with the outer form.

IE throws an exception when trying to set innerHTML. It seems to do so 
any time you set innerHTML on an element that is inside a <form>, and 
the innerHTML string contains a <form>.

Opera and Safari both associate c1 with the inner form and c2 with the 
outer. Possibly due to parsing into a separate document or fragment and 
then re-associating c2 when moving it from the document/fragment to the 
main DOM.

>> I think the document.write()-safe points need to be enumerated. In the 
>> other cases (which hopefully form an empty set), document.write() should 
>> be a no-op. That is, I think the spec should either specifically make 
>> the load event for <svg> a safe point for document.write() or it should 
>> make document.write() a no-op if executed at that point. The fewer these 
>> document.write()-safe points are, the better.
> 
> I don't understand what you mean by "safe point". If you call 
> document.write() from <svg>, then you'll blow away the document, since the 
> insertion point won't have been defined.

Note that this is not how things work in current browsers. Calling 
document.write() from event handlers etc. will append to the current 
document as long as we're not past the point of having parsed the whole 
network stream.

If we make any and all document.writes that happen outside of a <script> 
replace the existing document then I would expect pages to break in a 
very severe way (i.e. the whole page disappears).

A much safer strategy would be to make document.writes that happen 
before we've reached the end of the network stream, but without there 
being an explicit insertion point, be a no-op.
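
The difference between the two strategies can be written down as a
small decision function. This is a sketch only: the state flags
(insertionPointDefined, networkStreamEnded) and the returned action
names are hypothetical labels chosen for illustration, not anything the
spec or a browser exposes.

```javascript
// Behaviour as Ian describes it: without an insertion point,
// document.write() blows away the document.
function writeActionAsDescribed(state) {
  return state.insertionPointDefined ? 'insert' : 'replace-document';
}

// Proposed safer strategy: before the end of the network stream, a write
// without an explicit insertion point does nothing.
function writeActionProposed(state) {
  if (state.insertionPointDefined) return 'insert';
  if (!state.networkStreamEnded) return 'no-op';
  return 'replace-document';
}

// A write from an event handler while the page is still loading.
const midLoadEvent = { insertionPointDefined: false, networkStreamEnded: false };
console.log(writeActionAsDescribed(midLoadEvent)); // replace-document
console.log(writeActionProposed(midLoadEvent));    // no-op
```

The two functions differ only in the mid-load case, which is the case
where a page would otherwise disappear.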

/ Jonas
Received on Thursday, 4 December 2008 01:12:34 UTC