Re: Black-box equivalence of parsing fragments directly into context node from Ian Hickson on 2009-03-30 (public-html@w3.org from March 2009)

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 30 Mar 2009 23:36:21 +0000 (UTC)
To: Jonas Sicking <jonas@sicking.cc>
Cc: Henri Sivonen <hsivonen@iki.fi>, Boris Zbarsky <bzbarsky@MIT.EDU>, HTML WG <public-html@w3.org>
Message-ID: <Pine.LNX.4.62.0903300236220.25082@hixie.dreamhostps.com>
On Wed, 3 Dec 2008, Jonas Sicking wrote:
> Ian Hickson wrote:
> > > Is what I described above not black-box equivalent to the steps that 
> > > the spec prescribes?
> > 
> > I believe it is, though I wouldn't guarantee it.
> > 
> > On Wed, 26 Nov 2008, Jonas Sicking wrote:
> > > Why couldn't the spec instead say to use the ownerDocument of the 
> > > context node (like Henri is suggesting) and parse into a 
> > > documentFragment node? I.e. why do we need the new Document node and 
> > > the new <html> node?
> > 
> > I guess we could do that, but what would it gain us? Implementations 
> > are free to optimise this anyway.
> 
> See your answer to the previous question :)

I don't understand. Why would changing the spec from one possible 
algorithm to another possible algorithm help with people trying to 
implement other possible black-box equivalent variants?


> I.e. while it is possible to come up with something that is performant, 
> ensuring that it is guaranteed to be exactly a black-box equivalent to 
> the spec is hard.

Sure. That's your job. :-)

What is performant for one implementation may not be performant for 
another. It doesn't make sense for the spec to be defined in terms of an 
algorithm that is performant in one architecture, unless that is likely an 
optimum solution, because otherwise implementors are more likely to 
consider the risk of not quite matching the spec as outweighing the 
benefit of trying a different strategy to get more performance.


> It is also quite possible that there are unintended edgecases that would 
> need either "unnecessary" extra code, or cost unintended perf hits just
> to ensure that it is a black-box equivalent of the spec algorithm.

Feedback from implementors about exactly this kind of thing is why we have 
a feedback process. Tell me if you find there are such edge cases.


> > On Wed, 26 Nov 2008, Henri Sivonen wrote:
> > > I did notice Boris' points about <base> and the form pointer in 
> > > mozilla.dev.platform. However, wouldn't it be feasible to set the 
> > > form pointer to the nearest form parent of the context node and not 
> > > process <base> in the fragment mode? Presumably, XLink autoloads and 
> > > the load event for SVG fragments would have to be suppressed, but 
> > > that's not worse than having to mark scripts as already executed.
> > 
> > I think it would be possible; the question is would the benefit 
> > outweigh the cost. I'm not sure it would, from the spec's point of 
> > view. It's a lot easier to reason about what the spec means if it is 
> > clearly a separate document -- none of your questions come up, for 
> > example.
> 
> How so? XLink autoloads could be interpreted as replacing the separate 
> doc and then that new doc is what is inserted.

Only a very creative reading of the specs would lead to this conclusion, I 
think. The HTML5 spec is pretty precise about which Document object you 
use for the inserting, and I don't think anything in the XLink spec would 
justify reusing the same Document for a new resource.


> And onload events need to be defined if/when they parse anyway.
> For example, if they are defined to be firing while the new DOM is in a 
> separate doc, then we would in fact be forced to parse into a separate 
> doc since that is the DOM that such event handlers would see. I.e. if I 
> have something like
> 
> foo.innerHTML = "<svg onload='alert(document.getElementsByTagName(\"*\"))'/>"

The SVG spec is very vague about when these 'load' events are fired, and 
it isn't clear to me that it considers dynamic creation of this kind to be 
"loading" an element, so I think it's fine to be consistent with HTML here 
and not fire any events or run any script during innerHTML.


> > On Wed, 26 Nov 2008, Boris Zbarsky wrote:
> > > From a spec point of view the only obvious issue I see here is that 
> > > the mutation event behavior means the parser needs to take pains to 
> > > produce the same results as would be produced by the 
> > > currently-specified algorithm even in cases when mutation events 
> > > rearrange the DOM.
> > 
> > Surely we don't want any mutation events firing during innerHTML.
> 
> As others have pointed out, there are currently pages that depend on 
> mutation events firing when the new fragment is inserted.

I've updated the spec to defer to the mutation events spec (whatever it 
ends up saying) for all the places that should fire mutation events that 
I'm aware of.


> > > <form id=outer>
> > >   <div id=target></div>
> > > </form>
> > > 
> > > and someone setting
> > > target.innerHTML="<table><tr><td><form id='inner'><input id='c1'>" +
> > >                  "</table><input id='c2'>"
> > > 
> > > Which form should the two <input>s belong to?
> > 
> > The inner one, per spec, I believe.
> 
> That is not what the current spec produces though. When the innerHTML is 
> first parsed, c2 is associated with the inner form. However when
> the nodes are then moved out of the temporary document the form owner on 
> c2 is reset to null. When the element is then inserted into the new 
> document the form owner is again reset, this time to the outer form.
>
> This would not be the case if the innerHTML markup is parsed directly 
> into the context node.

This is indeed something I didn't think about when writing the spec. 
However, if innerHTML markup was parsed directly into the context node, 
there would be other problems, e.g. it would cause different mutation 
events to fire than actually do fire.


> For what it's worth, I tried the above example in a few browsers:
> 
> Firefox doesn't create the inner <form> at all. The Firefox parser
> always ignores a <form> tag inside another form, and since we build the 
> whole ancestor stack when setting up the context to parse innerHTML this 
> applies here too. So both <input>s are associated with the outer form.
> 
> IE throws an exception when trying to set innerHTML. It seems to do so 
> any time you set innerHTML on an element that is inside a <form>, and 
> the innerHTML string contains a <form>.
> 
> Opera and Safari both associate c1 with the inner form and c2 with the 
> outer. Possibly due to parsing into a separate document or fragment and 
> then re-associating c2 when moving it from the document/fragment to the 
> main DOM.

If we assume that we don't want the Firefox or IE behaviours, then it 
turns out the spec is already correct. Yay!
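To make the form-owner difference concrete, here is a toy model comparing the two strategies discussed above: parsing via a temporary document and then moving the nodes, versus parsing directly into the context node. It is a hypothetical sketch, not DOM or spec code. `TreeNode`, `resetSubtree`, `parseInto` and the other helpers are invented for illustration; the tree for the markup is built by hand following Jonas's description (c2 starts out associated with the inner form even though it is not a descendant of it); and "reset the form owner" is simplified to "keep the owner if it is still an ancestor, otherwise use the nearest ancestor <form>, or null".

```javascript
// Hypothetical toy tree (not the DOM): just enough to trace form owners.
class TreeNode {
  constructor(tag, id = null) {
    this.tag = tag;
    this.id = id;
    this.parent = null;
    this.children = [];
    this.formOwner = null; // only used for <input>
  }
  append(child) {
    // Moving a node detaches it from its old parent first.
    if (child.parent) {
      const old = child.parent.children;
      old.splice(old.indexOf(child), 1);
    }
    child.parent = this;
    this.children.push(child);
  }
}

function isAncestor(a, node) {
  for (let p = node.parent; p; p = p.parent) if (p === a) return true;
  return false;
}

function nearestAncestorForm(node) {
  for (let p = node.parent; p; p = p.parent) if (p.tag === "form") return p;
  return null;
}

// Simplified "reset the form owner", applied to every <input> in a moved
// subtree: keep an owner that is still an ancestor, else re-derive it.
function resetSubtree(root) {
  const stack = [root];
  while (stack.length) {
    const n = stack.pop();
    if (n.tag === "input" && !(n.formOwner && isAncestor(n.formOwner, n))) {
      n.formOwner = nearestAncestorForm(n);
    }
    stack.push(...n.children);
  }
}

// Hand-built result of parsing
//   <table><tr><td><form id='inner'><input id='c1'></table><input id='c2'>
// under `parent`, as described in the thread: the parser's form pointer
// still refers to the inner form when c2 is created.
function parseInto(parent) {
  const [table, tbody, tr, td] = ["table", "tbody", "tr", "td"].map((t) => new TreeNode(t));
  const inner = new TreeNode("form", "inner");
  const c1 = new TreeNode("input", "c1");
  const c2 = new TreeNode("input", "c2");
  parent.append(table); table.append(tbody); tbody.append(tr); tr.append(td);
  td.append(inner); inner.append(c1); parent.append(c2);
  c1.formOwner = inner;
  c2.formOwner = inner;
  return { c1, c2 };
}

// <form id=outer><div id=target></div></form>
function makeTarget() {
  const outer = new TreeNode("form", "outer");
  const target = new TreeNode("div", "target");
  outer.append(target);
  return target;
}

// Parse into a throwaway document, move the children into a fragment,
// then insert them under the target, resetting form owners on each move.
function viaTemporaryDocument() {
  const target = makeTarget();
  const root = new TreeNode("html");
  const inputs = parseInto(root);
  const fragment = new TreeNode("#document-fragment");
  for (const child of [...root.children]) { fragment.append(child); resetSubtree(child); }
  for (const child of [...fragment.children]) { target.append(child); resetSubtree(child); }
  return inputs;
}

// Parse straight into the context node; nothing moves, nothing is reset.
function directIntoContext() {
  return parseInto(makeTarget());
}

const ownerId = (el) => (el.formOwner ? el.formOwner.id : null);
const viaDoc = viaTemporaryDocument();
const direct = directIntoContext();
console.log(ownerId(viaDoc.c1), ownerId(viaDoc.c2)); // inner outer
console.log(ownerId(direct.c1), ownerId(direct.c2)); // inner inner
```

In this model the temporary-document route gives the Opera/Safari result (c1 with the inner form, c2 with the outer one), which is the outcome described above as what the spec already produces; parsing directly into the context node leaves c2 with the inner form, as Jonas notes.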


> > > I think the document.write()-safe points need to be enumerated. In 
> > > the other cases (which hopefully form an empty set), 
> > > document.write() should be a no-op. That is, I think the spec should 
> > > either specifically make the load event for <svg> a safe point for 
> > > document.write() or it should make document.write() a no-op if 
> > > executed at that point. The fewer these document.write()-safe points 
> > > are, the better.
> > 
> > I don't understand what you mean by "safe point". If you call 
> > document.write() from <svg>, then you'll blow away the document, since 
> > the insertion point won't have been defined.
> 
> Note that this is not how things work in current browsers. Calling 
> document.write from events etc will append to the current document as 
> long as we're not past the point of having parsed the whole network 
> stream.

I've changed this now, as part of the integration of SVG with text/html.


> If we make any and all document.writes that happen outside of a <script> 
> replace the existing document then I would expect pages to break in a 
> very severe way (i.e. the whole page disappears).

This is more or less what happens, yes.


> A much safer strategy would be to make document.writes that happen 
> before we've reached the end of the network stream, but without there 
> being an explicit insertion point, be a no-op.

That's not compatible with legacy UAs, insofar as I can tell.
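The three document.write() behaviours argued over in this last exchange can be compared side by side with a toy function. This is a hypothetical model, not spec or browser code: `writeWithoutInsertionPoint`, the `doc` shape and the strategy names are invented for illustration. It only covers a write made with no insertion point defined (for example from an event handler rather than an inline <script>), and it assumes that once the network stream has been fully parsed every strategy replaces the document, as an implicit document.open() would.

```javascript
// Hypothetical model of a document.write() call with no insertion point.
//   "legacy":  Jonas's description of current browsers: append while the
//              network stream has not been fully parsed.
//   "replace": every such write blows away the document.
//   "noop":    Jonas's proposal: ignore the write until the end of the
//              network stream has been reached.
// Assumption: after the stream is finished, all three replace the document.
function writeWithoutInsertionPoint(doc, markup, strategy) {
  const replaced = { content: markup, streamFinished: false };
  switch (strategy) {
    case "legacy":
      return doc.streamFinished ? replaced : { ...doc, content: doc.content + markup };
    case "replace":
      return replaced;
    case "noop":
      return doc.streamFinished ? replaced : doc;
    default:
      throw new Error(`unknown strategy: ${strategy}`);
  }
}

// A document whose network stream is still being parsed.
const loading = { content: "<p>Hello", streamFinished: false };
for (const s of ["legacy", "replace", "noop"]) {
  console.log(s, JSON.stringify(writeWithoutInsertionPoint(loading, "<b>x</b>", s).content));
}
// legacy "<p>Hello<b>x</b>"
// replace "<b>x</b>"
// noop "<p>Hello"
```

The model makes the compatibility question visible: only "legacy" keeps existing content and the written markup, which is why both the "replace" and "noop" options change what pages relying on today's behaviour would see.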

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Monday, 30 March 2009 23:37:05 UTC