Re: Black-box equivalence of parsing fragments directly into context node from Ian Hickson on 2009-06-03 (public-html@w3.org from June 2009)

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 3 Jun 2009 09:53:52 +0000 (UTC)
To: Jonas Sicking <jonas@sicking.cc>
Cc: Henri Sivonen <hsivonen@iki.fi>, Boris Zbarsky <bzbarsky@mit.edu>, HTML WG <public-html@w3.org>
Message-ID: <Pine.LNX.4.62.0906030941580.1648@hixie.dreamhostps.com>

On Mon, 6 Apr 2009, Jonas Sicking wrote:
> >
> > What is performant for one implementation may not be performant for 
> > another. It doesn't make sense for the spec to be defined in terms of 
> > an algorithm that is performant in one architecture, unless that is 
> > likely an optimum solution, because otherwise implementors are more 
> > likely to consider the risk of not quite matching the spec as 
> > outweighing the benefit of trying a different strategy to get more 
> > performance.
> 
> So you are writing an intentionally slow algorithm in the spec in order 
> to signal to implementers "you really should optimize this"?

No, but I don't consider performance critical to the "implementation" 
described in the spec, and I prioritise clarity above performance.


> >> And onload events need to be defined if/when they parse anyway.
> >> For example, if they are defined to be firing while the new DOM is in a
> >> separate doc, then we would in fact be forced to parse into a separate
> >> doc since that is the DOM that such event handlers would see. I.e. if I
> >> have something like
> >>
> >> foo.innerHTML = "<svg onload='alert(document.getElementsByTagName(\"*\"))'/>";
> >
> > The SVG spec is very vague about when these 'load' events are fired, and
> > it isn't clear to me that it considers dynamic creation of this kind to be
> > "loading" an element, so I think it's fine to be consistent with HTML here
> > and not fire any events or run any script during innerHTML.
> 
> This needs to be clear in the spec if it's not already.

I'm not exactly sure what isn't clear here.


> >> > > <form id=outer>
> >> > >   <div id=target></div>
> >> > > </form>
> >> > >
> >> > > and someone setting
> >> > > target.innerHTML="<table><tr><td><form id='inner'><input id='c1'>" +
> >> > >                  "</table><input id='c2'>"
> >> > >
> >> > > Which form should the two <input>s belong to?
> >> >
> >> > The inner one, per spec, I believe.
> >>
> >> That is not what the current spec produces though. When the innerHTML is
> >> first parsed, c2 is associated with the inner form. However, when
> >> the nodes are then moved out of the temporary document the form owner on
> >> c2 is reset to null. When the element is then inserted into the new
> >> document the form owner is again reset, this time to the outer form.
> >>
> >> This would not be the case if the innerHTML markup is parsed directly
> >> into the context node.
> >
> > This is indeed something I didn't think about when writing the spec. 
> > However, if innerHTML markup was parsed directly into the context 
> > node, there would be other problems, e.g. it would cause different 
> > mutation events to fire than actually do fire.
> 
> I'll gladly change when and which mutation events Firefox dispatches 
> during setting of .innerHTML. So if that's the reason why the spec 
> doesn't parse directly into the context node I think we can change that.

I was just describing the danger of changing the spec from one model to 
another for what appear to be editorial reasons.

The spec parses into another document because originally I didn't do 
fragment parsing, and when I added it, it was easier to just reuse the 
document parsing algorithm wholesale than patch it to do fragments.
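To make the form-owner example concrete, here is a toy model (not real DOM code) of the two strategies being compared: parsing into a separate root and then moving the nodes under the context node, versus parsing directly into the context node. It assumes a simplified "reset the form owner" rule (keep the owner if it is still an ancestor, otherwise use the nearest <form> ancestor) and that the parser initially associates c2 with the inner form, as Jonas reports. ToyNode, parseMarkupInto and the other names are hypothetical.

```javascript
// Toy model of form-owner association, for illustration only; not DOM code.
// ToyNode, parseMarkupInto, moveChildren, etc. are hypothetical names.
class ToyNode {
  constructor(tag, id = null) {
    this.tag = tag;
    this.id = id;
    this.parent = null;
    this.children = [];
    this.formOwner = null; // only used for "input" nodes
  }
  append(child) {
    child.parent = this;
    this.children.push(child);
    return child;
  }
}

function* inclusiveDescendants(node) {
  yield node;
  for (const c of node.children) yield* inclusiveDescendants(c);
}

// Simplified "reset the form owner": keep the owner if it is still an
// ancestor, otherwise take the nearest <form> ancestor (or null).
function resetFormOwner(input) {
  const ancestors = [];
  for (let p = input.parent; p; p = p.parent) ancestors.push(p);
  if (input.formOwner && ancestors.includes(input.formOwner)) return;
  input.formOwner = ancestors.find((a) => a.tag === "form") ?? null;
}

function resetInputsIn(node) {
  for (const n of inclusiveDescendants(node)) {
    if (n.tag === "input") resetFormOwner(n);
  }
}

// Move every child of `from` under `to`, resetting form owners on removal
// and again on insertion.
function moveChildren(from, to) {
  while (from.children.length > 0) {
    const child = from.children.shift();
    child.parent = null;
    resetInputsIn(child); // removed from the temporary tree
    to.append(child);
    resetInputsIn(child); // inserted under the context node
  }
}

// The tree built for the example markup (implied <tbody> omitted). Both inputs
// start out owned by #inner: c1 as a descendant, c2 presumably because the
// parser's form element pointer still points at the unclosed inner form.
function parseMarkupInto(parent) {
  const td = parent.append(new ToyNode("table")).append(new ToyNode("tr")).append(new ToyNode("td"));
  const inner = td.append(new ToyNode("form", "inner"));
  const c1 = inner.append(new ToyNode("input", "c1"));
  const c2 = parent.append(new ToyNode("input", "c2"));
  c1.formOwner = inner;
  c2.formOwner = inner;
  return { c1, c2 };
}

// <form id=outer><div id=target></div></form>; returns the target div.
function setUpDocument() {
  const outer = new ToyNode("form", "outer");
  return outer.append(new ToyNode("div", "target"));
}

const owners = ({ c1, c2 }) => ({ c1: c1.formOwner?.id ?? null, c2: c2.formOwner?.id ?? null });

// Parse into a separate root, then move the nodes under the target.
function separateDocumentModel() {
  const target = setUpDocument();
  const tempRoot = new ToyNode("html");
  const inputs = parseMarkupInto(tempRoot);
  moveChildren(tempRoot, target);
  return owners(inputs);
}

// Parse straight into the context node; nothing is moved, so nothing resets.
function directIntoContextModel() {
  return owners(parseMarkupInto(setUpDocument()));
}

console.log(separateDocumentModel());  // { c1: 'inner', c2: 'outer' }
console.log(directIntoContextModel()); // { c1: 'inner', c2: 'inner' }
```

Under these assumptions the separate-root model reproduces the c1-inner/c2-outer result reported for Opera and Safari, while parsing directly into the context node leaves both inputs with the inner form.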


> Given that not even you realized which form the two inputs in the above 
> example would be bound to, and given that you are one of the main 
> experts on the HTML5 spec, I think we can fairly safely say that the 
> current algorithm for innerHTML yields some surprising results. And 
> surprising results are something we IMHO should avoid.

I don't disagree in general, but I'm not sure that changing the parsing 
model here would really reduce the surprises, so much as change them.


> >> For what it's worth, I tried the above example in a few browsers:
> >>
> >> Firefox doesn't create the inner <form> at all. The Firefox parser 
> >> always ignores a <form> tag inside another form, and since we build 
> >> the whole ancestor stack when setting up the context to parse 
> >> innerHTML this applies here too. So both <input>s are associated with 
> >> the outer form.
> >>
> >> IE throws an exception when trying to set innerHTML. It seems to do 
> >> so any time you set innerHTML on an element that is inside a <form>, 
> >> and the innerHTML string contains a <form>.
> >>
> >> Opera and Safari both associate c1 with the inner form and c2 with 
> >> the outer one, possibly due to parsing into a separate document or 
> >> fragment and then re-associating c2 when moving it from the 
> >> document/fragment to the main DOM.
> >
> > If we assume that we don't want the Firefox or IE behaviours, then it 
> > turns out the spec is already correct. Yay!
> 
> Why do you assume that we don't want Firefox's behavior?

I didn't say we did, just that _if_ we did, the spec matched 
implementations.


> And even if we assume that, why does not wanting Firefox's or IE's 
> behavior imply that we want Opera's and WebKit's?

Well, when the options are (a) something we don't want, (b) 
interoperability between two browsers, and (c) something that nobody 
implements, (b) seems like the better choice.

It may be that we don't want the Safari/Opera behaviour and _do_ want the 
Firefox behaviour or the IE behaviour. It's not clear to me that there's 
any particular reason to prefer one over another here other than Safari 
and Opera doing the same thing.
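For comparison, the Firefox and IE outcomes Jonas reports can be sketched the same way. Both functions are hypothetical approximations of observed behaviour, not browser code: the Firefox one assumes the context node's <form> ancestors seed the parser's current form, so a nested <form> start tag is dropped; the IE one is only the rough rule Jonas inferred (a <form> in the markup plus a <form> ancestor means an exception).

```javascript
// Hypothetical sketches of two behaviours reported in this thread;
// neither function is real browser code.

// Firefox (as described): the context's ancestors seed the parser's current
// form, so a <form> start tag inside it is ignored. Only start tags are
// modelled, which is enough here because the example has no </form>.
function firefoxStyleFormOwners(contextAncestors, startTags) {
  let currentForm = null;
  // Ancestors are listed root first, so the nearest <form> wins.
  for (const a of contextAncestors) if (a.tag === "form") currentForm = a.id;
  const owners = {};
  for (const t of startTags) {
    if (t.tag === "form") {
      if (currentForm === null) currentForm = t.id; // otherwise: nested <form>, dropped
    } else if (t.tag === "input") {
      owners[t.id] = currentForm;
    }
  }
  return owners;
}

// IE (as observed): setting innerHTML inside a <form> throws whenever the
// markup contains a <form> start tag. The regex is a rough stand-in.
function ieStyleCheck(contextAncestors, markup) {
  if (contextAncestors.some((a) => a.tag === "form") && /<form[\s>]/i.test(markup)) {
    throw new Error("innerHTML assignment rejected");
  }
}

const ancestors = [{ tag: "form", id: "outer" }, { tag: "div", id: "target" }]; // root first
const startTags = [
  { tag: "table" }, { tag: "tr" }, { tag: "td" },
  { tag: "form", id: "inner" }, { tag: "input", id: "c1" }, { tag: "input", id: "c2" },
];
console.log(firefoxStyleFormOwners(ancestors, startTags)); // { c1: 'outer', c2: 'outer' }

const markup = "<table><tr><td><form id='inner'><input id='c1'></table><input id='c2'>";
try {
  ieStyleCheck(ancestors, markup);
} catch (e) {
  console.log("IE-style:", e.message); // IE-style: innerHTML assignment rejected
}
```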


> You yourself thought that the current spec would yield a result that is 
> different from all current browsers, a behavior that IMHO would be quite 
> logical.

It not being the same as any browser is a pretty big disadvantage.


> Firefox behavior is also quite logical if you think of setting innerHTML 
> as behaving the same as if the inserted markup had been there when the 
> page was parsed. However I don't really think that that is how most 
> people see innerHTML, so I'm not going to advocate for it. But I also 
> don't think people see it as what the spec currently does.

Indeed; I don't think the HTML parsing algorithm is at all intuitive in 
error cases.


> >> > > I think the document.write()-safe points need to be enumerated. 
> >> > > In the other cases (which hopefully form an empty set), 
> >> > > document.write() should be a no-op. That is, I think the spec 
> >> > > should either specifically make the load event for <svg> a safe 
> >> > > point for document.write() or it should make document.write() a 
> >> > > no-op if executed at that point. The fewer these 
> >> > > document.write()-safe points are, the better.
> >> >
> >> > I don't understand what you mean by "safe point". If you call 
> >> > document.write() from <svg>, then you'll blow away the document, 
> >> > since the insertion point won't have been defined.
> >>
> >> Note that this is not how things work in current browsers. Calling 
> >> document.write from events etc will append to the current document as 
> >> long as we're not past the point of having parsed the whole network 
> >> stream.
> >
> > I've changed this now, as part of the integration of SVG with 
> > text/html.
> 
> Actually, I really liked how the spec did it before. Someone doing 
> document.write from outside a <script> while the page is loading is 
> basically a guaranteed race condition. For example using document.write 
> from an XHR onreadystatechange handler, or a timer, is going to race 
> against the network stream loading the main page.

There are no race conditions with document.write() the way the spec is 
written today.


> >> A much safer strategy would be to make document.writes that happen 
> >> before we've reached the end of the network stream, but without there 
> >> being an explicit insertion point, be a no-op.
> >
> > That's not compatible with legacy UAs, insofar as I can tell.
> 
> I think making document.write outside of <script> while the page is 
> loading be a no-op would be very unlikely to break any pages. As 
> described above, any such writes are virtually guaranteed to be a race 
> condition and would make such content appear in random places in the 
> page.

No, it always causes document.open() to be called and thus blows away the 
page altogether.
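The disagreement about write() calls with no insertion point can be sketched with a toy document. This is a simplified, hypothetical model: makeLoadingDocument, writeAsSpecified and writeAsProposed are illustrative names, the string stands in for the parser's input stream, and everything document.open() does besides discarding the current content is left out.

```javascript
// Hypothetical toy document: `input` stands in for the parser's input stream;
// `insertionPoint` is an index into it, or null when undefined (for example,
// when write() runs from a timer or an XHR handler rather than from a
// parser-executed <script>).
function makeLoadingDocument() {
  return { input: "<p>page so far", insertionPoint: null, reachedEndOfStream: false };
}

// The spec as Ian describes it: with no insertion point, write() first calls
// document.open(), which discards the existing document. (Everything else
// open() does is omitted.)
function writeAsSpecified(doc, text) {
  if (doc.insertionPoint === null) {
    doc.input = "";
    doc.insertionPoint = 0;
  }
  // Insert just before the insertion point, which then sits after the new text.
  doc.input = doc.input.slice(0, doc.insertionPoint) + text + doc.input.slice(doc.insertionPoint);
  doc.insertionPoint += text.length;
}

// Jonas's proposal: while the network stream is still being parsed, a write()
// with no insertion point is a no-op; otherwise behave as above.
function writeAsProposed(doc, text) {
  if (doc.insertionPoint === null && !doc.reachedEndOfStream) return;
  writeAsSpecified(doc, text);
}

const specDoc = makeLoadingDocument();
writeAsSpecified(specDoc, "<p>late");
console.log(specDoc.input); // <p>late

const proposedDoc = makeLoadingDocument();
writeAsProposed(proposedDoc, "<p>late");
console.log(proposedDoc.input); // <p>page so far
```

In the first case the content parsed so far is replaced, which is the behaviour Ian describes; under Jonas's proposal the stray write is dropped and the page keeps loading. A write() with an explicit insertion point behaves the same under both.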


> Thus it seems very unlikely that pages would be doing that and so it 
> seems safe to change.
> 
> As an implementer I would definitely be willing to try to make such a 
> change if it simplifies the implementation, which I think would be the 
> case.

The current spec seems relatively simple to implement, no?

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 3 June 2009 09:54:27 UTC