- From: Jonas Sicking <jonas@sicking.cc>
- Date: Mon, 6 Apr 2009 00:39:21 -0700
- To: Ian Hickson <ian@hixie.ch>
- Cc: Henri Sivonen <hsivonen@iki.fi>, Boris Zbarsky <bzbarsky@mit.edu>, HTML WG <public-html@w3.org>
On Mon, Mar 30, 2009 at 4:36 PM, Ian Hickson <ian@hixie.ch> wrote:
> On Wed, 3 Dec 2008, Jonas Sicking wrote:
>> Ian Hickson wrote:
>> > > Is what I described above not black-box equivalent to the steps that
>> > > the spec prescribes?
>> >
>> > I believe it is, though I wouldn't guarantee it.
>> >
>> > On Wed, 26 Nov 2008, Jonas Sicking wrote:
>> > > Why couldn't the spec instead say to use the ownerDocument of the
>> > > context node (like Henri is suggesting) and parse into a
>> > > documentFragment node? I.e. why do we need the new Document node and
>> > > the new <html> node?
>> >
>> > I guess we could do that, but what would it gain us? Implementations
>> > are free to optimise this anyway.
>>
>> See your answer to the previous question :)
>
> I don't understand. Why would changing the spec from one possible
> algorithm to another possible algorithm help with people trying to
> implement other possible black-box equivalent variants?

It's much harder to implement an algorithm that is vastly different from
the one in the spec than it is to implement one that is only slightly
different.

>> I.e. while it is possible to come up with something that is performant,
>> ensuring that it is guaranteed to be exactly a black-box equivalent to
>> the spec is hard.
>
> Sure. That's your job. :-)
>
> What is performant for one implementation may not be performant for
> another. It doesn't make sense for the spec to be defined in terms of an
> algorithm that is performant in one architecture, unless that is likely an
> optimum solution, because otherwise implementors are more likely to
> consider the risk of not quite matching the spec as outweighing the
> benefit of trying a different strategy to get more performance.

So you are writing an intentionally slow algorithm in the spec in order
to signal to implementers "you really should optimize this"? If having
implementers optimize something is your goal, then I think your approach
is entirely wrong.
First of all, I think implementers are going to be much better than any
spec at determining what is worth optimizing and what is not, for the
simple reason that this changes over time. The operation that is critical
to make fast today might be totally different from the one that is
critical to make fast tomorrow. For example, all the recent work on
JITting JS engines has completely changed, and will continue to change,
the rules for what is important to make fast. (It'll also change the
rules for how we should design APIs, but that's a different subject
altogether.)

If your goal is to have a fast implementation, then I think writing an
algorithm is the wrong approach entirely. It would be more optimizable if
you instead wrote down the constraints the result should have; that is
generally easier to verify a highly optimized design against. The
downside, of course, is that it makes it harder to write the spec in the
detail that we want, so I'm not necessarily advocating this.

Implementers are going to optimize whatever seems important to optimize.
The closer the spec is to the optimal method for a given implementation,
the more it helps them, since there are fewer differences between the
implementation and the spec to verify as equivalent. This I'm very much
writing with my implementer hat on.

>> And onload events need to be defined if/when they parse anyway.
>> For example, if they are defined to be firing while the new DOM is in a
>> separate doc, then we would in fact be forced to parse into a separate
>> doc since that is the DOM that such event handlers would see. I.e.
>> if I have something like
>>
>> foo.innerHTML = "<svg onload=\"alert(document.getElementsByTagName('*'))\"/>"
>
> The SVG spec is very vague about when these 'load' events are fired, and
> it isn't clear to me that it considers dynamic creation of this kind to be
> "loading" an element, so I think it's fine to be consistent with HTML here
> and not fire any events or run any script during innerHTML.

This needs to be clear in the spec if it's not already.

>> > > <form id=outer>
>> > >   <div id=target></div>
>> > > </form>
>> > >
>> > > and someone setting
>> > > target.innerHTML="<table><tr><td><form id='inner'><input id='c1'>" +
>> > >                  "</table><input id='c2'>"
>> > >
>> > > Which form should the two <input>s belong to?
>> >
>> > The inner one, per spec, I believe.
>>
>> That is not what the current spec produces though. When the innerHTML is
>> first parsed, c2 is associated with the inner form. However, when the
>> nodes are then moved out of the temporary document, the form owner on
>> c2 is reset to null. When the element is then inserted into the new
>> document, the form owner is again reset, this time to the outer form.
>>
>> This would not be the case if the innerHTML markup is parsed directly
>> into the context node.
>
> This is indeed something I didn't think about when writing the spec.
> However, if innerHTML markup was parsed directly into the context node,
> there would be other problems, e.g. it would cause different mutation
> events to fire than actually do fire.

I'll gladly change when and which mutation events Firefox dispatches
during setting of .innerHTML. So if that's the reason why the spec
doesn't parse directly into the context node, I think we can change that.

Given that not even you realized which form the two inputs in the above
example would be bound to, and given that you are one of the main experts
on the HTML5 spec, I think we can fairly safely say that the current
algorithm for innerHTML yields some surprising results.
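For what it's worth, the per-spec outcome for the example above can be
sketched as a toy model of the three steps the spec takes (parse into a
temporary document, move the nodes out, insert into the real tree). This
is not spec terminology and uses no real DOM; all names here are
illustrative:

```javascript
// Toy model of the form-owner resets discussed above (no real DOM,
// illustrative names only; a sketch, not the spec's actual algorithm).

// Spec-style reassociation: the form owner becomes the nearest <form> ancestor.
function nearestFormAncestor(ancestors) {
  for (let i = ancestors.length - 1; i >= 0; i--) {
    if (ancestors[i].tag === 'form') return ancestors[i].id;
  }
  return null;
}

// Step 1: the markup is parsed into a temporary document. The parser's
// form element pointer is the inner form at the time both inputs are
// created, so both start out associated with it.
const c1 = { id: 'c1', formOwner: 'inner' };
const c2 = { id: 'c2', formOwner: 'inner' };

// Step 2: the parsed nodes are moved out of the temporary document.
// c1 moves together with its <form id=inner> ancestor and keeps its owner;
// c2 is a sibling of the table, so its form owner is reset to null.
c2.formOwner = null;

// Step 3: the nodes are inserted under <div id=target>, whose nearest
// form ancestor is the outer form, so c2 is reassociated with it.
c2.formOwner = nearestFormAncestor([
  { tag: 'form', id: 'outer' },
  { tag: 'div', id: 'target' },
]);

console.log(c1.formOwner); // "inner"
console.log(c2.formOwner); // "outer"
```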
And surprising results are something we IMHO should avoid.

>> For what it's worth, I tried the above example in a few browsers:
>>
>> Firefox doesn't create the inner <form> at all. The Firefox parser
>> always ignores a <form> tag inside another form, and since we build the
>> whole ancestor stack when setting up the context to parse innerHTML,
>> this applies here too. So both <input>s are associated with the outer
>> form.
>>
>> IE throws an exception when trying to set innerHTML. It seems to do so
>> any time you set innerHTML on an element that is inside a <form> and
>> the innerHTML string contains a <form>.
>>
>> Opera and Safari both associate c1 with the inner form and c2 with the
>> outer, possibly due to parsing into a separate document or fragment and
>> then re-associating c2 when moving it from the document/fragment to the
>> main DOM.
>
> If we assume that we don't want the Firefox or IE behaviours, then it
> turns out the spec is already correct. Yay!

Why do you assume that we don't want Firefox's behavior? And even if we
assume that, why does not wanting Firefox's or IE's behavior mean that we
want Opera's and WebKit's? You yourself thought that the current spec
would yield a result that is different from all current browsers, a
behavior that IMHO would be quite logical.

Firefox's behavior is also quite logical if you think of setting
innerHTML as behaving the same as if the inserted markup had been there
when the page was parsed. However, I don't really think that that is how
most people see innerHTML, so I'm not going to advocate for it. But I
also don't think people see it as what the spec currently does.

>> > > I think the document.write()-safe points need to be enumerated. In
>> > > the other cases (which hopefully form an empty set),
>> > > document.write() should be a no-op.
>> > > That is, I think the spec should
>> > > either specifically make the load event for <svg> a safe point for
>> > > document.write() or it should make document.write() a no-op if
>> > > executed at that point. The fewer these document.write()-safe points
>> > > are, the better.
>> >
>> > I don't understand what you mean by "safe point". If you call
>> > document.write() from <svg>, then you'll blow away the document, since
>> > the insertion point won't have been defined.
>>
>> Note that this is not how things work in current browsers. Calling
>> document.write from events etc. will append to the current document as
>> long as we're not past the point of having parsed the whole network
>> stream.
>
> I've changed this now, as part of the integration of SVG with text/html.

Actually, I really liked how the spec did it before. Someone doing
document.write from outside a <script> while the page is loading is
basically a guaranteed race condition. For example, using document.write
from an XHR onreadystatechange handler, or a timer, is going to race
against the network stream loading the main page.

>> A much safer strategy would be to make document.writes that happen
>> before we've reached the end of the network stream, but without there
>> being an explicit insertion point, be a no-op.
>
> That's not compatible with legacy UAs, insofar as I can tell.

I think making document.write outside of <script> while the page is
loading be a no-op would be very unlikely to break any pages. As
described above, any such writes are virtually guaranteed to be a race
condition and would make such content appear in random places in the
page. Thus it seems very unlikely that pages would be doing that, and so
it seems safe to change. As an implementer I would definitely be willing
to try to make such a change if it simplifies the implementation, which I
think would be the case.

/ Jonas
Received on Monday, 6 April 2009 07:40:13 UTC