Re: Allow to return same NodeList object for queries like getElementsByTagName, getElementsByClassName and getElementsByName

On Sat, 13 Feb 2010, Boris Zbarsky wrote:
> On 2/13/10 6:18 AM, Ian Hickson wrote:
> > I'm concerned about the GC-sensitivity of such behaviour (we might end 
> > up snookering ourselves in a situation where specific GC behaviour 
> > actually matters for compatibility).
> 
> We haven't gotten there yet, in 8 years of two different implementations 
> having this behavior....
> 
> > How about the following compromise: these methods return a new object 
> > each time, except if they are called with the same argument as the 
> > previous invocation of the method? i.e. cache the last returned object 
> > only. Would that be acceptable?
> 
> As far as I can tell, no.  That formulation makes it impossible to drop 
> the cache as needed, effectively leading to a memory leak if no one 
> calls these methods for a while.
> 
> > Alternatively, if we need to cache more than that, how about blowing 
> > away the cache with each spin of the event loop, so that anything in a 
> > tight loop is cached (and _not_ subject to GC — this could be a 
> > problem if the script calls one of these methods with 10000 different 
> > arguments and sets properties on each one)
> 
> Indeed.  Especially because these objects can end up taking up a fair 
> amount of memory...
> 
> > but not beyond one task? (i.e. don't share objects in calls across 
> > setTimeout)
> 
> Those two are not quite the same condition.  Implementing the latter in 
> Gecko would not be too bad.  Implementing the former, assuming I 
> understand what you mean correctly, would require totally rewriting the 
> Gecko event loop.

On Sat, 13 Feb 2010, Maciej Stachowiak wrote:
> On Feb 13, 2010, at 3:18 AM, Ian Hickson wrote:
> > 
> > I'm concerned about the GC-sensitivity of such behaviour (we might end 
> > up snookering ourselves in a situation where specific GC behaviour 
> > actually matters for compatibility).
> 
> It's not GC that matters but the degree of caching (e.g. whether cache 
> items are ever cleared for reasons other than GC). It's true that this 
> is theoretically a hazard, but the only observable effect would be 
> whether custom properties set on one NodeList appear on one retrieved 
> later. Since it's very uncommon (and indeed unlikely) for authors to set 
> custom properties on NodeLists, I think this benefit is purely 
> theoretical, not real.
> 
> 
> > How about the following compromise: these methods return a new object 
> > each time, except if they are called with the same argument as the 
> > previous invocation of the method? i.e. cache the last returned object 
> > only. Would that be acceptable? It gives you a performance win in the 
> > case where the author spins a loop using the same call over and over, 
> > and is completely predictable.
> 
> It's only predictable if that last object is kept alive, even if it were 
> otherwise a candidate for garbage collection. Are you suggesting to do 
> that? I assume so, because that's the only way it would be "completely 
> predictable". If so, then I would object, because it could lead to a 
> large long-term memory cost (fully traversing a large NodeList in a loop 
> would leave you paying the cost of that memory until you leave the page 
> or the author fetches a different NodeList). Imagine the last NodeList 
> you accessed was the result of getElementsByTagName("*") and the author 
> fully traversed it. Now you've likely pinned memory proportional to the 
> size of the DOM.
> 
> Even without the memory issue, I would not favor this design, because it 
> makes performance fall off a cliff if you use more than one NodeList. 
> Changing your loop from fetching one NodeList to two could suddenly make 
> it 50x slower. We do not like coding performance hazards like this into 
> our implementation.
> 
> 
> > Alternatively, if we need to cache more than that, how about blowing 
> > away the cache with each spin of the event loop, so that anything in a 
> > tight loop is cached (and _not_ subject to GC — this could be a 
> > problem if the script calls one of these methods with 10000 different 
> > arguments and sets properties on each one), but not beyond one task? 
> > (i.e. don't share objects in calls across setTimeout)
> 
> Pinning a potentially unbounded number of NodeLists in memory would 
> definitely be unacceptable from both speed and memory perspectives. 
> Especially on mobile devices.
> 
> 
> I note that if all you care about is ensuring that behavior is 
> deterministic, the simplest solution would be to make NodeList objects 
> disallow setting of custom properties. Then there is no way to observe 
> the side effects of GC behavior. This would be simpler to implement than 
> either of your proposed rules, and would not create speed or memory 
> hazards. I do not know if we could justify such a change as a mere 
> erratum to DOM Level 3 Core, but the same goes for both your proposed 
> policies.
> 
> 
> > For now for the objects in HTML5 I've gone with the first of these 
> > suggested compromises.
> 
> I don't think we'd be willing to implement that in WebKit. We're more 
> likely to copy existing Firefox and IE behavior.

On Sat, 13 Feb 2010, Anton Muhin wrote:
>
> For me performance-wise both approaches seem fine, but to get numbers I 
> need to run an experiment.
>
> My main concern would be that rules are overcomplicated imho.  And I 
> won't be surprised if IE and FF would just ignore them.  But you 
> understand all those matters better than I do.
>
> BTW, as the proposed change would reduce stress on GC, I don't quite see 
> how this could reduce compatibility (but life is often more complicated 
> than I can imagine.)

I've changed the HTML5 spec to allow arbitrary caching for the various 
objects in HTML5 instead.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Wednesday, 17 March 2010 23:23:14 UTC