Re: Allow to return same NodeList object for queries like getElementsByTagName, getElementsByClassName and getElementsByName from Maciej Stachowiak on 2010-02-14 (public-webapps@w3.org from January to March 2010)

From: Maciej Stachowiak <mjs@apple.com>
Date: Sat, 13 Feb 2010 16:34:46 -0800
To: Ian Hickson <ian@hixie.ch>
Cc: Anton Muhin <antonm@chromium.org>, public-webapps@w3.org
Message-id: <9FEA5D96-34C8-42AA-AAB0-F73022245988@apple.com>

On Feb 13, 2010, at 3:18 AM, Ian Hickson wrote:

> On Fri, Jan 22, 2010 at 5:11 AM, Anton Muhin <antonm@chromium.org> wrote:
>> Good day.
>>
>> The DOM Level 3 Core spec is currently somewhat inconsistent about
>> whether invocations of getElementsByTagName and the like must return
>> a new NodeList or may cache the list.  For Document, a new list is
>> mandated for both getElementsByTagName and getElementsByTagNameNS,
>> but for Element it is only worded that way for
>> getElementsByTagNameNS, not for getElementsByTagName.  Maciej also
>> noticed a difference between getElementsByTagName and the other
>> getElementsBy queries (see
>> http://www.w3.org/Bugs/Public/show_bug.cgi?id=8792).  And the word
>> "new" is missing from the ECMAScript bindings spec:
>> http://www.w3.org/TR/DOM-Level-3-Core/ecma-script-binding.html
>>
>> Is it possible to allow caching for those cases?  Firefox caches those
>> node lists for a long time (Maciej found the related bug
>> https://bugzilla.mozilla.org/show_bug.cgi?id=140758).  IE8 caches as
>> well.  Opera, Safari and Chrome do not.
>
> I'm concerned about the GC-sensitivity of such behaviour (we might end
> up snookering ourselves in a situation where specific GC behaviour
> actually matters for compatibility).

It's not GC that matters but the degree of caching (e.g. whether cache
items are ever cleared for reasons other than GC). It's true that this
is theoretically a hazard, but the only observable effect would be
whether custom properties set on one NodeList appear on one retrieved
later. Since it's very uncommon for authors to set custom properties
on NodeLists, I think this hazard is purely theoretical, not real.
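
To make the observable effect concrete, here is a minimal sketch. It
assumes hypothetical stand-ins (FakeNodeList, CachingDocument,
NonCachingDocument, customPropertySurvives) in place of real DOM
objects, and models "cache items being cleared" with an explicit
dropCache() call. It is an illustration only, not any engine's actual
caching logic.

```typescript
// Hypothetical stand-in for a NodeList; real NodeLists are host objects.
class FakeNodeList {
  // Index signature so arbitrary "custom properties" can be set on a list.
  [customProperty: string]: unknown;
  constructor(readonly tagName: string) {}
}

interface TagQueryable {
  getElementsByTagName(tag: string): FakeNodeList;
}

// Returns the same list object for the same tag until the cache is dropped.
class CachingDocument implements TagQueryable {
  private cache = new Map<string, FakeNodeList>();

  getElementsByTagName(tag: string): FakeNodeList {
    let list = this.cache.get(tag);
    if (list === undefined) {
      list = new FakeNodeList(tag);
      this.cache.set(tag, list);
    }
    return list;
  }

  // Assumption: models the engine clearing cache entries (GC or eviction).
  dropCache(): void {
    this.cache.clear();
  }
}

// Returns a fresh list object on every call.
class NonCachingDocument implements TagQueryable {
  getElementsByTagName(tag: string): FakeNodeList {
    return new FakeNodeList(tag);
  }
}

// Sets a custom property on one retrieved list, optionally clears the cache,
// then checks whether the property is visible on a list retrieved later.
function customPropertySurvives(doc: TagQueryable, evict?: () => void): boolean {
  doc.getElementsByTagName("div")["marker"] = 42;
  if (evict) evict();
  return doc.getElementsByTagName("div")["marker"] === 42;
}

const survivesWithCache = customPropertySurvives(new CachingDocument()); // true
const survivesWithoutCache = customPropertySurvives(new NonCachingDocument()); // false
const evictingDoc = new CachingDocument();
const survivesEviction = customPropertySurvives(evictingDoc, () => evictingDoc.dropCache()); // false
```

The only difference a script can detect is whether "marker" is still
there, which is why the degree of caching, rather than GC as such, is
what matters.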


> How about the following compromise: these methods return a new object
> each time, except if they are called with the same argument as the
> previous invocation of the method? i.e. cache the last returned object
> only. Would that be acceptable? It gives you a performance win in the
> case where the author spins a loop using the same call over and over,
> and is completely predictable.

It's only predictable if that last object is kept alive, even if it  
were otherwise a candidate for garbage collection. Are you suggesting
doing that? I assume so, because that's the only way it would be
"completely predictable". If so, then I would object, because it could  
lead to a large long-term memory cost (fully traversing a large  
NodeList in a loop would leave you paying the cost of that memory  
until you leave the page or the author fetches a different NodeList).  
Imagine the last NodeList you accessed was the result of  
getElementsByTagName("*") and the author fully traversed it. Now  
you've likely pinned memory proportional to the size of the DOM.

Even without the memory issue, I would not favor this design, because  
it makes performance fall off a cliff if you use more than one  
NodeList. Changing your loop from fetching one NodeList to two could  
suddenly make it 50x slower. We do not like coding performance hazards  
like this into our implementation.
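
The cliff can be illustrated with a small sketch of a cache that
remembers only the most recent call. LastCallDocument and countBuilds
are hypothetical names, and the count of list constructions is used
here as a rough proxy for cost; it is not a measurement of any real
engine.

```typescript
// Hypothetical model of "cache the last returned object only".
class LastCallDocument {
  builds = 0; // how many times a list had to be constructed from scratch
  private lastTag: string | null = null;
  private lastList: string[] | null = null;

  getElementsByTagName(tag: string): string[] {
    if (this.lastList !== null && this.lastTag === tag) {
      return this.lastList; // hit: same argument as the previous call
    }
    this.builds++; // miss: stands in for re-creating the list
    this.lastTag = tag;
    this.lastList = [tag]; // placeholder contents, not a real traversal
    return this.lastList;
  }
}

// Runs a loop that fetches each of the given tags once per iteration.
function countBuilds(tags: string[], iterations: number): number {
  const doc = new LastCallDocument();
  for (let i = 0; i < iterations; i++) {
    for (const tag of tags) {
      doc.getElementsByTagName(tag);
    }
  }
  return doc.builds;
}

const buildsWithOneList = countBuilds(["div"], 1000); // 1
const buildsWithTwoLists = countBuilds(["div", "span"], 1000); // 2000
```

With one list the loop misses the cache once; with two alternating
lists every call evicts the other entry, so every call is a miss.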


> Alternatively, if we need to cache more than that, how about blowing
> away the cache with each spin of the event loop, so that anything in a
> tight loop is cached (and _not_ subject to GC — this could be a
> problem if the script calls one of these methods with 10000 different
> arguments and sets properties on each one), but not beyond one task?
> (i.e. don't share objects in calls across setTimeout)

Pinning a potentially unbounded number of NodeLists in memory would
definitely be unacceptable from both speed and memory perspectives,
especially on mobile devices.


I note that if all you care about is ensuring that behavior is  
deterministic, the simplest solution would be to make NodeList objects  
disallow setting of custom properties. Then there is no way to observe  
the side effects of GC behavior. This would be simpler to implement  
than either of your proposed rules, and would not create speed or  
memory hazards. I do not know if we could justify such a change as a  
mere erratum to DOM Level 3 Core, but the same goes for both your  
proposed policies.
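
As a rough sketch of what disallowing custom properties could look
like from script, the following uses the standard
Object.preventExtensions and Reflect.set on a hypothetical
NonExtensibleNodeList class. Whether an engine would implement the
restriction this way is an assumption; the point is only that a failed
write leaves nothing for later code to observe.

```typescript
// Hypothetical stand-in for a NodeList that rejects custom properties.
class NonExtensibleNodeList {
  constructor(readonly tagName: string) {
    // After this call, no new properties can be added to the object.
    Object.preventExtensions(this);
  }
}

// Attempts to set a custom property; Reflect.set reports failure by
// returning false instead of throwing.
function trySetCustomProperty(list: object, name: string, value: unknown): boolean {
  return Reflect.set(list, name, value);
}

const sealedList = new NonExtensibleNodeList("div");
const writeSucceeded = trySetCustomProperty(sealedList, "marker", 42); // false
const markerVisible = "marker" in sealedList; // false
```

Since the write never lands, a script cannot tell a cached list from a
fresh one by this route, whatever the cache or the GC does.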


> For now for the objects in HTML5 I've gone with the first of these
> suggested compromises.

I don't think we'd be willing to implement that in WebKit. We're more  
likely to copy existing Firefox and IE behavior.

Regards,
Maciej
Received on Sunday, 14 February 2010 00:35:21 UTC