Re: Why is querySelector much slower? from Glen Huang on 2015-04-28 (public-webapps@w3.org from April to June 2015)

From: Glen Huang <curvedmark@gmail.com>
Date: Tue, 28 Apr 2015 11:27:51 +0800
To: Boris Zbarsky <bzbarsky@MIT.EDU>
Cc: public-webapps <public-webapps@w3.org>
Message-Id: <9AF27EB5-BB94-4A76-AB2E-1B7DCF706A4B@gmail.com>
Thank you for the sample code. It's very helpful.

When you say "var node = list[0];" walks the DOM until the first item is found, do you mean that this only happens if some earlier code has changed the DOM structure? Otherwise, is the returned list object marked as up to date, so that accessing the first element is very cheap? I ask because in your first paragraph you said the returned list and its first element are probably precomputed.

Also, here is my mental model after reading your explanation; I wonder whether it's correct:

After the UA has parsed the HTML, it caches a hash table of the elements that have class names, keyed by class name (and, likewise, separate hash tables for elements with ids, elements by tag name, etc.). When getElementsByClassName() is called and the DOM hasn't been modified, it simply creates a list of the elements with that class name from the hash table. When the first element of that list is accessed and the DOM still hasn't been modified, the element is returned directly.

The hash table is kept in sync with the DOM when it's modified. If the DOM is changed after the list is returned but before it's accessed, the list will be marked as dirty, and accessing its elements will walk the DOM (and mark the list as partially updated after that).

Is this description correct?
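
To make the dirty / partially updated states concrete, here is a minimal sketch of a lazily filled list over a toy tree. It only illustrates the behaviour described in the quoted reply below and is not Gecko's actual implementation; makeNode, walk, LazyClassList, invalidate, item and the visited counter are all hypothetical names, and the nodes are plain objects rather than DOM elements.

```javascript
// Toy tree node (an assumption for illustration, not a real DOM element).
function makeNode(classes, children = []) {
  return { classes: new Set(classes), children };
}

// Depth-first walk in document order. Being a generator, it can be paused
// as soon as the caller has found what it needs and resumed later.
function* walk(node) {
  yield node;
  for (const child of node.children) {
    yield* walk(child);
  }
}

class LazyClassList {
  constructor(root, className) {
    this.root = root;
    this.className = className;
    this.invalidate(); // a freshly created list starts out dirty
  }

  // To be called whenever the tree changes in a way that may affect the list.
  invalidate() {
    this.state = "dirty";
    this.found = [];
    this.cursor = null;
    this.visited = 0; // instrumentation only: nodes walked since last reset
  }

  // Walks only as far as needed to answer the request for found[index].
  item(index) {
    if (this.state === "dirty") {
      this.cursor = walk(this.root);
      this.state = "partial";
    }
    while (this.found.length <= index && this.state !== "complete") {
      const step = this.cursor.next();
      if (step.done) {
        this.state = "complete";
        break;
      }
      this.visited += 1;
      if (step.value.classes.has(this.className)) {
        this.found.push(step.value);
      }
    }
    return this.found[index] || null;
  }
}

// Document order here is: root, wrapper, first, second, last.
const first = makeNode(["foo"]);
const second = makeNode(["foo"]);
const root = makeNode([], [makeNode([], [first]), second, makeNode([])]);
const list = new LazyClassList(root, "foo");
```

In this sketch list.item(0) stops walking at the first match and leaves the list in the "partial" state; asking for item(0) again walks nothing further, and invalidate() throws the partial result away so the next access starts over from the root.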

And the final question:

Why can't querySelector benefit from these hash tables? I currently feel the urge to optimize it myself by overriding it with a custom function which will parse the passed selector, and if it's a simple selector like "div", ".class", "#id", call the corresponding getElement*() function instead. Why can't UAs perform this for us?
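
A rough sketch of the kind of wrapper I have in mind is below. classifySelector and fastQuerySelector are hypothetical names, the regular expression deliberately accepts only bare ASCII "#id", ".class" and "tag" forms, and everything else falls through to the real querySelector. I am assuming doc is a Document (getElementById is not available on elements), and I have not checked that the fast paths match querySelector in every edge case.

```javascript
// Classify a selector string. Only bare "#id", ".class" and "tag" forms are
// treated as simple; anything else (combinators, attributes, escapes, lists,
// pseudo-classes, ...) is reported as "complex".
function classifySelector(selector) {
  const match = /^([#.]?)([A-Za-z_][\w-]*)$/.exec(selector.trim());
  if (!match) {
    return { kind: "complex" };
  }
  const kinds = { "#": "id", ".": "class", "": "tag" };
  return { kind: kinds[match[1]], name: match[2] };
}

// Dispatch simple selectors to the matching getElement*() call and fall
// back to querySelector for everything else.
function fastQuerySelector(doc, selector) {
  const { kind, name } = classifySelector(selector);
  switch (kind) {
    case "id":
      return doc.getElementById(name);
    case "class":
      return doc.getElementsByClassName(name)[0] || null;
    case "tag":
      return doc.getElementsByTagName(name)[0] || null;
    default:
      return doc.querySelector(selector);
  }
}
```

It would be called as fastQuerySelector(document, ".foo"). Being conservative about what counts as simple means a misclassified selector costs only the regular expression test before the normal querySelector path runs.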

If my mental model is correct, querySelector should be simpler than getElement*() from a UA's point of view. It simply needs to look up the first matching element in the hash table and return it; there is no need to return a list and mark it as clean or dirty. The only price it pays is parsing the selector.

Is it because authors don't use querySelector often enough that UAs aren't interested in optimizing it?

> On Apr 27, 2015, at 9:51 PM, Boris Zbarsky <bzbarsky@MIT.EDU> wrote:
> 
> On 4/27/15 4:57 AM, Glen Huang wrote:
>> Intuitively, querySelector('.class') only needs to find the first
>> matching node, whereas getElementsByClassName('.class')[0] needs to find
>> all matching nodes
> 
> Not true; see below.
> 
>> and then return the first. The former should be a lot
>> quicker than the latter. Why that's not the case?
>> 
>> See http://jsperf.com/queryselectorall-vs-getelementsbytagname/119 for
>> the test
> 
> All getElementsByClassName(".foo") has to do in a microbenchmark like this is look up a cached list (probably a single hashtable lookup) and return its first element (likewise precomputed, unless you're modifying the DOM in ways that would affect the list).  It doesn't have to walk the tree at all.
> 
> querySelector(".foo"), on the other hand, probably walks the tree at the moment in implementations.
> 
> Also, back to the "not true" above: since the list returned by getElementsBy* is live and periodically needs to be recomputed anyway, and since grabbing just its first element is a common usage pattern, Gecko's implementation is actually lazy (see https://bugzilla.mozilla.org/show_bug.cgi?id=104603#c0 for the motivation): it will only walk as much of the DOM as needed to reply to the query being made.  So for example:
> 
>  // Creates a list object, doesn't do any walking of the DOM, marks
>  // object as dirty and returns it.
>  var list = document.getElementsByClassName("foo");
> 
>  // Walks the DOM until it finds the first element of the list, marks
>  // the list as "partially updated", and returns that first element.
>  var node = list[0];
> 
>  // Marks the list as dirty again, since the set of nodes it matches
>  // has changed
>  document.documentElement.className = "foo";
> 
> I can't speak for what other UAs do here, but the assumption that getElementsByClassName('.class')[0] needs to find all matching nodes is just not true in Gecko.
> 
> -Boris
Received on Tuesday, 28 April 2015 03:28:26 UTC