Re: Why is querySelector much slower?

On 4/28/15 2:59 AM, Glen Huang wrote:
> Looking at the microbenchmark again, for Gecko, getElementById is around 300x faster than querySelector('#id'), and even getElementsByClassName is faster than it.

This is why one should not write microbenchmarks.  ;)  Or at least if 
one does, examine the results very carefully.

The numbers you see for the getElementById benchmark there are on the 
order of 2e9 operations per second, yes?  And modern desktop/laptop CPUs 
are clocked somewhere in the 2-4 GHz range.  So what you're seeing is 
that the benchmark claims the operation is performed in 1-2 clock 
cycles.  This should seem unlikely if you think the operation involves a 
hashtable lookup!
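To spell out the arithmetic: a 3 GHz core runs roughly 3e9 cycles per
second, so a benchmark reporting 2e9 operations per second is claiming
something like

  3e9 cycles/s / 2e9 ops/s = 1.5 cycles per operation

which is nowhere near enough time to hash a string, probe a table, and
hand back an element.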

What's happening there is that Gecko happens to know at JIT compile time 
in this microbenchmark:

1)  The bareword lookup is going to end up at the global, because there 
is nothing on the scope chain that would shadow the "document" name.
2)  The global has an own property named "document" whose getter is 
side-effect-free.
3)  The return value of the "document" property has only been observed 
to be a Document.
4)  Looking up "getElementById" on the return value of the "document" 
property has consistently found it on Document.prototype.
5)  Document.prototype.getElementById is known to be side-effect-free.
6)  The return value of getElementById is not used (assigned to a 
function-local variable that is then not used).

The upshot of all that is that with a few guards both the 
"getElementById" call and the "document" get can be dead-code 
eliminated here.  And even if you stored the value somewhere persistent, 
they could both still be loop-hoisted in this case.  So what this 
getElementById benchmark measures is how fast a loop counter can be 
decremented from some starting value to 0.  It happens that this can be 
done in about 1-2 clock cycles per loop iteration.
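For concreteness, the loop such a microbenchmark runs is presumably
something along these lines (the exact harness code is my assumption,
not something I've checked):

  // As written in the benchmark:
  for (var i = 0; i < N; i++) {
    var el = document.getElementById("list");  // result never used
  }

  // What the JIT is effectively left with once the guards pass and the
  // "document" get, the "getElementById" get, and the call itself have
  // all been eliminated:
  for (var i = 0; i < N; i++) {
    // nothing here but the counter update and the branch
  }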

OK, so what about querySelector("#id") vs getElementsByClassName?

In the former case, loop-hoisting and dead code elimination are 
disallowed because querySelector can throw.  That means that you can't 
eliminate it, and you can't move it past other things that might have 
observable side effects (like the counter increment).  Arguably this is 
a misfeature in the design of querySelector.
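Concretely, querySelector has to be able to throw because it's
specified to reject selectors that don't parse, so the engine can't
treat the call as side-effect-free:

  try {
    document.querySelector("#");    // "#" is not a valid selector
  } catch (e) {
    console.log(e.name);            // "SyntaxError"
  }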

In the latter case, loop-hoisting or dead code elimination can't happen 
because Gecko doesn't know enough about what the [0] will do, so it 
assumes the worst: that it can have side effects that affect what the "document" 
getter returns as well as what the getElementsByClassName() call returns.
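So the two loop bodies being compared presumably look roughly like this
(again, my guess at the benchmark, not its actual source), and in both
cases the call has to stay put:

  for (var i = 0; i < N; i++) {
    var a = document.querySelector("#list");           // may throw: can't be removed or hoisted
  }

  for (var i = 0; i < N; i++) {
    var b = document.getElementsByClassName("foo")[0];  // the [0] is opaque to the JIT
  }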

So there are no shortcuts here; you have to actually do the calls. What 
do those calls do?

querySelector does a hashtable lookup for the selector to find a parsed 
selector.  Then it sets up some state that's needed for selector 
matching.  Then it detects that the selector's right-hand-most bit has a 
simple ID selector and does a fast path that involves looking up that id 
in the hashtable and then comparing the selector to the elements that 
are returned until one of them matches.
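In pseudocode, that fast path is doing roughly this (an illustrative
sketch of the steps just described; the names are made up and this is
not Gecko's actual code):

  function fastPathQuerySelector(doc, selectorText) {
    // Hashtable lookup #1: selector text -> parsed selector.
    var sel = doc.parsedSelectorCache.get(selectorText) ||
              parseAndCacheSelector(doc, selectorText);
    setUpSelectorMatchingState(doc, sel);
    // If the rightmost compound contains a simple ID selector, start
    // from the id hashtable instead of walking the tree.
    var id = sel.rightmostCompound().idIfAny();
    if (id !== null) {
      var candidates = doc.idTable.getAll(id);   // hashtable lookup #2
      for (var i = 0; i < candidates.length; i++) {
        if (selectorMatches(candidates[i], sel)) {
          return candidates[i];
        }
      }
      return null;
    }
    // ...otherwise fall back to the general matching path.
  }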

getElementsByClassName has to do a hashtable lookup on the class name, 
then return the result.  Then it has to do the [0] (which is actually 
surprisingly expensive, by the way, because of the proxy machinery 
involved on the JS engine side).

So we _could_ make querySelector faster here by adding another special 
case for selectors that are _just_ an id as opposed to the existing 
optimization (which works for "#foo > #bar" and similar as well).  And 
of course the new special case would only work the way you want for 
document.querySelector, not element.querySelector; the latter needs to 
check for your result being a descendant of the element anyway.  It's a 
tradeoff between complexity of implementation (which has its own 
maintenance _and_ performance costs) and real-life use cases.
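The scoping difference is easy to see: element.querySelector only ever
returns descendants of the element it's called on, so a bare hashtable
hit isn't enough there.

  <div id="outer"></div>
  <div id="inner"></div>

  document.querySelector("#inner");        // finds the div
  document.getElementById("outer")
          .querySelector("#inner");        // null: #inner isn't inside #outer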

Lastly, I'd like to put numbers to this.  On this particular testcase, 
the querySelector("#list") call takes about 100ns on my hardware: about 
300 CPU cycles.  We could add that other set of special-casing and get 
it down to 70ns (I just checked by implementing it, so this is not a 
random guess).  At that point you've got two hashtable lookups (which we 
could try to make faster, perhaps), the logic to detect that the 
optimization can be done at all (which is not that trivial; our selector 
representation requires a bunch of checks to ensure that it's just an id 
selector), and whatever work is involved in the binding layer.  In this 
case, those all seem to have about the same cost: about 17-18ns (50 CPU 
cycles) each.

So is your use case one where the difference between querySelector 
costing 100ns and it costing 70ns actually makes a difference?
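Back-of-the-envelope, with a made-up call count: even at 10,000
querySelector calls per frame, the difference is

  10,000 calls * 30ns = 0.3ms

out of a ~16.7ms frame budget at 60fps.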

> It doesn't look like it benefits much from an eagerly populated hash table?

It benefits a good bit for non-toy documents where avoiding walking the 
entire DOM is the important part of the optimization.  Again, 
microbenchmarks mostly serve to highlight the costs of constant-time 
overhead, which is a useful thing to do, as long as you know that's what 
you're doing.  But for real-life testcases algorithmic complexity can 
often be much more important.
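As an illustration of what the eager hashtable buys you on a non-toy
document: without it, an id lookup has to do something like the tree
walk below, whose cost grows with the number of elements instead of
staying constant (a sketch of the naive alternative, not what any
engine actually ships):

  // O(N) in the size of the DOM: the work the id hashtable avoids.
  function findByIdSlow(node, id) {
    if (node.id === id) return node;
    for (var child = node.firstElementChild; child;
         child = child.nextElementSibling) {
      var found = findByIdSlow(child, id);
      if (found) return found;
    }
    return null;
  }

  findByIdSlow(document.documentElement, "list");

  // With the hashtable, getElementById is a single constant-time
  // lookup regardless of document size.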

-Boris

P.S.  Other engines make different complexity/speed tradeoffs here; for 
example Safari's querySelector with ".foo" and "#list" is about 60ns on 
my hardware.  I know they have extra fast paths for both those cases.

Received on Tuesday, 28 April 2015 16:16:46 UTC