- From: Boris Zbarsky <bzbarsky@mit.edu>
- Date: Tue, 28 Apr 2015 12:16:13 -0400
- To: Glen Huang <curvedmark@gmail.com>
- CC: public-webapps <public-webapps@w3.org>
On 4/28/15 2:59 AM, Glen Huang wrote:
> Looking at the microbenchmark again, for Gecko, getElementById is around 300x faster than querySelector('#id'), and even getElementsByClassName is faster than it.

This is why one should not write microbenchmarks. ;) Or at least if one does, examine the results very carefully.

The numbers you see for the getElementById benchmark there are on the order of 2e9 operations per second, yes? And modern desktop/laptop CPUs are clocked somewhere in the 2-4 GHz range. So what you're seeing is that the benchmark claims the operation is performed in 1-2 clock cycles. This should seem unlikely if you think the operation involves a hashtable lookup!

What's happening there is that Gecko happens to know at JIT compile time in this microbenchmark:

1) The bareword lookup is going to end up at the global, because there is nothing on the scope chain that would shadow the "document" name.
2) The global has an own property named "document" whose getter is side-effect-free.
3) The return value of the "document" property has only been observed to be a Document.
4) Looking up "getElementById" on the return value of the "document" property has consistently found it on Document.prototype.
5) Document.prototype.getElementById is known to be side-effect-free.
6) The return value of getElementById is not used (assigned to a function-local variable that is then not used).

The upshot of all that is that with a few guards both the getElementById() call and the "document" get can be dead-code eliminated here. And even if you stored the value somewhere persistent, they could both still be loop-hoisted in this case. So what this getElementById benchmark measures is how fast a loop counter can be decremented from some starting value to 0. It happens that this can be done in about 1-2 clock cycles per loop iteration.

OK, so what about querySelector("#id") vs getElementsByClassName?

In the former case, loop-hoisting and dead code elimination are disallowed because querySelector can throw. That means you can't eliminate it, and you can't move it past other things that might have observable side effects (like the counter increment). Arguably this is a misfeature in the design of querySelector.

In the latter case, loop-hoisting and dead code elimination can't happen because Gecko doesn't know enough about what the [0] will do, so it assumes the worst: that it can have side effects that affect both what the "document" getter returns and what the getElementsByClassName() call returns.

So there are no shortcuts here; you have to actually do the calls. What do those calls do?

querySelector does a hashtable lookup for the selector string to find a parsed selector. Then it sets up some state that's needed for selector matching. Then it detects that the selector's right-hand-most bit is a simple ID selector and takes a fast path that involves looking up that id in the hashtable and then comparing the selector to the elements that are returned until one of them matches.

getElementsByClassName has to do a hashtable lookup on the class name, then return the result. Then it has to do the [0] (which is actually surprisingly expensive, by the way, because of the proxy machinery involved on the JS engine side).

So we _could_ make querySelector faster here by adding another special case for selectors that are _just_ an id, as opposed to the existing optimization (which works for "#foo > #bar" and similar as well).
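Purely to illustrate the shape of that special case, here is a userland sketch; the actual change would live in Gecko's C++ selector-matching code, and the helper name and the over-simplified id regex below are made up for this example:

    // Hypothetical sketch of a "bare id" fast path for querySelector.
    // The function name and the regex are illustrative only, not Gecko's
    // actual implementation (real CSS id syntax is more permissive).
    var BARE_ID = /^#([A-Za-z_][A-Za-z0-9_-]*)$/;

    function fastQuerySelector(doc, selector) {
      var m = BARE_ID.exec(selector);
      if (m) {
        // The selector is just an id: a single hashtable lookup,
        // no parsed-selector cache and no selector matching.
        return doc.getElementById(m[1]);
      }
      // Anything more complex goes through the general path, which already
      // has a fast path when the right-hand-most bit is an id selector.
      return doc.querySelector(selector);
    }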
And of course the new special case would only work the way you want for document.querySelector, not element.querySelector; the latter needs to check that your result is a descendant of the element anyway. It's a tradeoff between complexity of implementation (which has its own maintenance _and_ performance costs) and real-life use cases.

Lastly, I'd like to put numbers to this. On this particular testcase, the querySelector("#list") call takes about 100ns on my hardware: about 300 CPU cycles. We could add that other set of special-casing and get it down to 70ns (I just checked by implementing it, so this is not a random guess). At that point you've got two hashtable lookups (which we could try to make faster, perhaps), the logic to detect that the optimization can be done at all (which is not that trivial; our selector representation requires a bunch of checks to ensure that it's just an id selector), and whatever work is involved in the binding layer. In this case, those all seem to have about the same cost: about 17-18ns (50 CPU cycles) each.

So is your use case one where the difference between querySelector costing 100ns and costing 70ns actually makes a difference?

> It doesn't look like it benefits much from an eagerly populated hash table?

It benefits a good bit for non-toy documents, where avoiding walking the entire DOM is the important part of the optimization. Again, microbenchmarks mostly serve to highlight the costs of constant-time overhead, which is a useful thing to do as long as you know that's what you're doing. But for real-life testcases, algorithmic complexity can often be much more important.
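If you do want a getElementById microbenchmark to measure the call rather than the loop, the result has to stay observable and the input has to vary, along these lines. This is only a sketch: the ids and iteration count are placeholders, and it is not a claim about what any particular benchmark harness does.

    // Sketch of a less-cheatable getElementById microbenchmark. Varying the
    // argument per iteration keeps the call from being loop-hoisted, and
    // consuming the result keeps it from being dead-code eliminated.
    var ids = ["list", "header", "footer", "main"]; // assumed to exist in the page
    var iterations = 1e6;
    var found = 0;
    var start = performance.now();
    for (var i = 0; i < iterations; i++) {
      if (document.getElementById(ids[i % ids.length]) !== null) {
        found++;
      }
    }
    var elapsed = performance.now() - start; // milliseconds
    console.log((elapsed * 1e6 / iterations).toFixed(1) + " ns/call, found " + found);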
-Boris

P.S. Other engines make different complexity/speed tradeoffs here; for example, Safari's querySelector with ".foo" and "#list" is about 60ns on my hardware. I know they have extra fast paths for both those cases.

Received on Tuesday, 28 April 2015 16:16:46 UTC