Re: [SelectorsAPI] Thoughts on querySelectorAll from Boris Zbarsky on 2008-05-03 (public-webapi@w3.org from May 2008)

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Fri, 02 May 2008 20:08:24 -0500
To: John Resig <jresig@mozilla.com>
CC: public-webapi@w3.org
Message-ID: <481BBB08.6010407@mit.edu>
John Resig wrote:
>> The problem is defining this concept of "what selector is failing".
>
> There's a solution to the above that is both simple and intuitive:
> The first character of the selector that the UA doesn't know what to
> do with.

This requires a definition of "doesn't know what to do with".  That's 
far from an obvious concept to me.  It's certainly not something that 
different UAs would define the same way.

Maciej Stachowiak already covered this ground in his mail, and I agree 
with pretty much everything he said on the subject.  I just want to 
point out that:

> For example, in the above, it would be:
> 
> :not(a:link)
>       ^

Assuming you're marking the first character that has to have been read 
from the character stream to realize that this is an invalid selector, 
in Gecko this would be correct.

> :not|test|
 > ^

Under the same assumption, this would not be correct.  The first 
character that would need to be read would be the first ':'.

> :note
> ^

Again, this would not be correct: the first character would be the 'e' 
(not that we keep track of that sort of thing, I might add).

> You just have to ask yourself "can I use this expression?"

Define "use", please?  This is the hard part, which you're sweeping 
under the rug by just using a different name for the same concept, as 
far as I can tell.

>> See the example above:  :not(a:link).  In CSS3 Selectors, what's 
>> invalid is the concept of putting multiple simple selectors inside
>> :not.  All the parts are valid on their own; it's the way of
>> putting them together that's invalid.
> 
> Sure - and if that was the error then it would throw the exception
> pointing to the first : - at which point the library would take over.

You just said that it should point to the second ':' above....

>> But maybe I'm misunderstanding your real concern here. Above you
>> say that :not would be handled by backing out of it anyway, which
>> makes sense to me.  At that point, why do you care what inside the
>> :not is invalid, exactly?  You're going to have to do the entire
>> :not match yourself anyway....
> 
> Well, I forgot that CSS selectors are completely crippled for
> real-world use, so it's kind of a moot point.

That doesn't answer my question (ignoring for the moment that if you 
think that a useful querying API would simply not be based on CSS 
Selectors at all and that the spec should just be scrapped completely, 
you should just say so).  I'm not asking these questions just because I 
want to annoy you; I want to understand what your requirements for this 
API are.

> Since there's only one selector it would be easy to detect, in that case.

:not() is hardly the only "function" pseudo-class out there, though the 
others tend to be either proposals or UA extensions.   I wouldn't rely 
on the "only one selector" thing staying true.

>> div, :future(a, b, c), :bad, span
> 
> Well, in JavaScript libraries, at least the parsing is smart enough
> to handle situations like that.

Ah, good.  That was not clear from what you said earlier.

> Of course "just splitting on ," wouldn't work - all the JS selector
> engines that I know tokenize and break it apart. In the above the
> error would occur here:
> 
> div, :not( , span
>      ^

In Gecko, per the criterion I gave above, it occurs at the second ','.
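To make the "just splitting on ," point concrete, here is a minimal sketch of splitting a selector group on top-level commas only. The name splitSelectorGroup is hypothetical and not taken from any library, and this is not a full CSS tokenizer (it ignores comments, for one).

```javascript
// Split a selector group on top-level commas only. Commas inside (...),
// [...] or quoted strings are left alone. A sketch, not a CSS tokenizer.
function splitSelectorGroup(selector) {
  const parts = [];
  let depth = 0;     // nesting depth of ( and [
  let quote = null;  // the open quote character while inside a string
  let start = 0;     // index where the current member begins

  for (let i = 0; i < selector.length; i++) {
    const ch = selector[i];
    if (ch === "\\") {
      i++;                               // skip the escaped character
    } else if (quote) {
      if (ch === quote) quote = null;    // closing quote
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (ch === "(" || ch === "[") {
      depth++;
    } else if (ch === ")" || ch === "]") {
      if (depth > 0) depth--;
    } else if (ch === "," && depth === 0) {
      parts.push(selector.slice(start, i).trim());
      start = i + 1;
    }
  }
  parts.push(selector.slice(start).trim());
  return parts;
}

console.log(splitSelectorGroup("div, :future(a, b, c), :bad, span"));
```

For the unbalanced "div, :not( , span" this sketch returns only two members, because the unclosed parenthesis swallows the rest of the string. That is one more way in which "where the error is" depends on how the parser is written.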

>> Maybe I'm missing something.  Adding more error-reporting to the 
>> thrown exception is a backwards-compatible change to an
>> implementation.  If, say, Firefox 4 ships querySelectorAll()
>> without such error reporting, that would not preclude Firefox 5
>> adding it.
> 
> The important thing here is that this would work if - and only if -
> there was proper non-black-box error reporting to begin with.

I seem to be failing to communicate here.  Let me assume that the issue 
is on my end and try code instead of prose:

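function myQuerySelectorAll(selector, node) {
   // Use the native implementation when the UA provides one.
   if (node.querySelectorAll) {
     try {
       return node.querySelectorAll(selector);
     } catch (e) {
       // hasErrorLocationInformation() is hypothetical; check that it
       // exists so that UAs without it fall through to the fallback.
       if (typeof e.hasErrorLocationInformation == "function" &&
           e.hasErrorLocationInformation()) {
         return doSomethingSmart(e, selector, node);
       }
     }
   }

   // No native support, or no usable error information.
   return handleSelectorMyself(selector, node);
}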

This code handles UAs that do not implement querySelectorAll at all (via 
handleSelectorMyself()), UAs that implement it but don't report anything 
useful about the error location (via handleSelectorMyself()), and UAs 
that report useful error location information (via doSomethingSmart). 
It would be pretty easy to cache various aspects of things here (that 
particular selectors are not supported, etc) if needed.  What do you 
think is wrong with this approach?
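The caching mentioned above could look roughly like the sketch below. The names cachedQuerySelectorAll and rejectedSelectors are made up for illustration, and the library fallback is passed in as a parameter so that the example stands on its own. It also assumes that any exception from querySelectorAll means "selector not supported", which a real library would want to check more carefully.

```javascript
// Selectors that the native implementation has already rejected.
const rejectedSelectors = new Set();

// handleSelectorMyself is the library's own (hypothetical) matcher.
function cachedQuerySelectorAll(selector, node, handleSelectorMyself) {
  if (node.querySelectorAll && !rejectedSelectors.has(selector)) {
    try {
      return node.querySelectorAll(selector);
    } catch (e) {
      // Remember the failure so later calls skip the native attempt.
      rejectedSelectors.add(selector);
    }
  }
  return handleSelectorMyself(selector, node);
}
```

With this in place a given unsupported selector pays for the failed native call once per page, not once per query.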

>> What's the time it usually takes :hidden to actually match in
>> jquery in similar circumstances?
> 
> There's three points here: 1) Under no circumstance should repeated
> calls to a DOM method be an acceptable solution to attempting to
> figure out what it's trying to tell you.

I'm not sure I follow this point...  What is "it's trying to tell you" 
in this case?  Where do repeated calls come in?

> Showing that it's fast in
> Mozilla or fast in WebKit is fine - but will it be faster in another
> browser, or a server-side implementation? Ignoring a usability
> problem because it's currently "fast enough for me" seems like such a
> strange way to tackle the issue.

I guess I'm not quite understanding the usability problem you refer to. 
If you can expand on that, I would appreciate it.

As I understood the situation, one concern is that trying 
querySelectorAll and then falling back on handleSelectorMyself() as 
above is slow, right?  But "slow" is not an absolute measurement.  Any 
time that happens, the right questions are "how slow?", "how much does 
it matter?", "what are the costs of making it faster?".  It would be 
good to get a handle on those answers.

> 2) The important point is that developers - and libraries - want to
> be using this method to avoid doing traditional DOM traversal
> wherever possible. The time that's trying to be avoided isn't,
> necessarily, the extra calls to the querySelectorAll method

Ah, I see.  That was not clear, at all.

> it's any extra DOM traversal whatsoever. If a library can call
> querySelectorAll, determine which part of the selector is valid
> (using this positional index), it can get as much speed benefit as
> possible while falling back wherever it can.

I understand the sentiment, and I agree that it would be very good to 
provide some way to do that.  I'm just not sure that the index approach 
is the right one: it's a lot of work for libraries _and_ UAs, and only 
lets one avoid DOM traversals in some simple (but not too simple!) cases. 
Perhaps it would be possible to find a better solution to this problem 
that handles selectors like "valid, invalid, valid" better, for example.
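One library-side way to deal with "valid, invalid, valid" without any error location information is sketched below. It assumes the group has already been split into its top-level members, and partitionSelectors is a hypothetical name. It costs one native call per member, so take it as an illustration of the trade-off and not as a recommendation.

```javascript
// Try each member of a selector group natively. Keep the matches for
// the ones the UA accepts and list the rest for the library's matcher.
function partitionSelectors(selectors, node) {
  const native = [];    // { selector, matches } for accepted members
  const fallback = [];  // members the UA rejected
  for (const selector of selectors) {
    try {
      native.push({ selector, matches: node.querySelectorAll(selector) });
    } catch (e) {
      fallback.push(selector);
    }
  }
  return { native, fallback };
}
```

The caller would still have to merge the native matches with the library's own results, remove duplicates, and restore document order.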

But as above, I feel that this can be done even on top of shipping 
implementations.   It just seems to me that libraries can get immediate 
benefits out of the spec even as it stands now, while we continue to 
figure out a way to improve on the error handling aspect of things... 
You seem to be saying that this is not the case, and that the spec as it 
is right this second is completely useless for jQuery's purposes, say. 
Or am I misunderstanding your position?

> 3) There's another category that I forgot to bring up: IDEs (think:
> Firebug, Aptana, Companion.JS, Drosera, etc.). If a user is writing a
> selector and gets the magical "YOU DID SOMETHING BAD HERE" error
> it'll be up to the IDE to write a full selector parsing engine just
> to (maybe) determine what went wrong.

Indeed, but again it seems like supporting this use case is additional 
functionality that would be nice to have but isn't a prerequisite for 
shipping this API: whenever it's added, it can start being used.

> The issue here is that having them come along in UAs "later" (even if
> it's a couple months) will do no good since old UAs will persist with
> developers indefinitely.

Yes... but my point is that there seems to be a strict ordering of 
usefulness here:

   UA with error location reporting >
   UA without reporting >
   UA without any querySelector(All) support

Are you arguing that just having to support those UAs in the middle at 
all isn't worth the benefits they bring?  Or something else?

> Think of it another way: Any UA that doesn't provide this data will
> look really bad in competitive performance analysis

It'll look better than UAs that don't implement this spec at all, right?

> By providing this information you'll be guaranteeing users the best experience
> possible and, in the end, isn't that what it's really all about?

Again, I'm not against providing useful information to make this better 
for libraries.  I just don't want us to rush in and standardize the 
index approach as THE information that is returned, without taking a bit 
to see whether we can do better.  It seems that the cost (slightly 
delayed implementation of the error reporting, maybe) might be worth the 
benefit (better error reporting).

> The cases where it would help the most are the "long tail" of
> queries:

OK, that was also my instinct.  Good to know.

>> You've said this repeatedly, but in the end the worst-case scenario
>>  for a library is that it falls back on exactly what it does now
>> after trying querySelectorAll and seeing that it fails.  That's not
>> the same as "completely stuck", by a long shot....
> 
> No? But if a brand new API for making existing development, and
> libraries, faster is incapable of even providing that basic level of
> service - wouldn't that be considered a failure?

Let me turn this question around:  Does the API as it exists right now 
speed up nothing in your list of queries?  Does it slow some of them 
down?  What is the average gain (or loss)?

"success" or "failure" are not binary terms.  I'm saying we should 
improve things, but shouldn't suffer from the MIT school of design here 
unless we have to.  And I don't think we have to, because it seems to me 
that the cost of the New Jersey approach here is small.  Maybe I'm 
underestimating the cost.

> I'm not even approaching the subject of adding in custom selector
> expressions - something that would be genuinely useful - as I assume
> that it would be a quagmire of specification.
>
> document.addQuerySelector(":hidden", function(elem){ return
> isHidden(elem); });

This might be easier to do in Gecko in some ways than doing the position 
index, for what it's worth.
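For comparison, the library-side equivalent of such a hook is roughly a table of named predicates, as in the sketch below. The names registerPseudo and filterByPseudo are hypothetical, no browser exposes document.addQuerySelector, and this says nothing about how a UA would implement it internally.

```javascript
// Registry of custom pseudo-classes, keyed by name (e.g. ":hidden").
const customPseudos = new Map();

// Register a predicate that decides whether an element matches.
function registerPseudo(name, predicate) {
  customPseudos.set(name, predicate);
}

// Keep only the elements for which the named predicate returns true.
function filterByPseudo(elements, name) {
  const predicate = customPseudos.get(name);
  if (!predicate) {
    throw new Error("Unknown pseudo-class: " + name);
  }
  return Array.from(elements).filter((elem) => predicate(elem));
}

// Usage with plain objects standing in for elements.
registerPseudo(":hidden", (elem) => elem.hidden === true);
const visible = { id: "a", hidden: false };
const hidden = { id: "b", hidden: true };
console.log(filterByPseudo([visible, hidden], ":hidden").map((e) => e.id));
```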

-Boris
Received on Saturday, 3 May 2008 01:09:15 UTC