Re: [SelectorsAPI] Thoughts on querySelectorAll from John Resig on 2008-05-02 (public-webapi@w3.org from May 2008)

From: John Resig <jresig@mozilla.com>
Date: Fri, 2 May 2008 14:15:33 -0700 (PDT)
To: Boris Zbarsky <bzbarsky@MIT.EDU>
Cc: public-webapi@w3.org
Message-ID: <8450034.53691209762933634.JavaMail.root@cm-mail02.mozilla.org>
> The problem is defining this concept of "what selector is failing".
> 
> For example, consider the following selector:
> 
>    :not(a:link)
> 
> In Gecko, the character where we fail is the second ':', I think. 
> Which 
> selector is failing exactly?  And while this case can be handed with
> the 
> "back up to the preceding '(' and then to the preceding ':'"
> suggestion, 
> what about:
> 
>    :not|test|
> 
> ? Here the failing character is the first '|' but there is just no way
> 
> to extract a valid selector out of the whole thing.  Worse yet, what
> about:
> 
>    :note
> 
> in one of the UAs supporting :not?  Is the failing character the ':'
> or 
> the 'e'?  I realize that for any particular case like this it's clear
> 
> what should be happening.  What I question is that we can easily
> create 
> a rigorous definition of where the error character pointer should
> point 
> that works across all the various existing selectors and does
> reasonable 
> things for currently-invalid selectors.  That doesn't even include 
> future-proofing issues, more on which below.

There's a solution to the above that is both simple and intuitive: point at the first character of the selector that the UA doesn't know what to do with. For the examples above, that would be:

 :not(a:link)
       ^
 :not|test|
 ^
 :note
 ^

You just have to ask yourself "am I able to use this expression?" If not, then it's an error and that's where you mark it. It's inconsequential that the user agent supports :not in the :not|test| example, since not enough information is provided to actually use it.

> See the example above:  :not(a:link).  In CSS3 Selectors, what's
> invalid 
> is the concept of putting multiple simple selectors inside :not.  All
> the parts are valid on their own; it's the way of putting them
> together 
> that's invalid.

Sure - and if that were the error then it would throw the exception pointing to the first : - at which point the library would take over.

> But maybe I'm misunderstanding your real concern here. Above you say 
> that :not would be handled by backing out of it anyway, which makes 
> sense to me.  At that point, why do you care what inside the :not is 
> invalid, exactly?  You're going to have to do the entire :not match 
> yourself anyway....

Well, I forgot that CSS selectors are completely crippled for real-world use, so it's kind of a moot point. Since there's only one selector it would be easy to detect, in that case.

>    div, :future(a, b, c), :bad, span
> 
> How is this handled in a UA that supports :future?  How is it handled
> in 
> a UA that does not?

Well, in JavaScript libraries, at least the parsing is smart enough to handle situations like that. In jQuery we support selectors like:

  :not(a, div, span)

Just fine. If a UA doesn't support it then I would assume that it would throw on the first character of the first thing that it doesn't know how to use:

  div, :future(a, b, c), :bad, span
       ^

> Just splitting on ',' is not quite right....  For
> that matter, how would you handle:
> 
>    div, :not( , span
> 
> ?  Just splitting on the ',' gives very different results from what 
> querySelectorAll() would return (which is an exception).

Of course "just splitting on ," wouldn't work - all the JS selector engines that I know tokenize and break it apart. In the above the error would occur here:

  div, :not( , span
       ^
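
To illustrate the "tokenize and break it apart" point, here is a minimal sketch of splitting a selector group on top-level commas only. It is not taken from jQuery or any other engine: splitSelectorGroup, makePart, firstNonSpace, selectorError and the `position` property on the error are all hypothetical names. It also assumes that an unclosed parenthesis or quote should be reported at the start of the part that contains it, which matches the caret above, and it does not handle backslash escapes outside of quoted strings.

```javascript
// Hypothetical helper: index of the first non-whitespace character at or after `from`.
function firstNonSpace(selector, from) {
  let i = from;
  while (i < selector.length && /\s/.test(selector[i])) i++;
  return i;
}

// Hypothetical helper: one piece of the group plus its offset in the full string.
function makePart(selector, start, end) {
  const position = firstNonSpace(selector, start);
  return { text: selector.slice(position, end).trim(), position };
}

// Hypothetical error shape: a SyntaxError carrying the failing character index.
function selectorError(selector, position) {
  const err = new SyntaxError("Cannot use selector: " + selector);
  err.position = position;
  return err;
}

// Split on commas that are outside parentheses and quoted strings.
function splitSelectorGroup(selector) {
  const parts = [];
  let depth = 0;    // current parenthesis nesting
  let quote = null; // the open quote character, if any
  let start = 0;    // index where the current part begins

  for (let i = 0; i < selector.length; i++) {
    const ch = selector[i];
    if (quote) {
      if (ch === "\\") i++;                 // skip the escaped character
      else if (ch === quote) quote = null;  // closing quote
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (ch === "(") {
      depth++;
    } else if (ch === ")") {
      depth--;
      if (depth < 0) throw selectorError(selector, i); // stray ')'
    } else if (ch === "," && depth === 0) {
      parts.push(makePart(selector, start, i));
      start = i + 1;
    }
  }

  // Unclosed '(' or quote: report the start of the part that contains it.
  if (depth > 0 || quote) {
    throw selectorError(selector, firstNonSpace(selector, start));
  }
  parts.push(makePart(selector, start, selector.length));
  return parts;
}
```

With this sketch, "div, :future(a, b, c), :bad, span" comes apart into four pieces, with the commas inside the parentheses left alone, while "div, :not( , span" throws rather than being cut in the wrong place.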

> > This really must be done *now* before implementations get too
> baked.
> 
> Maybe I'm missing something.  Adding more error-reporting to the
> thrown 
> exception is a backwards-compatible change to an implementation.  If,
> say, Firefox 4 ships querySelectorAll() without such error reporting,
> that would not preclude Firefox 5 adding it.

The important thing here is that this would work if - and only if - there was proper non-black-box error reporting to begin with. The fact that user agents were allowed to begin implementing without taking into consideration failure cases (and how to determine failure cases) is a serious issue.

> What's the time it
> usually takes :hidden to actually match in jquery in similar
> circumstances?

There are three points here:
1) Under no circumstance should repeated calls to a DOM method be an acceptable solution to attempting to figure out what it's trying to tell you. Showing that it's fast in Mozilla or fast in WebKit is fine - but will it be faster in another browser, or a server-side implementation? Ignoring a usability problem because it's currently "fast enough for me" seems like such a strange way to tackle the issue.

2) The important point is that developers - and libraries - want to be using this method to avoid doing traditional DOM traversal wherever possible. The time that we're trying to avoid isn't, necessarily, the extra calls to the querySelectorAll method; it's any extra DOM traversal whatsoever. If a library can call querySelectorAll and determine which part of the selector is valid (using this positional index), it can get as much speed benefit as possible while falling back wherever it has to.

3) There's another category that I forgot to bring up: IDEs (think: Firebug, Aptana, Companion.JS, Drosera, etc.). If a user is writing a selector and gets the magical "YOU DID SOMETHING BAD HERE" error it'll be up to the IDE to write a full selector parsing engine just to (maybe) determine what went wrong. Including the position will actually make it feasible for developers to write these selectors and see what went wrong.
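
As a rough sketch of point 2, this is how a library might use such an index. Everything here is an assumption: no implementation exposes a `position` property on the thrown exception, and queryWithFallback and the `engine` object (with `native`, `filter` and `library` functions) are hypothetical names. The sketch is deliberately conservative: it only reuses the native result when the unsupported part is a trailing filter on a single selector, and hands everything else to the library's own engine.

```javascript
// engine.native(root, selector)  -> wraps root.querySelectorAll; may throw
// engine.filter(elements, expr)  -> library-side filter, e.g. for ":gt(2)"
// engine.library(root, selector) -> the library's full DOM-traversal engine
function queryWithFallback(root, selector, engine) {
  try {
    return engine.native(root, selector);
  } catch (err) {
    const pos = err.position; // hypothetical: index of the failing character

    // No index, or failure at the very start: nothing native to reuse.
    if (typeof pos !== "number" || pos <= 0) {
      return engine.library(root, selector);
    }

    const prefix = selector.slice(0, pos);
    const rest = selector.slice(pos);

    // Only safe when `rest` filters the last simple selector of a single
    // selector: no commas in the prefix, no combinators or commas after it.
    const unsafe =
      prefix.includes(",") ||
      /[\s>+~]$/.test(prefix) ||
      /[\s,>+~]/.test(rest);
    if (unsafe) {
      return engine.library(root, selector);
    }

    // The fast native path does the traversal; the library only filters.
    return engine.filter(engine.native(root, prefix), rest);
  }
}
```

For a query like ".class .class tag:gt(2)", the native method would handle ".class .class tag" and the library would only have to apply ":gt(2)" to the result, instead of walking the DOM itself.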

> Right now we
> have 
> two more-or-less interoperable implementations of the specification as
> written (one in beta, one released), with at least one more on the way
> soon.  Other implementors may be waiting for the spec to go to CR
> before 
> implementing.  I certainly would have if it had seemed like major 
> changes were going to still happen to the specification.  So the real
> question I have is whether it's better to have querySelectorAll
> without 
> the extra error reporting in UAs 6 months from now and then add error
> reporting another 12 months after that or whether it's better to have
> querySelectorAll with extra error reporting in UAs but not until 12 
> months from now (and nothing before that).

The issue here is that having them come along in UAs "later" (even if it's a couple of months) will do no good, since old UAs will persist with developers indefinitely. WebKit can (presumably) move fast and push out an update to rectify this (and the other changes that I'm discussing) and since Internet Explorer 8 is still in beta, it's definitely not too late to fix this.

Think of it another way: Any UA that doesn't provide this data will look really bad in competitive performance analysis - since libraries will always be forced to run at the slower DOM level. By providing this information you'll be guaranteeing users the best experience possible and, in the end, isn't that what it's really all about?

> After that you start getting into some selectors where you might get 
> more savings (I'm looking at ".class .class tag:gt(2)"), but these are
> 
> not that common on an absolute basis compared to the more-common
> things 
> discussed above, and there are lots of red things with similar 
> frequencies to ".class .class tag:gt(2)" that fall into one of the 
> categories above.
> 
> Again, I agree there are cases where the index would help.  I'm just 
> wondering whether they're common enough, and whether the help is 
> noticeable in those cases in terms of performance, and whether it's
> easy 
> to tell those cases apart from the cases when the index just doesn't 
> help much.

The cases where it would help the most are the "long tail" of queries: ones that are particularly long (like the above ".class .class tag:gt(2)") and otherwise very difficult to optimize. Obviously libraries will be trying to take shortcuts wherever possible, so queries like "#foo:hidden" may never hit querySelectorAll in the first place.

> You've said this repeatedly, but in the end the worst-case scenario
> for 
> a library is that it falls back on exactly what it does now after
> trying 
> querySelectorAll and seeing that it fails.  That's not the same as 
> "completely stuck", by a long shot....

No? But if a brand new API for making existing development, and libraries, faster is incapable of even providing that basic level of service - wouldn't that be considered a failure?

I'm not even approaching the subject of adding in custom selector expressions - something that would be genuinely useful - as I assume that it would be a quagmire of specification.

  document.addQuerySelector(":hidden", function(elem){
    return isHidden(elem);
  });

--John
Received on Friday, 2 May 2008 21:16:17 UTC