- From: Boris Zbarsky <bzbarsky@MIT.EDU>
- Date: Thu, 01 May 2008 12:51:31 -0500
- To: public-webapi@w3.org
John Resig wrote:
>> So say the original |selector| is ":not(:link)" and the UA doesn't
>> support :link. This will presumably split the string into ":not("
>> and ":link)", neither of which is that useful.
>
> Absolutely - however that's a semi-trivial check for the library
> (just look for an open :FOO(... before beginning to parse). What's
> important about the position is that we can determine,
> programmatically, WHAT selector is failing.

The problem is defining this concept of "what selector is failing". For example, consider the following selector:

  :not(a:link)

In Gecko, the character where we fail is the second ':', I think. Which selector is failing, exactly? And while this case can be handled with the "back up to the preceding '(' and then to the preceding ':'" suggestion, what about:

  :not|test|

? Here the failing character is the first '|', but there is just no way to extract a valid selector out of the whole thing. Worse yet, what about:

  :note

in one of the UAs supporting :not? Is the failing character the ':' or the 'e'?

I realize that for any particular case like this it's clear what should be happening. What I question is whether we can easily create a rigorous definition of where the error character pointer should point that works across all the various existing selectors and does reasonable things for currently-invalid selectors. That doesn't even include future-proofing issues, more on which below.

> The critical issue, right now, is that there is no way to do
> defensive, unobtrusive testing of a browser's querySelectorAll
> implementation - it's a complete black box.

Well... The thing is, selectors are _very_ context-sensitive. That is, whether a selector string is valid can easily change if characters are added to the beginning, end, or anywhere in the middle. It's equally easy to make a valid string into an invalid one by removing characters at the beginning, end, or middle.
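To make the "try it and see if it fails" probe concrete, here is a minimal sketch of the only capability test available today. The `queryFn` parameter is a stand-in for a bound document.querySelectorAll, injected so the example runs without a DOM; `mockQuery` is a hypothetical engine that rejects :hidden, not any real UA's behavior.

```javascript
// Sketch: the only cross-UA capability test is "call it and see if it
// throws". queryFn stands in for document.querySelectorAll.
function isValidSelector(sel, queryFn) {
  try {
    queryFn(sel);
    return true;
  } catch (e) {
    return false;
  }
}

// Mock engine that, like a hypothetical UA, rejects :hidden.
function mockQuery(sel) {
  if (sel.indexOf(":hidden") !== -1) {
    throw new Error("SYNTAX_ERR");
  }
  return []; // a real engine would return a NodeList
}

console.log(isValidSelector("div span", mockQuery));       // true
console.log(isValidSelector("a[href]:hidden", mockQuery)); // false
```

Note that, per the context-sensitivity point above, a `true` here only tells you this exact string is valid; it says nothing about the string embedded in a larger selector.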
So really, the only question that can consistently be asked and answered is "is this a valid selector from the point of view of this UA?" At least as far as I can tell...

> It's also not sufficient to provide the sequence of characters that
> are valid - since that still leaves us with a "black box" problem.
> If all you say is that "something inside the :not() isn't valid"
> then that doesn't help - we're back to where we started.

See the example above: :not(a:link). In CSS3 Selectors, what's invalid is the concept of putting multiple simple selectors inside :not. All the parts are valid on their own; it's the way of putting them together that's invalid.

But maybe I'm misunderstanding your real concern here. Above you say that :not would be handled by backing out of it anyway, which makes sense to me. At that point, why do you care what inside the :not is invalid, exactly? You're going to have to do the entire :not match yourself anyway...

> That's not really an issue - there isn't a single, publicly
> available selector engine that queries in that manner. They all work
> from left to right (finding divs, then finding spans).

I assume you mean the JS library ones, right? That makes sense, and does make the combinator thing less of a problem.

> That's not really an issue either; look at the following:
>
>   div, :bad, span
>
> Most JavaScript libraries, when they see the ',', interpret it to
> mean something like "take what we already have and push it on a
> stack for later retrieval" - that way when we hit an exception with
> :bad it'll really just be like handling any other selector.

Consider:

  div, :future(a, b, c), :bad, span

How is this handled in a UA that supports :future? How is it handled in a UA that does not? Just splitting on ',' is not quite right...

For that matter, how would you handle:

  div, :not( , span

? Just splitting on the ',' gives very different results from what querySelectorAll() would return (which is an exception).
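To illustrate why a naive split on ',' goes wrong, here is a sketch of a splitter that only breaks at top-level commas, tracking paren/bracket depth so ":future(a, b, c)" survives intact, and raising on an unbalanced group so "div, :not( , span" fails the way querySelectorAll() itself would. This is an illustration, not any library's actual parser; quoted strings inside attribute selectors are deliberately ignored for brevity.

```javascript
// Sketch: split a selector list on top-level commas only. Commas
// inside (...) or [...] are kept; an unclosed group throws, mirroring
// the exception querySelectorAll() would raise for the whole list.
function splitSelectorList(input) {
  var parts = [];
  var current = "";
  var depth = 0;
  for (var i = 0; i < input.length; ++i) {
    var ch = input.charAt(i);
    if (ch === "(" || ch === "[") {
      ++depth;
      current += ch;
    } else if (ch === ")" || ch === "]") {
      --depth;
      current += ch;
    } else if (ch === "," && depth === 0) {
      parts.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  if (depth !== 0) {
    throw new Error("unbalanced group in selector list");
  }
  parts.push(current.trim());
  return parts;
}

console.log(splitSelectorList("div, :future(a, b, c), :bad, span"));
// [ "div", ":future(a, b, c)", ":bad", "span" ]
```

Even with this, the library still has to probe each piece separately, which is exactly the per-selector validity question discussed above.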
> This really must be done *now* before implementations get too baked.

Maybe I'm missing something. Adding more error reporting to the thrown exception is a backwards-compatible change to an implementation. If, say, Firefox 4 ships querySelectorAll() without such error reporting, that would not preclude Firefox 5 from adding it.

> The fact that there's no way to determine what a user agent is
> capable of supporting (only through the crude "try it and see if it
> fails" technique) means that a new querySelectorAll will have to be
> performed on *every single selector call* just to see if it works
> or not. There is no way to say "Oh, hey, Mozilla doesn't like
> :hidden; we should save the overhead of calling that every time."

I'm not sure I follow. You could cache the fact that the selector ":hidden" is not supported. That wouldn't help you if someone then used ":hidden" as part of another selector, but unless you pre-parse the selectors all the time, you wouldn't detect that anyway. And if you do pre-parse them, what's the overhead of that? Maybe I'm just misunderstanding what information you want and what you want to do with it...

More importantly, how does the overhead we're trying to save compare to the resources the operation we're performing consumes? I just did a quick test, and on a 2-year-old MacBook (not Pro), a querySelectorAll(":hidden") in Gecko inside a try/catch with a counter increment in the catch takes about 46 microseconds. Here's the code:

  const kMaxCount = 10000;

  function func1(sel) {
    var start = new Date();
    var j = 0;
    for (var i = 0; i < kMaxCount; ++i) {
      try {
        var list = document.querySelectorAll(sel);
      } catch (e) {
        ++j;
      }
    }
    var end = new Date();
    alert((end - start) + "ms to try/catch " + j + " times");
  }

  func1(":hidden");

20% of that time is taken up by CSS error reporting, which we should consider disabling for the querySelectorAll case. What's the time it usually takes :hidden to actually match in jQuery in similar circumstances?
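The caching idea mentioned above can be sketched as a wrapper that remembers which whole selector strings a UA rejected and routes them straight to the library's own engine thereafter. The names here (`makeCachedQuery`, `queryFn`, `fallbackFn`) are hypothetical; `queryFn` stands in for document.querySelectorAll and `fallbackFn` for the library's JS implementation, injected so the sketch runs without a DOM.

```javascript
// Sketch: cache "this UA rejected this selector string" so the
// native call (and its try/catch cost) is paid at most once per
// distinct selector. Caveat, as noted above: this caches only the
// exact string, not ":hidden" embedded inside a larger selector.
function makeCachedQuery(queryFn, fallbackFn) {
  var unsupported = {};
  return function (sel) {
    if (unsupported.hasOwnProperty(sel)) {
      return fallbackFn(sel); // known-bad: skip the native engine
    }
    try {
      return queryFn(sel);
    } catch (e) {
      unsupported[sel] = true;
      return fallbackFn(sel);
    }
  };
}

// Usage with mock engines standing in for the browser and the library.
var nativeCalls = 0;
function mockNative(sel) {
  ++nativeCalls;
  if (sel === ":hidden") throw new Error("SYNTAX_ERR");
  return ["native"];
}
function mockFallback(sel) { return ["fallback"]; }

var query = makeCachedQuery(mockNative, mockFallback);
query(":hidden");
query(":hidden"); // second call never reaches the native engine
console.log(nativeCalls); // 1
```

Whether the saved try/catch is worth the bookkeeping is exactly the 46-microsecond question above.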
> Not to mention the fact that libraries are going to need to try and
> use querySelectorAll for as many queries as possible (or for as
> much of the query as possible, if there's something bad in it).

Agreed on the former. I'm just saying that it might be that the latter is hard to do for the libraries, a pain for the UAs, and not much of a performance win. I could be wrong, and then we need some very careful spec text that would allow one to actually determine how much "as much of the query as possible" is. That'll be a lot of work (== time) for the spec author, and the question is whether the spec should be held for that or whether it can be added in a later revision.

Right now we have two more-or-less interoperable implementations of the specification as written (one in beta, one released), with at least one more on the way soon. Other implementors may be waiting for the spec to go to CR before implementing. I certainly would have if it had seemed like major changes were still going to happen to the specification.

So the real question I have is whether it's better to have querySelectorAll without the extra error reporting in UAs 6 months from now and then add error reporting another 12 months after that, or whether it's better to have querySelectorAll with extra error reporting in UAs but not until 12 months from now (and nothing before that). It seems to me that the former is better, but note that the numbers involved are pretty fuzzy at best.

> div span > a[href]:hidden
>
> With the extra index all of the leading "div span > a[href]" could
> be lopped off and re-run without a hitch - and then the extra
> :hidden could be handled by the library. However, as it stands now,
> the only thing that we can do is say "Oh well, I'm not sure what
> went wrong - I guess we'll do it the slow way."

Right. I agree that your proposal works in a number of cases.
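For concreteness, here is a sketch of the strategy in John's example, assuming the proposed error-index API existed: run the valid prefix through the native engine, then apply the trailing unsupported pseudo-class with the library's own filter. Everything here is hypothetical (`querySplitAt`, `queryFn`, the `filters` table, and the plain-object "nodes"); it only illustrates the shape of the fallback, not a real DOM or jQuery API.

```javascript
// Sketch: given the character index where the UA reported failure,
// split the selector there, query the valid prefix natively, and
// post-filter the unsupported suffix in the library.
function querySplitAt(sel, errorIndex, queryFn, filters) {
  var prefix = sel.slice(0, errorIndex); // e.g. "div span > a[href]"
  var rest = sel.slice(errorIndex);      // e.g. ":hidden"
  var nodes = queryFn(prefix);
  var filter = filters[rest];
  return filter ? nodes.filter(filter) : nodes;
}

// Mock engine and plain objects standing in for DOM nodes.
var fakeNodes = [{ tag: "a", hidden: true }, { tag: "a", hidden: false }];
function mockQuery(sel) { return fakeNodes; }
var filters = { ":hidden": function (n) { return n.hidden; } };

var sel = "div span > a[href]:hidden";
var result = querySplitAt(sel, sel.indexOf(":hidden"), mockQuery, filters);
console.log(result.length); // 1
```

The hard part, as discussed above, is not this mechanism but rigorously specifying which index the UA should report for every current and future invalid selector.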
The problem is making sure it works in all cases, and possibly making sure it continues to work as selectors are added.

> Why this is of such great concern to me is that "invalid selectors"
> (ones provided by libraries) are actually used very frequently. For
> example, here's a breakdown of the most common selectors used by
> some popular jQuery sites: http://ejohn.org/files/selectors.html

Excellent. Data!

Looking at this data, the top four red selectors would be quite fast to redo by hand, since they would just involve getElementById calls. Note that such calls are actually _faster_ than querySelectorAll on the #id selector, since the latter has to walk the whole DOM every time. All the character index would save you in this case is finding the start of the pseudo-class.

The next three are a bare red pseudo-class. They would be done completely by hand even with the indexing proposal.

The next two are "tag:red-pseudo-class". Doing it by hand means a getElementsByTagName() (which is about as fast as, if not faster than, a querySelectorAll() on the tag name, I suspect), followed by a filter on the red pseudo-class. This is very similar to the #id case above.

After that you start getting into some selectors where you might get more savings (I'm looking at ".class .class tag:gt(2)"), but these are not that common on an absolute basis compared to the more common things discussed above, and there are lots of red things with similar frequencies to ".class .class tag:gt(2)" that fall into one of the categories above.

Again, I agree there are cases where the index would help. I'm just wondering whether they're common enough, whether the help is noticeable in those cases in terms of performance, and whether it's easy to tell those cases apart from the cases where the index just doesn't help much.

> Libraries would, again, be completely stuck for a way to work around
> those issues.
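The fast paths described above (getElementById for "#id", getElementsByTagName for a bare tag) amount to a simple dispatch on selector shape before ever reaching querySelectorAll or the library engine. A sketch, with the caveat that the regexes and category names are illustrative and looser than any real library's:

```javascript
// Sketch: recognize the trivially simple selector shapes that have
// cheaper dedicated DOM calls than querySelectorAll. Real libraries
// use more precise grammars; these patterns are illustrative only.
function classifySelector(sel) {
  if (/^#[\w-]+$/.test(sel)) return "id";     // -> getElementById
  if (/^[a-zA-Z]+$/.test(sel)) return "tag";  // -> getElementsByTagName
  if (/^\.[\w-]+$/.test(sel)) return "class"; // -> getElementsByClassName
  return "complex";                           // -> querySelectorAll/library
}

console.log(classifySelector("#menu"));             // "id"
console.log(classifySelector("div"));               // "tag"
console.log(classifySelector(".class .class tag")); // "complex"
```

This is why the top entries in the data above gain little from an error index: they hit a dedicated fast path whether or not querySelectorAll can report where parsing failed.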
You've said this repeatedly, but in the end the worst-case scenario for a library is that it falls back on exactly what it does now after trying querySelectorAll and seeing that it fails. That's not the same as "completely stuck", by a long shot...

-Boris
Received on Thursday, 1 May 2008 17:52:44 UTC