- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Mon, 13 Jul 2009 18:25:08 -0500
- To: Paul <paul@scriptfusion.com>
- Cc: www-style@w3.org
On Mon, Jul 13, 2009 at 12:07 PM, Paul <paul@scriptfusion.com> wrote:
> As the specification states for adjacent siblings:
>
> E + F selects an F element immediately preceded by an E element.
> However, there is no way to select an E immediately followed by an F
> element and apply styles to element E rather than F.
>
> Similarly, for general siblings:
> E ~ F selects an F element preceded by an E element.
> Again, there is no way to select an E element followed by an F
> element and apply styles to E rather than F.
>
> I see this as a deficiency of the selector specification.

Short Answer: It's too expensive to implement.

Long Answer (and I mean *long*):

To understand why this is so expensive, one must first understand some of the particulars of how browsers parse and match selectors.

We humans read and write selectors left->right, from highest/earliest in the document to lowest/latest. Browsers, however, do the reverse and read them right->left. Why do they do this? It's been experimentally shown to be faster in general. The jQuery JavaScript library written by John Resig originally used left->right matching in its selector implementation. While Resig was writing the new Sizzle selector engine, though, he ran performance tests on matching right->left instead and found it to be significantly faster.

To understand why, take as an example the markup "<a><a><b></b></a></a>" and the selector "a b". A left->right matcher first looks for all the "a" elements, and finds two of them. It then looks for "b" elements that are descendants of each element in the previously matched set. It again finds two matches, one for each "a" previously matched. It now has to make sure that all of the matched "b" elements are unique, so that it doesn't return multiple matches for the same element. Uniqueness testing is relatively expensive. On the other hand, a right->left matcher first grabs all the "b" elements (there is one), and then verifies that each has an "a" ancestor (it does). At worst it has to discard an element that fails one of these tests, but it never has to compare its matched elements against each other.

That's in general. When you look at actual usage, the difference becomes even more dramatic. The vast majority of selector matching in a browser happens during page load. As the browser receives the document over the wire, the HTML parser reads it in order and adds nodes to the DOM. As each new node gets added, the browser's selector engine checks to see if any of the CSS selectors match it. Read right->left, the basic combinators only ever ask you to check the previous siblings or ancestors of the node, and due to the way parsing works, those are guaranteed to already be in the partially-constructed DOM! Thus most selectors can be *immediately* matched without waiting for future nodes to be added. This is what enables incremental rendering, where the browser displays an incomplete page and adds to it as the page loads. While an incrementally-rendered page can jump around a bit, for the most part the display is stable due to the previously mentioned properties of CSS.

So, all selector engines match "backwards", and I'm claiming that the current combinators allow this to be done efficiently. What would happen, though, if we allowed the reverse versions of the combinators?

For example, given the selector "h1 + div" we can check each <div> element as it gets added to see if its previous sibling is an <h1>. This is guaranteed to be testable as soon as the <div> gets added.
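To make that concrete, here is a minimal TypeScript-flavored sketch of that check (not any engine's real code; matchesH1PlusDiv and onElementAdded are made-up names used purely for illustration). The point is that everything it inspects already exists in the partial DOM at insertion time:

    // Hedged sketch: match "h1 + div" right->left against a newly inserted element.
    function matchesH1PlusDiv(node: Element): boolean {
      if (node.tagName !== "DIV") return false;       // rightmost compound: div
      const prev = node.previousElementSibling;       // follow the "+" combinator leftward
      return prev !== null && prev.tagName === "H1";  // leftmost compound: h1
    }

    // Hypothetical hook called as the parser inserts each element.
    function onElementAdded(node: Element): void {
      if (matchesH1PlusDiv(node)) {
        // The rule can be applied immediately; no future siblings are needed.
      }
    }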
However, if we target the <h1> with a selector like "div [previousAdjacentSibling] h1", then we can't tell whether it matches when the <h1> gets added. We have to wait until the <h1>'s next sibling gets parsed and added to the DOM, which won't happen until the <h1>'s children get parsed and added too. If we've been incrementally rendering the page, we may have already displayed the <h1> with the rules we could match immediately. This new rule may significantly change the rendering of the <h1> (such as by switching it from block to inline), which requires us to do a complete re-layout of the <h1> and its children (and maybe its previous sibling and its children, and arbitrary ancestors, too). This means you'll get an annoying flicker where your page first looks like one thing, then another.

So, a previous-adjacent-sibling combinator is slower to match in the most common case (when the page is being built node-by-node at load time) and can cause expensive and ugly re-layout if the browser is doing incremental rendering. The reverse versions of ~, >, and " " are even worse.

Selector engines working on a static DOM where everything is fully loaded (like jQuery's selector engine) can implement the reverse combinators without much trouble (in jQuery they're done with the :has() pseudo-class, which is used like "h1:has(+ div)"). However, when the DOM is allowed to change dynamically, as it does during page load, the reverse combinators can cause very significant slowdowns and ugly effects. Thus they're not in CSS currently, and likely won't appear for some time.

All that being said, it has been noted by some browser implementors that *some* of the reverse combinators (reverse > and +) are possibly okay enough to be implementable. These might make it into a future specification.

~TJ
Received on Monday, 13 July 2009 23:26:08 UTC