Re: CSS regexes from Tab Atkins Jr. on 2011-07-18 (www-style@w3.org from July 2011)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Mon, 18 Jul 2011 13:58:25 -0700
To: jwl@worldmusic.de
Cc: www-style@w3.org
Message-ID: <CAAWBYDAGN9hS3Hs0kLNsC567uZZV76Woo+gi4V0pOpWGaDNttw@mail.gmail.com>

On Wed, Jul 13, 2011 at 11:00 AM, Joergen W. Lang <joergen_lang@gmx.de> wrote:
> Dear CSSWG,
>
> are there any plans to implement one or the other form of regular expression
> in CSS?
>
> I know that attribute selectors do some basic RE matching. But for anonymous
> boxes (a.k.a. plain text) authors are still limited to using <span> and
> friends.
>
> To style the third "word" (combination of 'word' characters/syllables/...)
> in a paragraph one could write:
>
> p::regex(/^(?:\w+\s+){2}(\w+)/) {
>  background-color: #cf6;
> }
>
> Or maybe she wants to give quotation marks a special treatment:
>
> h1::regex(/['""„“”«»’,]/g) {
>  font-family: Baskerville, "Book Antiqua", serif;
>  font-style:  italic;
> }
>
> After reading the thread regarding the '::first-word pseudo element' [1] I
> can see there are a lot of implications and questions. One argument against
> ::regex() was that it selects actual content.
>
> IMHO it really depends on how authors use regexes. Yes, they could style
> actual content - or they could style things like "every instance of the
> letter 'm'" or "the first instance of the word 'Foobar'" or "the last
> appearance of [almost anything]".
>
> I suppose that most browsers do some sort of pattern matching during the
> parsing process. Maybe an already existing RE engine could be reused for
> this purpose?
>
> Please excuse my intrusion if this has already been discussed and decided
> upon. Hopefully I'm not (re)opening a proverbial Pandora's box here.

Unfortunately, this sort of thing has several problems that make it
hard to implement.

1. Does it match across element boundaries?  If so, it'll be a lot
slower.  If not, it's much less useful.

2. Does it match across text nodes?  Even when the page *looks* like
it's just continuous text, the text may actually be broken across
separate text nodes.  This has the same implications as the previous
point, except it's more confusing because you can only tell when a run
of text is broken into multiple text nodes by examining it from script.

3. What happens if two regexes (or two applications of the same regex)
overlap?  CSS always works with trees, and two partially overlapping
matches can't both be boxes in a tree, so you'd need some way to
determine which one gets broken apart, which one is innermost, etc.

4. There are performance implications with running regexes across the
entire DOM like that, as you have to rerun all of them any time the
text content of anything in the page changes.
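
To make points 2 and 3 a bit more concrete, here's a rough sketch in
Python rather than in anything resembling a real browser engine.  The
text_nodes list and the helper names (match_per_node, match_joined,
relation) are made up for illustration; nothing here is an actual CSS
or DOM API.

```python
import re

# Hypothetical model: one paragraph whose text happens to be stored
# as three separate text nodes, even though it renders as one run.
text_nodes = ["Rock ", "& Roll and R", "&B"]


def match_per_node(pattern, nodes):
    """Match inside each text node on its own (never crossing a boundary).

    Returns (node_index, start, end) tuples, offsets relative to the node.
    """
    return [
        (i, m.start(), m.end())
        for i, node in enumerate(nodes)
        for m in re.finditer(pattern, node)
    ]


def match_joined(pattern, nodes):
    """Match against the concatenated text, ignoring node boundaries.

    Returns (start, end) tuples, offsets relative to the joined string.
    """
    return [(m.start(), m.end()) for m in re.finditer(pattern, "".join(nodes))]


def relation(a, b):
    """Classify two (start, end) ranges as disjoint, nested, or partial overlap."""
    (s1, e1), (s2, e2) = a, b
    if e1 <= s2 or e2 <= s1:
        return "disjoint"
    if (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2):
        return "nested"
    # Neither range contains the other, so they cannot both be tree nodes.
    return "partial overlap"


# Point 2: "R&B" straddles the last two text nodes.
per_node = match_per_node(r"R&B", text_nodes)   # [] (no single node contains it)
joined = match_joined(r"R&B", text_nodes)       # [(16, 19)]

# Point 3: two patterns whose matches overlap without nesting.
text = "one two three"
first = re.search(r"one two", text).span()      # (0, 7)
second = re.search(r"two three", text).span()   # (4, 13)
kind = relation(first, second)                  # "partial overlap"
```

The per-node version is cheap but misses the match entirely, and the
joined version finds it but now has to map the result back onto two
different nodes.  The "partial overlap" case is the one with no
obvious answer in a box tree.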

So, while I think it's a pretty cool idea that would be useful in
several ways (I wanna style all my ampersands with a pretty font!), I
don't think it's something that can actually be done.  :/

~TJ

Received on Monday, 18 July 2011 20:59:12 UTC