Re: CSS regexes from Joergen W. Lang on 2011-07-21 (www-style@w3.org from July 2011)

From: Joergen W. Lang <joergen_lang@gmx.de>
Date: Fri, 22 Jul 2011 00:59:51 +0200
To: "Tab Atkins Jr." <jackalmage@gmail.com>
CC: www-style@w3.org
Message-ID: <4E28AF67.8050004@gmx.de>

Tab, thanks for taking the time to consider my questions.
They were not meant as a proposal. At least not yet.

Anyway, I took some time to think about your response and your further
questions.

> On Wed, Jul 13, 2011 at 11:00 AM, Joergen W. Lang
> <joergen_lang@gmx.de> wrote:

>> p::regex(/^(?:\w+\s+){2}(\w+)/) {
>>   background-color: #cf6;
>> }
>>
>> Or maybe she wants to give quotation marks a special treatment:
>>
>> h1::regex(/['""„“”«»’,]/g) {
>>   font-family: Baskerville, "Book Antiqua", serif;
>>   font-style:  italic;
>> }

[...]

On 18.07.11 22:58, Tab Atkins Jr. wrote:
> Unfortunately, this sort of thing has several problems that make it
> hard to implement.

So does "hard" mean "too hard to even bother" or "hard but not
impossible"? Unfortunately I am not an implementer. I have a strong
Perl-ish background and have done a fair bit of web programming. Currently
I am more on the authoring side (web sites and books, that is).

> 1. Does it match across element boundaries?  If so, it'll be a lot
> slower.  If not, it's much less useful.

As I imagine :regex(), it would certainly be limited to whatever
selector it is attached to. If I use

   p:regex(/position|top|left|bottom|right/) {
     color: red;
   }

to do some syntax highlighting on, e.g., a style sheet example, the regex
should find all occurrences of those property names, no matter how deeply
nested they are within that p element.
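A minimal sketch of what "find all occurrences, however deeply nested" could mean, assuming the engine searches the flattened text of the p element (roughly what the DOM's `textContent` returns) rather than each child separately. The names `findKeywords` and `Hit` and the sample string are illustrative only; nothing here reflects an actual implementation.

```typescript
// Sketch: search the flattened text of an element for every keyword
// occurrence, regardless of how deeply the words are nested in markup.
interface Hit {
  word: string;
  index: number; // offset into the flattened text
}

function findKeywords(text: string): Hit[] {
  // Fresh regex per call so the "g" flag's lastIndex state never leaks.
  const re = /position|top|left|bottom|right/g;
  const hits: Hit[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    hits.push({ word: m[0], index: m.index });
  }
  return hits;
}

// Hypothetical flattened text of
// <p>div { <b>position</b>: absolute; top: 0; <i>left</i>: 0; }</p>
const example = "div { position: absolute; top: 0; left: 0; }";
console.log(findKeywords(example).map(h => h.word));
// → [ 'position', 'top', 'left' ]
```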

This leads to more questions:

* Just how much slower is 'slower' actually?
* Would it be acceptable under certain circumstances?
* Is there any way to benchmark these things?

> 2. Does it match across textnodes?  Even when the page *looks* like
> it's just continuous text, the text may actually be broken across
> separate textnodes.  This has the same implications as the previous,
> except it's more confusing because you can only tell when a run of
> text is broken into multiple textnodes by examining it from script.

Very likely yes, but again limited to the selector to which :regex() is
attached.
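Tab's point 2 can be illustrated without a real DOM. In this sketch the text nodes are modeled as a plain array of strings (an assumption for illustration only), and the pattern is arbitrary: a match that straddles two nodes is invisible to a node-by-node search but is found once the nodes' text is joined.

```typescript
// Hypothetical model: one visible run of text that has been split into
// two adjacent text nodes, each represented here as a string.
const textNodes: string[] = ["The quick br", "own fox"];
const pattern = /brown/; // no "g" flag, so test() keeps no lastIndex state

// Matching node by node: no single node contains "brown".
const perNode = textNodes.some(t => pattern.test(t));

// Matching the concatenation, i.e. what the reader actually sees.
const joined = textNodes.join("");
const acrossNodes = pattern.test(joined);

console.log(perNode, acrossNodes); // false true
```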

> 3. What happens if two regexes (or two applications of the same regex)
> overlap?  CSS always works with trees, so you'd need some way to
> determine which one gets broken apart, which one is innermost, etc.

Sorry, I am not sure what you mean by 'overlap'.
Trying to style the same content?

I am also not sure what you mean by 'broken apart' or 'innermost'.
Could you please explain? Regexes trying to match nested elements?
Nested regexes?

Generally, I would expect such conflicts to be resolved by the rules of
cascade and specificity as much as possible.

If two instances of :regex() try to style the same content, the one
attached to the selector with the highest specificity wins. Regexes
should not overlap.

This should certainly be determined before :regex() is actually applied.
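Assuming 'overlap' means two matches that cover partly the same characters, a small sketch shows why that is awkward for a tree: two character ranges can only become nested elements if they are disjoint or one contains the other. The two rules and the helper names (`spanOf`, `nestable`) are hypothetical, and this is only one possible reading of Tab's point 3.

```typescript
// Two hypothetical rules applied to the same text:
//   p:regex(/quick brown/)  and  p:regex(/brown fox/)
type MatchSpan = [number, number]; // [start, end) offsets into the text

const text = "the quick brown fox";

// Character range of the first match of re in s, or null if none.
function spanOf(re: RegExp, s: string): MatchSpan | null {
  const m = re.exec(s);
  return m ? [m.index, m.index + m[0].length] : null;
}

const a = spanOf(/quick brown/, text); // [4, 15]
const b = spanOf(/brown fox/, text);   // [10, 19]

const contains = (p: MatchSpan, q: MatchSpan): boolean =>
  p[0] <= q[0] && q[1] <= p[1];

// Ranges map onto a tree only if they are disjoint or properly nested;
// a partial overlap like a/b above cannot be expressed as elements.
function nestable(x: MatchSpan, y: MatchSpan): boolean {
  const disjoint = x[1] <= y[0] || y[1] <= x[0];
  return disjoint || contains(x, y) || contains(y, x);
}

if (a && b) console.log(nestable(a, b)); // false
```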

A side question: what about cases where !important is applied?

> 4. There are performance implications with running regex across the
> entire DOM like that, as you have to rerun all of them any time the
> text content of anything in the page changes.

Why would a regex run against the entire DOM? I do not see that happening
unless someone really wants to do something like

   :root:regex(/[something crazily complex]/) { ... }

Regarding performance, these measures might help speed
things up:

* allow :regex() only on a subset of selectors
* allow only a basic subset of operators in the regex to
   cover the most common use cases
   (do we need backreferences, lookahead, case sensitivity?)
* reuse an already implemented regex engine
   (could this actually be done,
    even if JS is disabled by the user?)
* allow only a certain number of regexes per style sheet
   (including all types of style sheets)
* limit the set of properties that can be applied
   via a regex
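One rough way to enforce "only a basic subset of operators" would be to reject a pattern up front if it uses features such as backreferences or lookaround. The sketch below is hypothetical and purely textual: `isAllowedPattern` is an illustrative name, it is not a real regex parser, and it would, for example, wrongly reject an escaped backslash followed by a digit.

```typescript
// Hypothetical restriction: reject pattern sources that use features
// listed above as candidates for exclusion. Rough textual check only.
const disallowed: RegExp[] = [
  /\\[1-9]/,    // numbered backreferences such as \1
  /\\k</,       // named backreferences such as \k<name>
  /\(\?<?[=!]/, // lookahead (?= (?! and lookbehind (?<= (?<!
];

function isAllowedPattern(source: string): boolean {
  return !disallowed.some(re => re.test(source));
}

console.log(isAllowedPattern("position|top|left|bottom|right")); // true
console.log(isAllowedPattern("^(?:\\w+\\s+){2}(\\w+)"));          // true
console.log(isAllowedPattern("(\\w+)\\s+\\1"));                   // false
console.log(isAllowedPattern("foo(?=bar)"));                      // false
```

Non-capturing groups such as `(?:...)` stay allowed here, so the first example from the earlier message would still pass.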

> So, while I think it's a pretty cool idea that would be useful in
> several ways (I wanna style all my ampersands with a pretty font!), I
> don't think it's something that can actually be done.  :/

Hmm, that sounds much more pessimistic than the earlier 'hard to
implement'. If it were only for ampersands, I would probably be happy to
keep inserting <span>s via search and replace in BBEdit or whatever.

I was thinking mostly of [syntax] highlighting in

* code examples
* books (Koran, Bible, Talmud, Veda, Tao-te-ching, ...)
* legal texts
* technical documentation
* ...

It would be interesting to get a few more views on this issue.

Jørgen
Received on Thursday, 21 July 2011 23:00:22 UTC