Re: A Call for ::nth-everything from Boris Zbarsky on 2011-11-02 (www-style@w3.org from November 2011)

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Tue, 01 Nov 2011 20:21:29 -0400
To: Charles Pritchard <chuck@jumis.com>
CC: www-style@w3.org
Message-ID: <4EB08D09.5010209@mit.edu>
On 11/1/11 6:22 PM, Charles Pritchard wrote:
> Worst case -- and I'm sure you'll correct me if I'm wrong -- the browser
> can just create <span> elements inside the shadow dom and use all of the
> existing optimizations. This still saves authors from having those DOM
> elements cluttering the public DOM.

You may well be wrong.

In practice, authors tend to not want to style every 3rd letter or 
whatever in the document.  They actually want to style every third 
letter in some small section of it.

Doing that from script is actually easier, because you don't have to, 
for every dynamic change to the DOM, check whether it happens to be in 
the region you're interested in.  Especially if you, the author, happen 
to know that region is completely static.  That's something the browser 
_never_ knows.
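For illustration, here is a minimal TypeScript sketch of the script-side approach. The author knows the region is static, so every nth letter is wrapped once and nothing has to be rechecked when the DOM changes elsewhere. `wrapEveryNthLetter` and `escapeHtml` are hypothetical helpers, not part of any spec or library. "Letter" is approximated as a Unicode code point, and spaces and punctuation count like any other character.

```typescript
// Hypothetical helper: escape the characters that matter inside HTML text content.
function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical helper: wrap every nth code point of a static text snippet in a <span>.
// Array.from iterates by code point, so non-BMP characters are never split in half,
// but combining marks are still counted as separate "letters".
function wrapEveryNthLetter(text: string, n: number, className: string): string {
  return Array.from(text)
    .map((ch, i) => {
      const escaped = escapeHtml(ch);
      // i is 0-based; the nth, 2nth, 3nth ... code points get wrapped
      return (i + 1) % n === 0 ? `<span class="${className}">${escaped}</span>` : escaped;
    })
    .join("");
}

// Run once, because the author knows this region never changes (region is hypothetical):
// region.innerHTML = wrapEveryNthLetter(region.textContent ?? "", 3, "third");
console.log(wrapEveryNthLetter("abcdef", 3, "third"));
// ab<span class="third">c</span>de<span class="third">f</span>
```

A browser-side `::nth-letter` would instead have to decide, on every mutation anywhere in the document, whether that mutation affects a styled region.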

Let's be realistic here.  We have current browsers that don't even 
implement the '+' combinator correctly because they think it's too slow 
to do that....  I'd really like implementors to get that sort of thing 
fixed before adding more purposefully-broken support for features.

>>> nth-letter is specified in the same manner as ecmascript substr.
>>
>> That's a completely bogus definition once you're out of the BMP.
>>
>> Can we please stop defining these some-western-language-only kind of
>> things?
>>
> These are based on byte ranges, not on western-language.

But in practice, all Western languages live in the BMP, so people who 
only focus on those tend to not care about non-BMP issues (like using 
the ES definition of "letter", say!).
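The following small TypeScript sketch shows why a `substr()`-style definition breaks outside the BMP. ECMAScript strings are indexed by UTF-16 code units, so a supplementary-plane letter, such as those in the Gothic block (U+10330 to U+1034A), occupies two positions. The helper names are hypothetical and chosen only for this illustration.

```typescript
// Three Gothic letters (ahsa, bairkan, giba). Each is outside the BMP,
// so each is stored as a surrogate pair: two UTF-16 code units.
const gothic = "\u{10330}\u{10331}\u{10332}";

// Hypothetical: nth letter defined "in the same manner as substr" (1-based n).
// substr counts UTF-16 code units, not characters.
function nthLetterByCodeUnit(text: string, n: number): string {
  return text.substr(n - 1, 1);
}

// Hypothetical alternative: index by code point, which keeps surrogate pairs together.
function nthLetterByCodePoint(text: string, n: number): string {
  return Array.from(text)[n - 1] ?? "";
}

console.log(gothic.length);                  // 6 code units for 3 letters
console.log(nthLetterByCodeUnit(gothic, 2)); // a lone low surrogate (U+DF30), not a letter
console.log(nthLetterByCodePoint(gothic, 2) === "\u{10331}"); // true: the second letter
```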

> I understand
> UTF8 as well as the ambiguous nature of the word "word" and "letter"
> when applying them universally. There are agglutinative languages where
> a single word may comprise an entire sentence. There are scripts where
> "letter" is not so easily defined.

That's actually a completely separate issue from BMP vs non-BMP, and a 
quite valid one, but it's been brought up already and has nothing to do 
with the substr() definition of "letter".
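The following sketch illustrates that separate issue, using the same assumptions (TypeScript, hypothetical helper name). Indexing by code point fixes the surrogate-pair problem. However, a user-perceived letter can still span several code points, as with a base letter followed by a combining accent.

```typescript
// "née" spelled with a decomposed é: "e" followed by U+0301 COMBINING ACUTE ACCENT.
// Every code point here is in the BMP, so this is not a surrogate-pair problem.
const decomposed = "ne\u0301e";

// Hypothetical helper: 1-based code-point indexing.
function nthCodePoint(text: string, n: number): string {
  return Array.from(text)[n - 1] ?? "";
}

console.log(Array.from(decomposed).length);           // 4 code points for 3 visible letters
console.log(nthCodePoint(decomposed, 3) === "\u0301"); // true: a bare combining accent

// NFC recomposes e + U+0301 into U+00E9, but only because a precomposed
// character exists; many combining sequences have no precomposed form.
console.log(Array.from(decomposed.normalize("NFC")).length); // 3
```

Handling that case properly requires grapheme-cluster segmentation, for example `Intl.Segmenter` with `granularity: "grapheme"` in engines that support it.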

> I'm very happy to explore those issues, and I'm sure authors using those
> languages and scripts are aware of the issues from the moment they start
> programming with ECMAScript and styling with CSS.

The problems come when the author of the ES or CSS is not the author of 
the content and knows nothing about the issues.

> We can't "stop" defining these from a western-language bias. "en" is
> the standard fallback in most specs; 7-bit ascii, roman script, in a
> 1-byte is the most compatible way of working with content. That's just
> historical cruft.

My point is we should be trying to move away from that insofar as it 
affects the content, not adding more barriers to writing in whatever 
language people want to write in.

> If you'd prefer to express things in UTF-8, that's fine: WebIDL uses
> DOMString. I get that. UTF8 has excellent support nowadays.

None of which has to do with my issue with substr().

> An author working with a script where nth-letter is not
> functional/relevant is simply not going to use that selector.

I was specifically talking about scripts where the concept of "letter" 
makes sense (so it's _relevant_), but that don't live in the BMP.  I see 
no reason, if we do this at all, why we'd by-design make it not 
_functional_ for them.

-Boris
Received on Wednesday, 2 November 2011 00:22:10 UTC