Re: A Call for ::nth-everything from Charles Pritchard on 2011-11-02 (www-style@w3.org from November 2011)

From: Charles Pritchard <chuck@jumis.com>
Date: Tue, 01 Nov 2011 18:30:00 -0700
To: www-style@w3.org
Message-ID: <4EB09D18.4010709@jumis.com>

> On 11/1/11 6:22 PM, Charles Pritchard wrote:
>> Worst case -- and I'm sure you'll correct me if I'm wrong -- the browser
>> can just create <span> elements inside the shadow dom and use all of the
>> existing optimizations. This still saves authors from having those DOM
>> elements cluttering the public DOM.
>
> You may well be wrong.
>
> In practice, authors tend to not want to style every 3rd letter or 
> whatever in the document.  They actually want to style every third 
> letter in some small section of it.

Yes, in practice, authors are very particular with their CSS selectors, 
often restricting them to a class or an ID.

> Doing that from script is actually easier, because you don't have to, 
> for every dynamic change to the DOM, check whether it happens to be in 
> the region you're interested in.  Especially if you, the author, 
> happen to know that region is completely static.  That's something the 
> browser _never_ knows.

I don't understand the demonstration you're describing.

Is it using substr/slice and join of span and associating a class?

The difficulty I'm having is in roles: I don't know when you're talking 
about an author vs. an implementer when you say "Doing that from script 
is actually easier".
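
For concreteness, the author-side reading of that question (split the text, wrap every nth character in a span, join the result) might look like the sketch below. The helper name wrapEveryNth and the class name "third" are hypothetical, chosen only for illustration, and the sketch builds an HTML string rather than touching a live DOM.

```javascript
// Hypothetical helper: wrap every nth character of `text` in a <span>.
// The function and class names are illustrative, not part of any proposal.
function wrapEveryNth(text, n, className) {
  // Escape the characters that would otherwise be read as markup.
  const escapeHtml = (s) =>
    s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

  return Array.from(text) // split by code point, not by UTF-16 code unit
    .map((ch, i) =>
      (i + 1) % n === 0
        ? `<span class="${className}">${escapeHtml(ch)}</span>`
        : escapeHtml(ch)
    )
    .join("");
}

const markup = wrapEveryNth("Hey there", 3, "third");
// markup is:
// He<span class="third">y</span> t<span class="third">h</span>er<span class="third">e</span>
```

An author who knows the region is static runs this once; a browser implementing the selector natively would have to keep the equivalent bookkeeping current across DOM changes.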

> Let's be realistic here.  We have current browsers that don't even 
> implement the '+' combinator correctly because they think it's too 
> slow to do that....  I'd really like implementors to get that sort of 
> thing fixed before adding more purposefully-broken support for features.

I'm all for being pragmatic. I don't understand why you are labelling 
this as something that would be "purposefully-broken". I'm discussing 
the idea in good faith.

CSS has an amazing replaced/generated content spec. I don't see why 
advanced styles on letters would be a departure from the precedent set 
in generated content.

For speed of rendering, as an author, I'd stick with doing things like 
using transform, and I'd stay away from relative positioning. I 
recognize, as an author, that some CSS styles may mean slow performance 
on the page and some CSS styles are quite fast.

If it would help for me to speak in terms of code blocks or sections 
from Mozilla or WebKit, I could do that.

>
>>>> nth-letter is specified in the same manner as ecmascript substr.
>>>
>>> That's a completely bogus definition once you're out of the BMP.
>>>
>>> Can we please stop defining these some-western-language-only kind of
>>> things?
>>>
>> These are based on byte ranges, not on western-language.
>
> But in practice, all Western languages live in the BMP, so people who 
> only focus on those tend to not care about non-BMP issues (like using 
> the ES definition of "letter", say!).

Well, I used "character" in my projects. I understand and appreciate 
your frustration in this area.
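
The non-BMP objection can be seen directly in ECMAScript. The sketch below assumes nothing beyond standard string behavior: substr() and length count UTF-16 code units, so a letter outside the BMP occupies two positions. The sample text, two letters from the Deseret block, is an illustrative choice.

```javascript
// Two letters from the Deseret block (U+10400 and U+10401), outside the BMP.
const deseret = "\u{10400}\u{10401}";

// length and substr() count UTF-16 code units, so each letter counts twice.
const unitCount = deseret.length;        // 4
const firstUnit = deseret.substr(0, 1);  // "\uD801", a lone high surrogate

// Iterating the string (here via Array.from) walks code points instead.
const letters = Array.from(deseret);
const letterCount = letters.length;      // 2
const firstLetter = letters[0];          // "\u{10400}", the whole letter
```

A selector defined in terms of substr() would therefore address half a letter in such a script.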

I assure you that I am quite sensitive to language diversity and bias.

While we're throwing out generalizations: I'd say that Mac users care 
more about non-BMP than Windows users.

I suspect that Mobile users are likely to care more about non-BMP as well.

It's been a wonderful thing that Mobile has been pushing things 
forward, even when they go so far as to use colored glyphs (such as 
emoji in iOS 4.x+). I'm glad to see it.

>
>> I understand
>> UTF8 as well as the ambiguous nature of the word "word" and "letter"
>> when applying them universally. There are agglutinative languages where
>> a single word may comprise an entire sentence. There are scripts where
>> "letter" is not so easily defined.
>
> That's actually a completely separate issue from BMP vs non-BMP, and a 
> quite valid one, but it's been brought up already and has nothing to 
> do with the substr() definition of "letter".

I would be using the word "character" with substr, but letter was in the 
nth-everything proposal.

Authors toying with advanced CSS selectors are likely to be 
ECMAScript-literate, and so I thought we could just borrow a semantic 
they already know. I realize the standards in CSS are often more complex 
and considerate than this.


>> I'm very happy to explore those issues, and I'm sure authors using those
>> languages and scripts are aware of the issues from the moment they start
>> programming with ECMAScript and styling with CSS.
>
> The problems come when the author of the ES or CSS is not the author 
> of the content and knows nothing about the issues.
>
That's always going to be a problem when ES or CSS is misused.

In those cases, it's surely easier for the user to correct, or try to 
interpret the intention of the author, if the presentation is done in 
CSS as opposed to being handled via ES and presentational markup.

Wouldn't you agree that this is presentational markup:
<div><span>H</span><span>e</span><span>y</span></div>

Whereas this is a bit easier on everyone:
<div>Hey</div>

With the first, I can nth-child, with the second, I could nth-letter.

The first is more likely to need and succumb to ES obfuscation; the 
latter is much easier to accommodate and nullify via custom style sheets 
and other techniques.


>> We can't "stop" defining these from a western-language bias. "en" is
>> the standard fallback in most specs; 7-bit ascii, roman script, in a
>> 1-byte is the most compatible way of working with content. That's just
>> historical cruft.
>
> My point is we should be trying to move away from that insofar as it 
> affects the content, not adding more barriers to writing in whatever 
> language people want to write in.

I didn't think of this proposal as a barrier. However, you have 
convinced me that the proposal does not take multibyte into account, and 
whatever merit it has, it ought to be re-examined. I still don't think 
it's a barrier, but I do think it should be extended with non-BMP in 
mind as a use case.


>> If you'd prefer to express things in UTF-8, that's fine: WebIDL uses
>> DOMString. I get that. UTF8 has excellent support nowadays.
>
> None of which has to do with my issue with substr().
>
>> An author working with a script where nth-letter is not
>> functional/relevant is simply not going to use that selector.
>
> I was specifically talking about scripts where the concept of "letter" 
> makes sense (so it's _relevant_), but that don't live in the BMP.  I 
> see no reason, if we do this at all, why we'd by-design make it not 
> _functional_ for them.

Solid reasoning. I think they're neglected as the "letter" concept is 
difficult, whereas a naive 1-byte character concept is straightforward.

I'm convinced. There ought to be something functional for more complex 
glyphs. I'd like to see it as an extension to the proposal rather than a 
rewrite of nth-letter -- though perhaps nth-letter should be renamed to 
nth-character.

I may concede that it's biased to move forward with nth-letter without 
having added anything for non-BMP uses.
Thai is an excellent target to cogitate on.
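
To illustrate why Thai is a useful test case, the sketch below compares three ways of counting a single Thai syllable. The roughClusters helper is hypothetical and only an approximation (a base code point followed by any combining marks); a real definition would more likely follow Unicode's grapheme cluster rules.

```javascript
// Approximation only: one non-mark code point plus any combining marks
// that follow it. This is not the full Unicode grapheme cluster algorithm.
function roughClusters(text) {
  return text.match(/\P{M}\p{M}*/gu) || [];
}

// One Thai syllable: a consonant (U+0E17) carrying a vowel sign (U+0E35)
// and a tone mark (U+0E48). All three code points are inside the BMP.
const thaiSyllable = "\u0E17\u0E35\u0E48";

const codeUnits = thaiSyllable.length;                // 3
const codePoints = Array.from(thaiSyllable).length;   // 3
const clusters = roughClusters(thaiSyllable).length;  // 1
```

Here code units and code points agree, yet both would let a selector style a tone mark separately from its consonant, which suggests an nth-character definition needs more than code-point awareness.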

-Charles
Received on Wednesday, 2 November 2011 01:30:36 UTC