Re: [selectors4] :blank, ::first-letter, and what a whitespace character is

> On Aug 24, 2015, at 3:41 AM, Tab Atkins Jr. <jackalmage@gmail.com> wrote:
> 
>> On Mon, Aug 24, 2015 at 12:22 PM, L. David Baron <dbaron@dbaron.org> wrote:
>> I was looking at the definition of :blank in
>> https://drafts.csswg.org/selectors-4/#the-blank-pseudo and noticed
>> that its definition varies very slightly from :-moz-only-whitespace
>> that we already use internally.  In particular, its definition of
>> the codepoints that are skipped because they're whitespace points to
>> https://drafts.csswg.org/css-text/#white-space-rules , which
>> essentially says that we skip spaces (U+0020), tabs (U+0009),
>> carriage returns (U+000D) and line feeds (U+000A).
>> 
>> Gecko's implementation of :-moz-only-whitespace currently skips
>> these characters and also skips form feed (U+000C).  Additionally
>> (and this is the more interesting part), it relies on common code
>> for "is this string entirely whitespace" [1], that I'd like the Web
>> platform to avoid having a pile of subtly-different variants of.
>> 
>> I went through other callers of this code to look for other things
>> exposed to the Web platform, and found one obvious one, which is
>> what text we skip over when looking for a ::first-letter
>> pseudo-element (an issue that seems to be entirely unspecified in
>> selectors level 3).  There might be others, though.  Then I wrote a
>> testcase for this, and found a bit of a lack of interop:
>> https://lists.w3.org/Archives/Public/www-archive/2015Aug/att-0020/first-letter-form-feed.html
>> 
>> On this testcase (which tests U+0000 to U+001F), I see that:
>> * Gecko (nightly from a week or so ago) skips over the characters I
>>   mention above: space, tab, CR, LF, and form feed
>> * Chromium 44 skips over Gecko's set plus also U+000B
>> * IE11 skips over Chromium's set plus also U+001F
>> 
>> I think that:
>> 
>> (a) we should have interop on the ::first-letter whitespace
>>     skipping characters, and it should be specified
>> 
>> (b) :blank, or whatever we call it, should use the same definition
>>     of whitespace, since I don't want two definitions of "text
>>     that's only whitespace" in selectors
>> 
>> It might also be worth a slightly closer examination of what other
>> things should have a common behavior with this.  (Might other
>> browsers use the same function for ::first-letter and other things
>> that are present in Web standards?)
> 
> Note that HTML also defines "whitespace" to include U+000C
> <https://html.spec.whatwg.org/#space-character>.  In particular, its
> definition is U+9, U+A, U+C, U+D, and U+20.
> 
> This is also CSS Syntax's definition of whitespace.
> 
> Looking over CSS Text, I don't see a good reason to exclude U+C either.  I
> think Text should be amended to include it.

I don't know what those are. I'm used to more digits in my Unicode. Is one of them a non-breaking space? I'd really like to be able to treat nbsps as regular white space for :blank (or :empty-plus, or whatever). I don't know if it ever has semantic meaning, but usually it is in the markup because the HTML author assumed some sort of layout that I, the CSS author, don't necessarily want to follow. So I'd either want to collapse it away (a separate topic), or select it with :empty-ish in order to hide it.
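
For reference, here is a small illustrative sketch in Python that names the low code points mentioned above and compares the sets described in the quoted messages. The set names and helper functions are hypothetical, and the :blank and ::first-letter checks are deliberately simplified (real ::first-letter handling also involves punctuation); the per-engine sets just transcribe the testcase results reported above, not anything taken from browser source.

```python
# Illustrative sketch only: set names and helpers are hypothetical.

# Usual names for the low code points discussed in this thread.
NAMES = {
    0x09: "CHARACTER TABULATION (tab)",
    0x0A: "LINE FEED (LF)",
    0x0B: "LINE TABULATION (vertical tab)",
    0x0C: "FORM FEED (FF)",
    0x0D: "CARRIAGE RETURN (CR)",
    0x1F: "INFORMATION SEPARATOR ONE (unit separator)",
    0x20: "SPACE",
}

# css-text white-space rules, as summarized in the thread: space, tab, CR, LF.
CSS_TEXT_WS = frozenset({0x20, 0x09, 0x0D, 0x0A})

# HTML "space characters" and CSS Syntax whitespace: the above plus form feed.
HTML_CSS_SYNTAX_WS = CSS_TEXT_WS | {0x0C}

# Characters skipped before ::first-letter, per the reported testcase results.
FIRST_LETTER_SKIP = {
    "Gecko": HTML_CSS_SYNTAX_WS,
    "Chromium 44": HTML_CSS_SYNTAX_WS | {0x0B},
    "IE11": HTML_CSS_SYNTAX_WS | {0x0B, 0x1F},
}


def is_all_whitespace(text: str, ws: frozenset = HTML_CSS_SYNTAX_WS) -> bool:
    """Shared 'is this string entirely whitespace' check (simplified :blank)."""
    return all(ord(c) in ws for c in text)


def first_letter_start(text: str, ws: frozenset = HTML_CSS_SYNTAX_WS) -> int:
    """Index of the first non-whitespace character, or -1 if there is none.

    Simplified: real ::first-letter handling also involves punctuation.
    """
    for i, c in enumerate(text):
        if ord(c) not in ws:
            return i
    return -1


def extra_beyond(base: frozenset = HTML_CSS_SYNTAX_WS) -> dict:
    """For each engine, the code points it skips that `base` does not include."""
    return {engine: sorted(s - base) for engine, s in FIRST_LETTER_SKIP.items()}


if __name__ == "__main__":
    for engine, extra in extra_beyond().items():
        labels = [f"U+{cp:04X} {NAMES[cp]}" for cp in extra] or ["(nothing extra)"]
        print(f"{engine}: {', '.join(labels)}")
```

Run as a script, this reports nothing extra for Gecko, U+000B for Chromium 44, and U+000B plus U+001F for IE11, which is the interop gap described above. Passing CSS_TEXT_WS as the base instead would additionally flag U+000C for every engine. Having both checks call one predicate over one set is the point of (b).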

I would think things like thin spaces (U+2009) and discretionary (soft) hyphens (U+00AD) would also be on the list.
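
A minimal sketch of that extension, assuming :blank simply consulted a larger set of code points. The extra set here (U+00A0 NO-BREAK SPACE, U+2009 THIN SPACE, U+00AD SOFT HYPHEN) is hypothetical and not part of any spec; the names and categories printed come from Python's unicodedata module.

```python
import unicodedata

# HTML / CSS Syntax whitespace: tab, LF, FF, CR, space.
BASE_WS = frozenset({0x09, 0x0A, 0x0C, 0x0D, 0x20})

# Hypothetical additions for a looser :blank; not from any spec.
EXTRA_WS = frozenset({0x00A0, 0x2009, 0x00AD})  # nbsp, thin space, soft hyphen

EXTENDED_WS = BASE_WS | EXTRA_WS


def is_blank(text: str, ws: frozenset = EXTENDED_WS) -> bool:
    """True if every character of `text` is in `ws` (simplified :blank)."""
    return all(ord(c) in ws for c in text)


if __name__ == "__main__":
    for cp in sorted(EXTRA_WS):
        ch = chr(cp)
        # Prints e.g. "U+00A0 NO-BREAK SPACE (Zs)"
        print(f"U+{cp:04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})")
    cell = "\u00a0"  # what a lone &nbsp; in markup parses to
    print(is_blank(cell, BASE_WS), is_blank(cell))  # False True
```

The category column shows why a rule like "every space separator (Zs)" would not be enough on its own: NO-BREAK SPACE and THIN SPACE are Zs, but SOFT HYPHEN is a format character (Cf), so it would have to be listed explicitly.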

Received on Tuesday, 25 August 2015 14:32:51 UTC