Re: [selectors4] :blank, ::first-letter, and what a whitespace character is

On Mon, Aug 24, 2015 at 12:22 PM, L. David Baron <dbaron@dbaron.org> wrote:
> I was looking at the definition of :blank in
> https://drafts.csswg.org/selectors-4/#the-blank-pseudo and noticed
> that its definition varies very slightly from :-moz-only-whitespace
> that we already use internally.  In particular, its definition of
> the codepoints that are skipped because they're whitespace points to
> https://drafts.csswg.org/css-text/#white-space-rules , which
> essentially says that we skip spaces (U+0020), tabs (U+0009),
> carriage returns (U+000D) and line feeds (U+000A).
>
> Gecko's implementation of :-moz-only-whitespace currently skips
> these characters and also skips form feed (U+000C).  Additionally
> (and this is the more interesting part), it relies on common code
> for "is this string entirely whitespace" [1], and I'd like the Web
> platform to avoid having a pile of subtly different variants of that.
>
> I went through other callers of this code to look for other things
> exposed to the Web platform, and found one obvious one, which is
> what text we skip over when looking for a ::first-letter
> pseudo-element (an issue that seems to be entirely unspecified in
> Selectors Level 3).  There might be others, though.  Then I wrote a
> testcase for this, and found a bit of a lack of interop:
> https://lists.w3.org/Archives/Public/www-archive/2015Aug/att-0020/first-letter-form-feed.html
>
> On this testcase (which tests U+0000 to U+001F), I see that:
>  * Gecko (nightly from a week or so ago) skips over the characters I
>    mention above: space, tab, CR, LF, and form feed
>  * Chromium 44 skips over Gecko's set plus U+000B
>  * IE11 skips over Chromium's set plus U+001F
>
> I think that:
>
>  (a) we should have interop on the ::first-letter whitespace
>      skipping characters, and it should be specified
>
>  (b) :blank, or whatever we call it, should use the same definition
>      of whitespace, since I don't want two definitions of "text
>      that's only whitespace" in selectors
>
> It might also be worth a slightly closer examination of what other
> things should have a common behavior with this.  (Might other
> browsers use the same function for ::first-letter and other things
> that are present in Web standards?)
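To make the differences in the quoted message concrete, here is a minimal Python sketch. The character sets are transcribed from the observations above (the testcase covered U+0000 to U+001F, plus the space the message lists), not read from any engine's source. The names `CSS_TEXT`, `GECKO`, `CHROMIUM_44`, `IE11`, `is_only_whitespace`, and `first_letter_index` are hypothetical, and the sketch ignores the punctuation that ::first-letter also pulls in.

```python
# Sketch only: sets transcribed from the observations quoted above,
# not from engine source. All names here are hypothetical.

# CSS Text white-space rules: space, tab, CR, LF.
CSS_TEXT = frozenset("\u0020\u0009\u000D\u000A")
# Gecko (:-moz-only-whitespace and ::first-letter skipping): adds form feed.
GECKO = CSS_TEXT | {"\u000C"}
# Chromium 44 ::first-letter skipping: Gecko's set plus U+000B.
CHROMIUM_44 = GECKO | {"\u000B"}
# IE11 ::first-letter skipping: Chromium's set plus U+001F.
IE11 = CHROMIUM_44 | {"\u001F"}


def is_only_whitespace(text: str, ws: frozenset) -> bool:
    """True if every character of `text` is in `ws` (an empty string counts too)."""
    return all(ch in ws for ch in text)


def first_letter_index(text: str, ws: frozenset) -> int:
    """Index of the first character not skipped as whitespace, or -1 if none.

    Simplification: real ::first-letter also includes adjacent punctuation,
    which this sketch does not model.
    """
    for i, ch in enumerate(text):
        if ch not in ws:
            return i
    return -1


if __name__ == "__main__":
    # A lone form feed is blank for Gecko but not under the CSS Text set.
    print(is_only_whitespace("\u000C", CSS_TEXT), is_only_whitespace("\u000C", GECKO))
    # prints: False True

    # A leading U+000B is skipped by Chromium 44 and IE11, but not by Gecko.
    sample = "\u000B\u000CA"
    for name, ws in [("Gecko", GECKO), ("Chromium 44", CHROMIUM_44), ("IE11", IE11)]:
        print(name, first_letter_index(sample, ws))
    # prints: Gecko 0 / Chromium 44 2 / IE11 2
```

The same input lands on a different "first letter" depending on which set is used, which is the interop gap point (a) asks to close.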

Note that HTML also defines "whitespace" to include U+000C
<https://html.spec.whatwg.org/#space-character>.  In particular, its
definition is U+0009, U+000A, U+000C, U+000D, and U+0020.

This is also CSS Syntax's definition of whitespace.
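Proposal (b) in the quoted message amounts to one whitespace predicate that both :blank and ::first-letter consult. A minimal sketch of that structure, assuming the shared set is the HTML one described above (U+0009, U+000A, U+000C, U+000D, U+0020). The names `is_whitespace`, `matches_blank`, and `first_letter_start` are hypothetical, not any engine's API; :blank is simplified to a check on a single text string, and ::first-letter punctuation handling is again left out.

```python
from typing import Optional

# Assumed shared definition: HTML's space characters (TAB, LF, FF, CR, SPACE).
HTML_WHITESPACE = frozenset("\u0009\u000A\u000C\u000D\u0020")


def is_whitespace(ch: str) -> bool:
    """The one whitespace test that both callers below go through."""
    return ch in HTML_WHITESPACE


def matches_blank(text_content: str) -> bool:
    """Hypothetical :blank check, treating the element as one text string:
    true if it is empty or contains only whitespace."""
    return all(is_whitespace(ch) for ch in text_content)


def first_letter_start(text: str) -> Optional[int]:
    """Hypothetical ::first-letter scan: index of the first non-whitespace
    character, or None if the text is entirely whitespace."""
    for i, ch in enumerate(text):
        if not is_whitespace(ch):
            return i
    return None


if __name__ == "__main__":
    print(matches_blank(" \t\r\n\u000C"))   # True: only shared-set characters
    print(matches_blank("\u000B"))          # False: U+000B is outside the set
    print(first_letter_start("\u000C\nX"))  # 2
```

Because both functions go through `is_whitespace`, any later change to the set affects :blank and ::first-letter together, which is what (b) is asking for.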

Looking over CSS Text, I don't see a good reason for it to leave out
U+000C either.  I think CSS Text should be amended to include it.

~TJ

Received on Monday, 24 August 2015 10:42:34 UTC