[selectors4] :blank, ::first-letter, and what a whitespace character is

I was looking at the definition of :blank in
https://drafts.csswg.org/selectors-4/#the-blank-pseudo and noticed
that its definition differs very slightly from that of
:-moz-only-whitespace, which we already use internally.  In
particular, its definition of
the codepoints that are skipped because they're whitespace points to
https://drafts.csswg.org/css-text/#white-space-rules , which
essentially says that we skip spaces (U+0020), tabs (U+0009),
carriage returns (U+000D) and line feeds (U+000A).
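
For illustration, here is a minimal Python sketch of the "only whitespace" test under the css-text set, next to the same test with form feed added (the Gecko behavior described below).  The names are hypothetical and not taken from any engine:

```python
# The four characters the css-text white-space rules treat as whitespace,
# as summarized above: space, tab, carriage return, line feed.
CSS_TEXT_WHITESPACE = frozenset("\u0020\u0009\u000D\u000A")

# Hypothetical name: Gecko's :-moz-only-whitespace set, which also includes form feed.
GECKO_ONLY_WHITESPACE = CSS_TEXT_WHITESPACE | {"\u000C"}


def is_only_whitespace(text: str, whitespace: frozenset = CSS_TEXT_WHITESPACE) -> bool:
    """True if every character of `text` is in `whitespace`; the empty string counts."""
    return all(ch in whitespace for ch in text)


# The two definitions disagree only on strings that contain a form feed.
print(is_only_whitespace(" \t\r\n"))                        # True
print(is_only_whitespace("\u000C"))                         # False
print(is_only_whitespace("\u000C", GECKO_ONLY_WHITESPACE))  # True
```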

Gecko's implementation of :-moz-only-whitespace currently skips
these characters and also skips form feed (U+000C).  Additionally
(and this is the more interesting part), it relies on common code
for "is this string entirely whitespace" [1], and I'd like the Web
platform to avoid ending up with a pile of subtly different
definitions of that concept.

I went through other callers of this code to look for other things
exposed to the Web platform, and found one obvious one, which is
what text we skip over when looking for a ::first-letter
pseudo-element (an issue that seems to be entirely unspecified in
selectors level 3).  There might be others, though.  Then I wrote a
testcase for this, and found a bit of a lack of interop:
https://lists.w3.org/Archives/Public/www-archive/2015Aug/att-0020/first-letter-form-feed.html

On this testcase (which tests U+0000 to U+001F), I see that:
 * Gecko (nightly from a week or so ago) skips over the characters I
   mentioned above: space, tab, CR, LF, and form feed
 * Chromium 44 skips over Gecko's set plus also U+000B
 * IE11 skips over Chromium's set plus also U+001F
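
To make the disagreement concrete, here is a small Python sketch that encodes the three observed sets and lists the codepoints in the tested range (U+0000 to U+001F) where the engines differ.  The sets are transcribed from the results above; the function and variable names are hypothetical:

```python
# Codepoints each engine was observed to skip before ::first-letter,
# transcribed from the test results above.
GECKO = frozenset({0x20, 0x09, 0x0D, 0x0A, 0x0C})
CHROMIUM_44 = GECKO | {0x0B}
IE11 = CHROMIUM_44 | {0x1F}

ENGINES = {"Gecko": GECKO, "Chromium 44": CHROMIUM_44, "IE11": IE11}


def disagreements(engines, codepoints=range(0x00, 0x20)):
    """Map each codepoint some (but not all) engines skip to the engines that skip it."""
    result = {}
    for cp in codepoints:
        skipping = sorted(name for name, skipped in engines.items() if cp in skipped)
        if 0 < len(skipping) < len(engines):
            result[cp] = skipping
    return result


for cp, names in disagreements(ENGINES).items():
    print(f"U+{cp:04X}: skipped by {', '.join(names)}")
# U+000B: skipped by Chromium 44, IE11
# U+001F: skipped by IE11
```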

I think that:

 (a) we should have interop on the ::first-letter whitespace
     skipping characters, and it should be specified

 (b) :blank, or whatever we call it, should use the same definition
     of whitespace, since I don't want two definitions of "text
     that's only whitespace" in selectors
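
A rough Python sketch of what sharing one definition would mean: a single skip-whitespace routine drives both the ::first-letter search and the :blank-style check, so a string is blank exactly when that search skips all of it.  Which codepoints belong in the shared set is the open question, so the css-text set is only a placeholder here, and the ::first-letter part is simplified (it ignores punctuation).  All names are hypothetical:

```python
from typing import Optional

# Placeholder shared set: which codepoints belong here is exactly what
# needs specifying; the css-text set is used purely for illustration.
SHARED_WHITESPACE = frozenset("\u0020\u0009\u000D\u000A")


def skip_leading_whitespace(text: str, whitespace: frozenset = SHARED_WHITESPACE) -> int:
    """Index of the first character not in `whitespace`, or len(text) if none."""
    i = 0
    while i < len(text) and text[i] in whitespace:
        i += 1
    return i


def first_letter_candidate(text: str) -> Optional[str]:
    """Simplified ::first-letter search: first character after skipped whitespace."""
    i = skip_leading_whitespace(text)
    return text[i] if i < len(text) else None


def is_blank(text: str) -> bool:
    """:blank-style check built on the same skipping rule."""
    return skip_leading_whitespace(text) == len(text)
```

With this structure, adding or removing a codepoint (say U+000B) changes both behaviors at once, so they cannot drift apart.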

It might also be worth looking a little more closely at what other
things should share this behavior.  (Might other browsers use the
same function for ::first-letter and for other things that are
exposed through Web standards?)

-David

[1] https://mxr.mozilla.org/mozilla-central/search?string=isspacecharacter
    https://mxr.mozilla.org/mozilla-central/search?string=textisonlywhitespace

-- 
𝄞   L. David Baron                         http://dbaron.org/   𝄂
𝄢   Mozilla                          https://www.mozilla.org/   𝄂
             Before I built a wall I'd ask to know
             What I was walling in or walling out,
             And to whom I was like to give offense.
               - Robert Frost, Mending Wall (1914)

Received on Monday, 24 August 2015 10:23:12 UTC