Re: [csswg-drafts] [selectors] What a whitespace character is (#3754)

* > Currently the definition of :empty references “document white space”. Form feeds do not currently qualify as “document white space”, and neither do carriage returns. They are to be rendered as visible control characters.

    That seems like a feature to me: if an element contains things that are visible, control characters or not, then it seems good that it would not be considered `:empty`. Regardless of whether we include 0x0C or 0x0D in "document white space", the fact that css-text and `:empty` are aligned seems good.
* `::first-letter` isn't defined in terms of white space at all; it is defined in terms of "the first typographic letter unit", plus any preceding "characters that belong to the Punctuation (P*) Unicode general category". This implies that it will skip over not just white space (regardless of definition), but also control characters, symbols and whatnot. Maybe we don't have enough tests, or maybe we don't have full interop, but we do seem to have a precise and sensible definition.
* What the CSS parser (or the JS parser, for that matter) considers white space seems mostly irrelevant to anything else. It might align with other definitions, and it would be convenient for learnability if it did, but I don't care strongly, and compat probably sets in stone whatever we arrived at. (Note: for the CSS parser, it is equivalent to the infra spec's ASCII white space with newline normalization; for JS it is [its own beast](http://www.ecma-international.org/ecma-262/9.0/index.html#prod-WhiteSpace).)
* How the HTML parser handles white space characters and sets up the DOM is well defined. It starts off with the infra spec's notion of ASCII white space and how to normalize new lines, but it is also somewhat context sensitive. It's also unlikely to change, due to compat.
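
To make the parenthetical about the CSS parser concrete, here is a minimal Python sketch of the newline normalization step. It assumes the css-syntax preprocessing rule that CRLF pairs, lone CR, and FF each become a single LF, and it ignores the other preprocessing steps. The names `ASCII_WHITESPACE`, `CSS_TOKENIZER_WHITESPACE` and `normalize_css_newlines` are made up for illustration and do not come from any spec or library.

```python
import re

# Infra's "ASCII whitespace": TAB, LF, FF, CR, SPACE.
ASCII_WHITESPACE = frozenset("\t\n\f\r ")

# Assumption: what the CSS tokenizer sees as whitespace once
# newlines have been normalized (TAB, LF, SPACE).
CSS_TOKENIZER_WHITESPACE = frozenset("\t\n ")


def normalize_css_newlines(text: str) -> str:
    """Replace CRLF pairs, lone CR, and FF with a single LF each."""
    # The alternation tries "\r\n" first, so a CRLF pair yields one LF, not two.
    return re.sub(r"\r\n|\r|\f", "\n", text)


def is_css_whitespace(text: str) -> bool:
    """True if, after normalization, text is only tokenizer whitespace."""
    return all(ch in CSS_TOKENIZER_WHITESPACE for ch in normalize_css_newlines(text))
```

Under these assumptions, every ASCII whitespace character ends up as tokenizer whitespace after normalization, which is the sense in which the two definitions are equivalent.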

So, from the point of view of having definitions and using them sensibly, I think we're good.

If we want to reduce the number of definitions, what we might want to do is to reopen https://github.com/w3c/csswg-drafts/issues/855 and to stop treating CR and FF as control characters, and start including them in document white space instead, along with LF (and therefore make them invisible, collapse them, allow them in `:empty`...), which would allow us to align css-text's "document white space" with the infra spec's “ASCII white space”. I wouldn't have a strong objection to doing that, but I am also unconvinced it is useful.
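
As a rough sketch of what that alignment would change, the Python snippet below compares the two sets. It assumes that css-text's document white space in HTML amounts to space, tab and line feed, and it uses a hypothetical helper, `is_blank`, as a text-only stand-in for the white space part of the `:empty` check. It is not how any browser implements the selector.

```python
# Assumption: css-text "document white space" in HTML is SPACE, TAB, LF.
DOCUMENT_WHITE_SPACE = frozenset(" \t\n")

# Infra's "ASCII whitespace": TAB, LF, FF, CR, SPACE.
ASCII_WHITESPACE = frozenset("\t\n\f\r ")


def is_blank(text: str, whitespace: frozenset) -> bool:
    """True if text consists only of characters from the given set."""
    return all(ch in whitespace for ch in text)


# The characters that would move from "visible control character"
# to "white space" if the two definitions were aligned: FF and CR.
newly_ignored = ASCII_WHITESPACE - DOCUMENT_WHITE_SPACE

# A text node holding a CRLF pair counts as content today,
# and would count as blank under the aligned definition.
sample = " \r\n"
blank_today = is_blank(sample, DOCUMENT_WHITE_SPACE)
blank_if_aligned = is_blank(sample, ASCII_WHITESPACE)
```

The only difference between the two sets is CR and FF, so the practical effect is limited to content that reaches the DOM with those characters intact, for example through script.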

On the other hand, the fact that this [test/demo](https://jsbin.com/bupaqovewu/edit?html,css,js,output) gives three different results in Chrome, Firefox and Safari is sad. Maybe we should look at how various kinds of line breaks are (or aren't) normalized when inserting content via the `content` property or via JavaScript.

-- 
GitHub Notification of comment by frivoal
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/3754#issuecomment-493297305 using your GitHub account

Received on Friday, 17 May 2019 02:35:17 UTC