Re: [csswg-drafts] [css-text] Render U+2028 LINE SEPARATOR as a forced line break (#6992)

Thanks, @tabatkins!

I can't edit the issue description directly, but here it is with the markup fixed up to render correctly on GitHub:

———

I'd like to propose that U+2028 be rendered as a forced line break.

The changes to the CSS Text Module Level 3 draft would be minimal; for example:
- In Section 3, append the sentence "U+2028 LINE SEPARATOR is always a forced line break."
- In Section 4.1, exclude U+2028 from the definition of "other space separators."
- Optionally, add a "U+2028" column to the table in Section 3, with "Forced line break" in every row.

The rationale is straightforward:
- Unicode is very clear about the purpose of U+2028.
- There are many circumstances in which it is useful to represent visible line breaks in text strings without additional markup.
- There is solid precedent for a character whose whitespace behaviour supersedes all the CSS white-space options: U+00A0 NO-BREAK SPACE.
- The essential layout functionality needed to implement U+2028 as a forced line break is not new; browsers already have it if they support "white-space: pre-line".
- Current browsers typically render U+2028 as a visible glyph, such as an empty black box. Many developers [find](https://bugs.chromium.org/p/chromium/issues/detail?id=550275) [this](https://stackoverflow.com/questions/39603446/why-is-this-lsep-symbol-showing-up-on-chrome-and-not-firefox-or-edge) [surprising](https://stackoverflow.com/questions/41555397/strange-symbol-shows-up-on-website-l-sep); it would most likely be less surprising for U+2028 LINE SEPARATOR to be rendered as a line separator, as befits its name.
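
To illustrate the "white-space: pre-line" point, here is a rough sketch of the kind of workaround a page author might use today: rewrite each U+2028 as U+000A LINE FEED before the text is placed in an element styled with `white-space: pre-line`, which preserves line feeds as forced breaks. The helper name `lineSeparatorsToLineFeeds` is hypothetical and not part of any spec or library; this is an assumption about typical usage, not a proposed implementation.

```typescript
// Hypothetical helper (not from any spec or library): maps U+2028 LINE SEPARATOR
// to U+000A LINE FEED so that an element styled with `white-space: pre-line`
// renders each one as a forced line break.
const LINE_SEPARATOR = "\u2028";

function lineSeparatorsToLineFeeds(text: string): string {
  // split/join replaces every occurrence without needing a regular expression
  return text.split(LINE_SEPARATOR).join("\n");
}

// Example input containing a single U+2028 between two lines of text.
const sample = `first line${LINE_SEPARATOR}second line`;
console.log(JSON.stringify(lineSeparatorsToLineFeeds(sample)));
```

With the proposed change, this preprocessing step would be unnecessary, and the break would apply regardless of the element's `white-space` value.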

For reference, the [Unicode Standard 14.0](https://www.unicode.org/versions/Unicode14.0.0/ch05.pdf) defines U+2028 LINE SEPARATOR as an "unambiguous separator character". By my reading, it could hardly be clearer about what U+2028 is intended to represent and what the most sensible rendering would be:

> ### 5.8 Newline Guidelines 
[...]
> #### Line Separator and Paragraph Separator
> 
> A paragraph separator—independent of how it is encoded—is used to indicate a separation between paragraphs. A line separator indicates where a line break alone should occur, typically within a paragraph. [...]  For comparison, line separators basically correspond to HTML `<BR>`, and paragraph separators to older usage of HTML `<P>` (modern HTML delimits paragraphs by enclosing them in `<P>`...`</P>`).

[...]
> #### Recommendations 
>
> The Unicode Standard defines two unambiguous separator characters: U+2029 paragraph separator (PS) and U+2028 line separator (LS). In Unicode text, the PS and LS characters should be used wherever the desired function is unambiguous.


-- 
GitHub Notification of comment by zestyping
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/6992#issuecomment-1022575596 using your GitHub account



Received on Wednesday, 26 January 2022 20:28:38 UTC