[csswg-drafts] [css-text] Render U+2028 LINE SEPARATOR as a forced line break (#6992)

tabatkins has just created a new issue for https://github.com/w3c/csswg-drafts:

== [css-text] Render U+2028 LINE SEPARATOR as a forced line break ==
**[Originally posted by Ka-Ping Yee](https://lists.w3.org/Archives/Public/www-style/2022Jan/0013.html)**

I'd like to offer a simple proposal: Render U+2028 LINE SEPARATOR as a forced line break.

It seems that the CSS Text Module is the right place for this; please let me know if I'm mistaken, or if I should be raising this in a different venue or a different way.  Thanks!

The changes to the CSS Text Module Level 3 draft would be minimal; for example:

* In Section 3, append the sentence "U+2028 LINE SEPARATOR is always a forced line break."
* In Section 4.1, exclude U+2028 from the definition of "other space separators."
* Optionally, add a "U+2028" column to the table in Section 3, with "Forced line break" in every row.
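To make the intended behaviour concrete, here is a minimal sketch of the proposed rule as a pure function. It is a simplified model, not spec text: the name `splitAtForcedBreaks` is hypothetical, and a collapsed line feed is assumed to simply become a space, which ignores the finer segment-break transformation rules.

```typescript
// Hypothetical model of the proposal; names and simplifications are assumptions.
type WhiteSpace =
  | "normal" | "nowrap" | "pre" | "pre-wrap" | "pre-line" | "break-spaces";

// white-space values under which a line feed is preserved as a forced break.
const PRESERVES_LINE_FEEDS: ReadonlySet<WhiteSpace> = new Set<WhiteSpace>([
  "pre", "pre-wrap", "pre-line", "break-spaces",
]);

function splitAtForcedBreaks(text: string, whiteSpace: WhiteSpace): string[] {
  // Simplification: a collapsed line feed is replaced by a single space.
  const normalized = PRESERVES_LINE_FEEDS.has(whiteSpace)
    ? text
    : text.replace(/\n/g, " ");
  // U+2028 breaks under every white-space value; U+000A only when preserved.
  return normalized.split(/[\n\u2028]/);
}
```

Under this model, `splitAtForcedBreaks("a\nb\u2028c", "normal")` yields `["a b", "c"]`, while the same string with `"pre-line"` yields `["a", "b", "c"]`.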

The rationale is straightforward:

* Unicode is very clear about the purpose of U+2028.
* There are many circumstances in which it is useful to represent visible line breaks in text strings without additional markup.
* There is solid precedent for a character whose whitespace behaviour supersedes all the CSS white-space options: U+00A0 NO-BREAK SPACE.
* The essential layout functionality needed to implement U+2028 as a forced line break is not new; browsers already have it if they support "white-space: pre-line".
* Current browsers typically render U+2028 as a visible glyph, such as an empty black box.  Many developers find this surprising; rendering U+2028 LINE SEPARATOR as a line separator, as befits its name, would most likely be less surprising.
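Until something like this is specified, one possible workaround relies on the existing "white-space: pre-line" machinery mentioned above. The sketch below is only an illustration under that assumption; the function name `lineSeparatorsToLineFeeds` is hypothetical and not part of any API.

```typescript
// Workaround sketch (an assumption, not part of the proposal): rewrite each
// U+2028 as U+000A so that an element styled with "white-space: pre-line"
// renders it as a forced line break.
function lineSeparatorsToLineFeeds(text: string): string {
  return text.replace(/\u2028/g, "\n");
}
```

The result would be assigned to the text content of an element styled with `white-space: pre-line`. Note the cost: any line feeds already present in the text also become forced breaks, which is exactly the ambiguity that an always-breaking U+2028 would avoid.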

For reference, the Unicode Standard 14.0 defines U+2028 LINE SEPARATOR as an "unambiguous separator character".  By my reading, it could hardly be more clear as to what U+2028 is intended to represent, and what the most sensible rendering should be:

> 5.8 Newline Guidelines
> [...]
> Line Separator and Paragraph Separator
>  
> A paragraph separator—independent of how it is encoded—is used to indicate a separation between paragraphs. A line separator indicates where a line break alone should occur, typically within a paragraph. [...]  For comparison, line separators basically correspond to HTML <BR>, and paragraph separators to older usage of HTML <P> (modern HTML delimits paragraphs by enclosing them in <P>...</P>).
> [...]
> Recommendations 
>  
> The Unicode Standard defines two unambiguous separator characters: U+2029 paragraph separator (PS) and U+2028 line separator (LS). In Unicode text, the PS and LS characters should be used wherever the desired function is unambiguous.
> 

I'd appreciate hearing your thoughts and suggested next steps on this.

Thanks very much! 

Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/6992 using your GitHub account


Received on Wednesday, 26 January 2022 18:28:22 UTC