Rendering U+2028 LINE SEPARATOR as a forced line break

Hello!

I'd like to offer a simple proposal: *Render U+2028 LINE SEPARATOR as a
forced line break*.

It seems that the CSS Text Module is the right place for this; please let
me know if I'm mistaken, or if I should be raising this in a different
venue or a different way.  Thanks!

The changes to the CSS Text Module Level 3 draft would be minimal; for
example:

   - In Section 3, append the sentence "U+2028 LINE SEPARATOR is always a
   forced line break."
   - In Section 4.1, exclude U+2028 from the definition of "other space
   separators."
   - Optionally, add a "U+2028" column to the table in Section 3, with
   "Forced line break" in every row.

The rationale is straightforward:

   - Unicode is very clear about the purpose of U+2028.
   - There are many circumstances in which it is useful to represent
   visible line breaks in text strings without additional markup.
   - There is solid precedent for a character whose whitespace behaviour
   supersedes all the CSS white-space options: U+00A0 NO-BREAK SPACE.
   - The essential layout functionality needed to implement U+2028 as a
   forced line break is not new; browsers already have it if they support
   "white-space: pre-line".
   - Current browsers typically render U+2028 as a visible glyph, such as
   an empty black box.  Many developers find this surprising; see
   <https://bugs.chromium.org/p/chromium/issues/detail?id=550275>,
   <https://stackoverflow.com/questions/39603446/why-is-this-lsep-symbol-showing-up-on-chrome-and-not-firefox-or-edge>, and
   <https://stackoverflow.com/questions/41555397/strange-symbol-shows-up-on-website-l-sep>.
   Most likely, it would be less surprising for U+2028 LINE SEPARATOR to be
   rendered as a line separator, as befits its name.
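
To make the proposed rule concrete, here is a minimal sketch in Python of
how forced line breaks would be chosen under it. This is a simplified model
and not how any browser is implemented: the function name `forced_lines` and
the `PRESERVES_NEWLINES` table are hypothetical, and a collapsed line feed
is assumed to become a single space, which ignores the finer segment-break
rules in the draft.

```python
# Simplified model of the proposed rule; not a real layout engine.
LINE_FEED = "\n"
LINE_SEPARATOR = "\u2028"

# Whether each white-space value keeps line feeds as forced breaks
# (the "New Lines" column of the table in Section 3).
PRESERVES_NEWLINES = {
    "normal": False,
    "nowrap": False,
    "pre": True,
    "pre-wrap": True,
    "pre-line": True,
    "break-spaces": True,
}


def forced_lines(text: str, white_space: str = "normal") -> list:
    """Split text at forced line breaks under the proposed rule."""
    if PRESERVES_NEWLINES[white_space]:
        # Preserved line feeds break lines just like U+2028 does.
        text = text.replace(LINE_FEED, LINE_SEPARATOR)
    else:
        # Assumption: a collapsed line feed turns into one space.
        text = text.replace(LINE_FEED, " ")
    # Proposed: U+2028 is a forced break for every white-space value.
    return text.split(LINE_SEPARATOR)
```

With this model, `forced_lines("a\nb\u2028c", "normal")` gives
`["a b", "c"]`, while the same string under `"pre-line"` gives
`["a", "b", "c"]`: the line feed depends on the white-space value, but
U+2028 breaks the line in both cases.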

For reference, the Unicode Standard 14.0
<https://www.unicode.org/versions/Unicode14.0.0/ch05.pdf> defines U+2028
LINE SEPARATOR as an "unambiguous separator character".  By my reading, it
could hardly be clearer about what U+2028 is intended to represent, and
what the most sensible rendering should be:

*5.8 Newline Guidelines*

[...]

> *Line Separator and Paragraph Separator*
>
> A paragraph separator—independent of how it is encoded—is used to indicate
> a separation between paragraphs. A line separator indicates where a line
> break alone should occur, typically within a paragraph. [...]  For
> comparison, line separators basically correspond to HTML <BR>, and
> paragraph separators to older usage of HTML <P> (modern HTML delimits
> paragraphs by enclosing them in <P>...</P>).

[...]

> *Recommendations*
>
> The Unicode Standard defines two unambiguous separator characters: U+2029
> paragraph separator (PS) and U+2028 line separator (LS). In Unicode text,
> the PS and LS characters should be used wherever the desired function is
> unambiguous.
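
As a point of comparison outside CSS, Python's standard library treats both
characters as line boundaries, and the Unicode character database gives
them their own general categories. The short check below uses only the
standard `unicodedata` module and `str.splitlines`; the helper name
`describe` is made up for this illustration.

```python
import unicodedata

LINE_SEPARATOR = "\u2028"
PARAGRAPH_SEPARATOR = "\u2029"
NO_BREAK_SPACE = "\u00a0"


def describe(ch: str) -> str:
    """Return the code point, name, and general category of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"


for ch in (LINE_SEPARATOR, PARAGRAPH_SEPARATOR, NO_BREAK_SPACE):
    print(describe(ch))

# str.splitlines() breaks at U+2028 and U+2029 as well as at line feeds.
print("first\u2028second\u2029third".splitlines())
```

The categories reported are Zl for U+2028, Zp for U+2029, and Zs for
U+00A0, and `splitlines()` returns three separate strings.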


I'd appreciate hearing your thoughts and suggested next steps on this.

Thanks very much!


—Ping

Received on Wednesday, 26 January 2022 17:39:41 UTC