Rendering U+2028 LINE SEPARATOR as a forced line break


I'd like to offer a simple proposal: *Render U+2028 LINE SEPARATOR as a
forced line break*.

It seems that the CSS Text Module is the right place for this; please let
me know if I'm mistaken, or if I should be raising this in a different
venue or a different way.  Thanks!

The changes to the CSS Text Module Level 3 draft would be minimal; for
example:

   - In Section 3, append the sentence "U+2028 LINE SEPARATOR is always a
   forced line break."
   - In Section 4.1, exclude U+2028 from the definition of "other space
   separators".
   - Optionally, add a "U+2028" column to the table in Section 3, with
   "Forced line break" in every row.

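To make the proposed Section 3 change concrete, here is a rough sketch of
the rule as a lookup, written in TypeScript.  It is a simplified model of
the white-space table, not code from any browser engine; the names
`WhiteSpace`, `PRESERVES_NEWLINES` and `isForcedLineBreak` are my own
inventions for illustration.

```typescript
// Simplified model of the "white-space" table in CSS Text Level 3, Section 3.
// Illustrative only: these names are hypothetical, not from any engine or spec.
type WhiteSpace =
  | "normal"
  | "pre"
  | "nowrap"
  | "pre-wrap"
  | "break-spaces"
  | "pre-line";

const LINE_FEED = "\n";
const LINE_SEPARATOR = "\u2028";

// Values whose "New Lines" column reads "Preserve" rather than "Collapse".
const PRESERVES_NEWLINES: ReadonlySet<WhiteSpace> = new Set<WhiteSpace>([
  "pre",
  "pre-wrap",
  "break-spaces",
  "pre-line",
]);

function isForcedLineBreak(ch: string, whiteSpace: WhiteSpace): boolean {
  if (ch === LINE_SEPARATOR) {
    // Proposed rule: "Forced line break" in every row of the table.
    return true;
  }
  if (ch === LINE_FEED) {
    // Existing rule: a line feed forces a break only when newlines are preserved.
    return PRESERVES_NEWLINES.has(whiteSpace);
  }
  return false;
}

const values: WhiteSpace[] = [
  "normal", "pre", "nowrap", "pre-wrap", "break-spaces", "pre-line",
];
for (const value of values) {
  console.log(
    value,
    "LF:", isForcedLineBreak(LINE_FEED, value),
    "U+2028:", isForcedLineBreak(LINE_SEPARATOR, value),
  );
}
```

The only new branch is the first one; everything else mirrors what the
table already says about preserved and collapsed newlines.
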
The rationale is straightforward:

   - Unicode is very clear about the purpose of U+2028.
   - There are many circumstances in which it is useful to represent
   visible line breaks in text strings without additional markup.
   - There is solid precedent for a character with
   whitespace behaviour that supersedes all the CSS white-space options,
   - The essential layout functionality needed to implement U+2028 as a
   forced line break is not new; browsers already have it if they support
   "white-space: pre-line".
   - Current browsers typically render U+2028 as a visible glyph, such as
   an empty black box.  Many developers find this surprising; most likely,
   it would be less surprising for U+2028 LINE SEPARATOR to be rendered as
   a line separator, as befits its name.
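
For comparison, the workaround that authors can use today looks roughly
like the sketch below: map U+2028 to U+000A before the text reaches the
page, and style the containing element with "white-space: pre-line" so
that the line feed becomes a forced break.  The helper name
`lineSeparatorsToLineFeeds` is hypothetical, and this is only one way to
do it.

```typescript
// Hypothetical helper: replaces every U+2028 LINE SEPARATOR with U+000A
// LINE FEED. The result is meant for an element styled with
// "white-space: pre-line", where line feeds are preserved as line breaks.
const LS = "\u2028";

function lineSeparatorsToLineFeeds(text: string): string {
  return text.split(LS).join("\n");
}

const sample = "first line\u2028second line";
const converted = lineSeparatorsToLineFeeds(sample);

// JSON.stringify makes the control character visible in the log output.
console.log(JSON.stringify(converted)); // "first line\nsecond line"
```

With the proposal in place, this preprocessing step (and the
"white-space" override) would no longer be needed.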

For reference, the Unicode Standard 14.0 defines U+2028
LINE SEPARATOR as an "unambiguous separator character".  By my reading, it
could hardly be more clear as to what U+2028 is intended to represent, and
what the most sensible rendering should be:

*5.8 Newline Guidelines*


> *Line Separator and Paragraph Separator*
>
> A paragraph separator—independent of how it is encoded—is used to indicate
> a separation between paragraphs. A line separator indicates where a line
> break alone should occur, typically within a paragraph. [...]  For
> comparison, line separators basically correspond to HTML <BR>, and
> paragraph separators to older usage of HTML <P> (modern HTML delimits
> paragraphs by enclosing them in <P>...</P>).


> *Recommendations*
>
> The Unicode Standard defines two unambiguous separator characters: U+2029
> paragraph separator (PS) and U+2028 line separator (LS). In Unicode text,
> the PS and LS characters should be used wherever the desired function is
> unambiguous.

I'd appreciate hearing your thoughts and suggested next steps on this.

Thanks very much!


Received on Wednesday, 26 January 2022 17:39:41 UTC