Re: [csswg-drafts] [css-text] Render U+2028 LINE SEPARATOR as a forced line break (#6992)

> Can this be taken as an official statement on the WG's intended interpretation of LS? I would be delighted to know that treating U+2028 as a forced line break is already the behaviour that CSS Text 3 intends to specify!

I'd agree with that interpretation. css-text-3 states that:
> or the BK and NL Unicode line breaking classes must be honored. [UAX14]

UAX14 states that U+2028 has the non-tailorable BK class, and that “The text after [it] starts at the beginning of the line”.

There's a level of indirection, which may make it non-obvious on a casual read, but I think it's unambiguous that this is the expected behavior.

> CSS Text 3 mentions many other relevant characters by code point (such as U+000A, U+0020, etc.) and name (CARRIAGE RETURN, IDEOGRAPHIC SPACE, etc.). Yet U+2028 is never mentioned anywhere in the entire spec

css-text-3 mentions those characters where special CSS-specific processing going beyond (or against) Unicode is needed. For the rest, as stated in [1.5](https://www.w3.org/TR/css-text-3/#text-encoding), “CSS is built on Unicode. UAs […] must adhere to all normative requirements of the Unicode Core Standard, except where explicitly overridden by CSS.” So css-text-3 cannot be implemented correctly without referencing Unicode (and in particular UAX14), which, in the case of U+2028, gives us a definitive normative answer.

That said, if an editorial change can make this clearer, I'd be happy to take that on.

> Fixing this is easy; delete the confusing term and simplify the bullet point to:
> > Regardless of the white-space value, Unicode characters with the mandatory break property (BK) must be treated as forced line breaks. This includes U+000C, U+2028, and U+2029. [UAX14]

I don't think this quite works. That covers the BK class, but leaves out preserved segment breaks (U+000A).

Also:
> I am omitting VT and NEL here because UAX#14 says "implementations are not required to support…

I am interpreting css-text-3 to be going beyond Unicode here, removing the optionality, and adding a requirement that this be supported for the sake of interoperability, so I'd rather keep it.

How about

> Preserved segment breaks, and (regardless of the `white-space` value) any Unicode character with the BK or NL line breaking class, must be treated as forced line breaks. [UAX14]
> Note: As of Unicode 14, the BK and NL classes include U+000B, U+000C, U+0085, U+2028, and U+2029.
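
To make the intent of that wording concrete, here is a rough Python sketch of the rule as I read it. It is only an illustration, not spec text: the `LINE_BREAK_CLASS` table is hand-written from UAX14 for the code points discussed in this thread, and `split_at_forced_breaks` and its `preserve_segment_breaks` flag are hypothetical names standing in for the effect of `white-space`. Collapsing of non-preserved segment breaks and CR/CRLF handling are deliberately not modeled.

```python
# Line breaking classes (UAX14, as of Unicode 14) for the code points
# discussed in this thread. Hand-written for illustration, not derived
# from the Unicode data files.
LINE_BREAK_CLASS = {
    "\u000A": "LF",  # LINE FEED: the CSS segment break
    "\u000B": "BK",  # LINE TABULATION (VT)
    "\u000C": "BK",  # FORM FEED
    "\u0085": "NL",  # NEXT LINE (NEL)
    "\u2028": "BK",  # LINE SEPARATOR
    "\u2029": "BK",  # PARAGRAPH SEPARATOR
}


def split_at_forced_breaks(text, preserve_segment_breaks):
    """Hypothetical helper: split text into lines at forced line breaks.

    BK and NL characters always force a break. U+000A forces a break only
    when segment breaks are preserved (e.g. white-space: pre); otherwise it
    is left in the text untouched, since white space collapsing is outside
    the scope of this sketch.
    """
    lines = []
    current = []
    for ch in text:
        cls = LINE_BREAK_CLASS.get(ch)
        forced = cls in ("BK", "NL") or (cls == "LF" and preserve_segment_breaks)
        if forced:
            lines.append("".join(current))
            current = []
        else:
            current.append(ch)
    lines.append("".join(current))
    return lines


# U+2028 breaks the line either way; U+000A only when preserved.
print(split_at_forced_breaks("a\u2028b\nc", preserve_segment_breaks=False))
print(split_at_forced_breaks("a\u2028b\nc", preserve_segment_breaks=True))
```

The point of the two calls is the asymmetry: a rule that only names BK (and NL) covers U+2028 in both cases, but needs the "preserved segment breaks" clause to cover U+000A in the second.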

-- 
GitHub Notification of comment by frivoal
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/6992#issuecomment-1025329457 using your GitHub account



Received on Monday, 31 January 2022 02:52:11 UTC