Re: [csswg-drafts] [css-text] Render U+2028 LINE SEPARATOR as a forced line break (#6992)

@fantasai Thank you for clarifying this!  I do see now that Section 4.1 did not mean to refer to U+2028 when defining "other space separators".

> CSS3 Text has, technically, required LS to be treated as a forced break for at least a decade. If browsers are not treating it as such, that should be considered a bug against them.

Can this be taken as an official statement of the WG's intended interpretation of LS?  I would be delighted to know that treating U+2028 as a forced line break is already the behaviour that CSS Text 3 intends to specify!

I can imagine browser developers not finding this to be obvious from the spec.  Would you suggest pointing them at this comment thread as an authoritative ruling?

Here is why they might find it rather subtle.  CSS Text 3 mentions many other relevant characters by code point (such as U+000A, U+0020, etc.) and name (CARRIAGE RETURN, IDEOGRAPHIC SPACE, etc.).  Yet U+2028 is never mentioned anywhere in the entire spec.  Neither LINE SEPARATOR nor its abbreviation LSEP is mentioned anywhere.  Neither the "Line Separator" category nor its abbreviation "Zl" is mentioned anywhere.  An ordinary person can ask "why doesn't U+2028 render as a line break?", search for the spec, arrive at CSS Text 3, search the entire document for every imaginable term related to U+2028, and find nothing.  Indeed, that was my experience, and it is what led me to file this issue.

Would the CSS editors be willing to consider making this a little more explicit?  I can think of one small change that would clear this all up.

As you pointed out, Section 5.1, bullet point 2 says "lines always break at each preserved forced break character".  But the spec has no definition for the term "forced break character".  If you assume that a "forced break character" has something to do with a "forced line break", then the term "preserved forced break character" is nonsensical: "forced line break" is defined in terms of preserved characters, so there can be no such thing as a non-preserved forced break character.  If you instead start by trying to understand the term "preserved", you find that it is defined only as part of "preserved white space", where the default meaning of "white space" is "document white space", which consists of U+0020, U+0009, and segment breaks.  So "preserved" has no meaning when applied to other characters like U+2028.

Fixing this is easy.  Delete the confusing term and simplify the bullet point to:

> Regardless of the `white-space` value, Unicode characters with the mandatory break property (BK) must be treated as forced line breaks.  This includes U+000C, U+2028, and U+2029 [UAX#14].

(I am omitting VT and NEL here because UAX#14 says implementations are not required to support VT or NEL.)
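
As a rough illustration of what this wording would ask of an implementation, here is a minimal Python sketch.  The helper name `split_at_forced_breaks` and the constant `FORCED_BREAK_CHARS` are hypothetical, not taken from the spec or from any browser.  The sketch only handles the three code points listed above; U+000A and the rest of `white-space` processing are deliberately left out.

```python
# Hypothetical illustration only: not spec text, not a browser implementation.
# The three code points named in the proposed wording:
# U+000C FORM FEED, U+2028 LINE SEPARATOR, U+2029 PARAGRAPH SEPARATOR.
FORCED_BREAK_CHARS = "\u000C\u2028\u2029"


def split_at_forced_breaks(text: str) -> list[str]:
    """Split text into lines at each of the characters listed above.

    VT (U+000B) and NEL (U+0085) are intentionally not treated as breaks,
    mirroring the omission in the proposed wording.
    """
    lines = []
    current = []
    for ch in text:
        if ch in FORCED_BREAK_CHARS:
            # End the current line here, whatever the `white-space` value.
            lines.append("".join(current))
            current = []
        else:
            current.append(ch)
    lines.append("".join(current))
    return lines
```

Under this sketch, a string containing U+2028 between two words comes back as two lines, while VT and NEL pass through unchanged.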

-- 
GitHub Notification of comment by zestyping
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/6992#issuecomment-1024073985 using your GitHub account


Received on Friday, 28 January 2022 10:17:46 UTC