Re: [csswg-drafts] [css-text-3] line-break, word-break: language unclear, and a new testcase. (#2559)

> First, I expect I am not the first to point out that "word-break" and "line-break" have some considerable overlap. As described, breaks within words like ちょっと (UAX14 classes ID CJ CJ ID) are covered by the line-break rule, although this is a single word. And of course, "line-break: anywhere" will break words. Some sort of clarifying note as to the interaction of these two features might help.

I've tried to clarify the specific interactions. I'm not sure how else to explain them beyond what's already there, but I'll give it another try later.

> word-break states it "controls whether a soft wrap opportunity exists between adjacent typographic letter units (or other typographic character units belonging to the NU, AL, AI, or ID Unicode line breaking classes" - although the note at the bottom of "keep-all" explicitly mentions Korean, the classes H2, H3, JL, JT and JV are excluded from this list. I don't know Korean so I'm unsure if that is a deliberate omission. It also doesn't mention classes CJ or NS, and again I'm not sure if this is a deliberate omission. Given the overlap with line-break it may be better to dump this descriptive paragraph completely in favour of exact descriptions of the behaviour of each property with regard to UAX14, as I've added below.

H2, H3, JL, JT, JV, and CJ are excluded from that list because they are all letters, so they're already included in “typographic letter units”. Line breaking around NS is controlled by `line-break`; `word-break` cannot influence it. I've changed “other” to “non-letter” here to clarify. The result is a bit awkward because I couldn't find a grammatical way to make clear that the “belonging to” phrase attaches only to “typographic character units” and not to “typographic letter units”, hence the parentheses. Anyway, the sentence now reads:

“Specifically it controls whether a soft wrap opportunity exists between adjacent typographic letter units (and/or non-letter typographic character units belonging to the NU, AL, AI, or ID Unicode line breaking classes [UAX14]).”
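To make the revised scope concrete, here is a minimal Python sketch of the test that sentence describes: word-break governs a position only when both neighbours are letters or belong to NU, AL, AI, or ID. It assumes the caller already knows, for each typographic character unit, whether it is a letter and what its UAX14 class is (Python's standard library has no UAX14 table). The `Unit` type and the function names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical representation of one typographic character unit.
# Both fields are assumed to be supplied by the caller.
@dataclass(frozen=True)
class Unit:
    is_letter: bool  # is it a typographic letter unit?
    lb_class: str    # UAX14 line breaking class, e.g. "AL", "ID", "H2"

# Non-letter units that word-break still governs, per the revised sentence.
NON_LETTER_CLASSES = frozenset({"NU", "AL", "AI", "ID"})

def governed_by_word_break(unit: Unit) -> bool:
    """Letters always count; non-letters count only for NU, AL, AI, or ID."""
    return unit.is_letter or unit.lb_class in NON_LETTER_CLASSES

def word_break_controls(left: Unit, right: Unit) -> bool:
    """True if word-break decides whether a soft wrap opportunity exists
    between these two adjacent units."""
    return governed_by_word_break(left) and governed_by_word_break(right)

# Two Hangul syllables: both letters, so word-break applies even though
# H2 is not in the NU/AL/AI/ID list.
hangul = Unit(is_letter=True, lb_class="H2")
print(word_break_controls(hangul, hangul))  # True

# An ideograph followed by an ideographic full stop (CL, not a letter):
# word-break has no say here; other line breaking rules decide.
print(word_break_controls(Unit(True, "ID"), Unit(False, "CL")))  # False
```

The Hangul case is the point of the parenthetical: the class list only needs to name non-letters, because letters are covered whatever their class.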

> The language of "word-break: keep-all" is still a bit unclear with regards to the changes it mandates to UAX14. For example, "Breaking is forbidden within “words”: implicit soft wrap opportunities between typographic letter units are suppressed" makes no mention of character class, so isn't much help if you're implementing this. 

“Typographic letter unit” is precisely defined in https://www.w3.org/TR/css-text-3/#typographic-letter-unit, so I don't know why you think there's “no mention of character class”.
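For implementers, a small sketch of a letter test in Python. It assumes the linked definition is based on the Unicode General Category (Letter or Number, i.e. L* or N*) rather than on a UAX14 class, and it approximates a typographic character unit as a single code point; both are simplifying assumptions, so check the definition itself before relying on this. The function name is hypothetical.

```python
import unicodedata

def is_typographic_letter_unit(ch: str) -> bool:
    """Approximate letter test for a single code point.

    Assumptions: (1) a typographic character unit is treated as one code
    point, whereas real text needs grapheme clusters; (2) "letter" means the
    Unicode General Category starts with L (Letter) or N (Number).
    """
    return unicodedata.category(ch)[0] in ("L", "N")

for ch in ["a", "漢", "한", "7", "。", " "]:
    print(repr(ch), unicodedata.category(ch), is_typographic_letter_unit(ch))
```

Under these assumptions, ideographs (UAX14 class ID) and Hangul syllables (H2/H3) both have General Category Lo and land on the letter side, while the ideographic full stop (Po) and the space (Zs) do not.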

> I believe the intention here is to treat all ideographic characters as if they were latin text.

Yes.
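One way to model “treat ideographic characters as if they were Latin text” is to remap UAX14 classes before the pair rules run. This is a hypothetical sketch, not the spec's algorithm. It assumes keep-all maps ID, CJ, and the Hangul classes to AL (so UAX14's AL × AL rule keeps them together), and that otherwise CJ resolves to NS under `line-break: strict` and to ID under `normal`/`loose`, which is the LB1 tailoring UAX14 suggests for CSS. Applied to ちょっと (ID CJ CJ ID) from the original question, it shows why `line-break` alone permits breaks inside that word and why keep-all removes them.

```python
# Classes whose members keep-all treats like alphabetic (AL) text.
# Assumption: this list (ideographs, small kana, Hangul jamo and syllables)
# is an approximation chosen for illustration.
KEEP_ALL_AS_AL = frozenset({"ID", "CJ", "H2", "H3", "JL", "JV", "JT"})

def resolve_class(lb_class: str, word_break: str, line_break: str) -> str:
    """Resolve one UAX14 class before the pair rules run (hypothetical model)."""
    if word_break == "keep-all" and lb_class in KEEP_ALL_AS_AL:
        # AL x AL is a no-break pair in UAX14, so runs of these stay together.
        return "AL"
    if lb_class == "CJ":
        # strict forbids breaks before small kana (NS); normal/loose allow them (ID).
        return "NS" if line_break == "strict" else "ID"
    return lb_class

def resolve_classes(classes: list[str], word_break: str, line_break: str) -> list[str]:
    return [resolve_class(c, word_break, line_break) for c in classes]

# ちょっと: ち (ID) ょ (CJ) っ (CJ) と (ID)
chotto = ["ID", "CJ", "CJ", "ID"]
print(resolve_classes(chotto, "normal", "normal"))    # ['ID', 'ID', 'ID', 'ID']
print(resolve_classes(chotto, "normal", "strict"))    # ['ID', 'NS', 'NS', 'ID']
print(resolve_classes(chotto, "keep-all", "normal"))  # ['AL', 'AL', 'AL', 'AL']
```

With the `normal` values every adjacent ID pair is a break opportunity under UAX14, so the word can split anywhere; with `strict` there is no break before the small kana, but one remains between っ and と; with keep-all the whole word resolves to AL and stays together.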

> line-break: anywhere is described as providing "a soft wrap opportunity around every typographic character unit, including around any punctuation character or preserved spaces, or in the middle of words, disregarding any prohibition against line breaks introduced by characters with the GL, JW, or ZJW character class". It then states in the note that "This value triggers the line breaking rules typically seen in terminals.". If that's the intention then the mention of GL, JW and ZJW (which should be WJ and ZWJ by the way) is superfluous and confusing. And also superfluous. The final sentence should be "disregarding any prohibition", full-stop end of. Literally anywhere in the text is a valid break-point, even before U+20

Edited as “any prohibition against line breaks<ins>, even those</ins> introduced by characters ...”. I want it to be clear that explicit wrapping controls are also ignored.
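As a rough illustration of ignoring every prohibition, including those from no-break characters such as NO-BREAK SPACE (GL) or WORD JOINER (WJ), here is a sketch that returns every position between typographic character units. It approximates a unit as a base code point plus any following combining marks (General Category M*), which is an assumption; a real implementation would use extended grapheme clusters. The function name is hypothetical.

```python
import unicodedata

def anywhere_opportunities(text: str) -> list[int]:
    """Offsets (between code points) where line-break: anywhere allows a wrap.

    No class-based prohibitions are consulted: NO-BREAK SPACE (GL) and
    WORD JOINER (WJ) do not prevent a break. The only thing avoided is
    splitting a base character from its combining marks, as a stand-in
    for keeping a typographic character unit intact.
    """
    return [
        i
        for i in range(1, len(text))
        if not unicodedata.category(text[i]).startswith("M")
    ]

print(anywhere_opportunities("a\u00a0b"))  # [1, 2]  NO-BREAK SPACE ignored
print(anywhere_opportunities("a\u2060b"))  # [1, 2]  WORD JOINER ignored
print(anywhere_opportunities("e\u0301x"))  # [2]     e + COMBINING ACUTE stays whole
```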

> What happens if I specify "word-break: keep-all; line-break: anywhere". The two rules contradict each other; which one wins?

`line-break: anywhere`. I'll clarify that point.
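The precedence can be summarized as a per-position decision. This is a sketch under stated assumptions: `default_allowed` stands for whatever the ordinary line breaking rules would decide at that position, `keep_all_applies` says whether the position falls between units that keep-all governs, and only the `keep-all` and `anywhere` values are modeled. All names are hypothetical.

```python
def soft_wrap_allowed(
    default_allowed: bool,
    keep_all_applies: bool,
    word_break: str,
    line_break: str,
) -> bool:
    """Decide one position, with line-break: anywhere taking precedence."""
    if line_break == "anywhere":
        # anywhere wins over everything, including word-break: keep-all.
        return True
    if word_break == "keep-all" and keep_all_applies:
        # keep-all suppresses implicit opportunities inside "words".
        return False
    return default_allowed

# Between two ideographs: normally breakable, suppressed by keep-all,
# and restored by line-break: anywhere.
print(soft_wrap_allowed(True, True, "normal", "normal"))      # True
print(soft_wrap_allowed(True, True, "keep-all", "normal"))    # False
print(soft_wrap_allowed(True, True, "keep-all", "anywhere"))  # True
```

Checking `anywhere` first is what makes the conflict resolution explicit: no later rule can take an opportunity away once it has granted one.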

> Using the language of the text as an input to the algorithm seems a bit odd to me. Is there any reason "loose-cj" and "normal-cj" values for line-break could not be used to achieve the same thing? Not really a serious issue and I can't think of a specific reason why it's a problem, it just feels out of character with the rest of the spec so thought I'd raise it while I'm typing.

There's a lot in the spec that is language- or writing-system-dependent. Much of it is not called out as explicitly as these rules are, but line breaking, justification, white-space collapsing, and text transforms are all language-dependent. We do this because a) we want things to work optimally by default, without the author having to think about every single CSS property that exists or will exist, and b) we want to limit the values to the switches an author actually finds useful to think about, rather than overloading everyone in the world with more values than they can easily reason about or even need to know about.
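To show what “language as an input” looks like next to adding `loose-cj`/`normal-cj` keywords, here is a sketch of one language-conditional rule. It assumes a rule that allows breaks before the CJK hyphen-like characters 〜 U+301C and ゠ U+30A0 under `line-break: normal` and `loose` only when the content is Chinese or Japanese; treat the character set as illustrative and consult the spec for the exact lists. Detecting the writing system from the primary language subtag is also a simplification, and the function name is hypothetical.

```python
# Illustrative set (assumption): CJK hyphen-like characters whose
# break-before opportunity depends on the content language.
CJK_HYPHEN_LIKE = frozenset({"\u301c", "\u30a0"})  # WAVE DASH, KATAKANA-HIRAGANA DOUBLE HYPHEN

def cjk_hyphen_break_before(ch: str, line_break: str, lang: str) -> bool:
    """True if this language-conditional rule grants a break before `ch`.

    Returns False for characters the rule does not cover; other rules may
    still allow a break there. The writing system is approximated by the
    primary subtag of a BCP 47 tag such as "ja-JP" or "zh-Hant".
    """
    if ch not in CJK_HYPHEN_LIKE:
        return False
    primary = lang.split("-")[0].lower()
    return line_break in ("normal", "loose") and primary in ("zh", "ja")

# Same keyword, different outcome depending on the content language:
print(cjk_hyphen_break_before("\u301c", "normal", "ja-JP"))    # True
print(cjk_hyphen_break_before("\u301c", "normal", "en"))       # False
print(cjk_hyphen_break_before("\u301c", "strict", "zh-Hant"))  # False
```

The author only ever picks among `strict`, `normal`, and `loose`; the content language supplies the rest, so no extra `-cj` keywords are needed.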

> (note: existing description states "customary rules as described above", which is nowhere near exact enough)

UAX14 is a starting point for universal line breaking, not the ultimate authority on quality typesetting. We are intentionally not requiring it.

-- 
GitHub Notification of comment by fantasai
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/2559#issuecomment-444698382 using your GitHub account

Received on Thursday, 6 December 2018 00:03:47 UTC