Re: [csswg-drafts] [css-text-3] line-break property should mention Finnish language

The spec says

> only CJK codepoints are affected, unless the text is marked as Chinese or Japanese, in which case some additional common codepoints are affected.

And then immediately says

> a UA ... could choose to map different levels of strictness in Thai line-breaking to these keywords

ICU [moved](https://github.com/unicode-org/icu/commit/e126a2c42b717b466780b584ac37e0dc59765814) the previous Finnish-language behavior into all languages, for all line-breaking modes, so this specific example is no longer relevant.

However, ICU's line-breaking rules differ, in every locale, depending on whether or not you're in `loose` mode. Compare [`line.txt`](https://github.com/unicode-org/icu/blob/master/icu4c/source/data/brkitr/rules/line.txt) with [`line_loose.txt`](https://github.com/unicode-org/icu/blob/master/icu4c/source/data/brkitr/rules/line_loose.txt).

One example: in all locales, a series of adjacent U+2024 ONE DOT LEADER characters has no line break candidates between them, but in loose line-breaking mode it does. This is true for all characters with the [IN ("Inseparable") line-breaking property](http://www.unicode.org/reports/tr14-6/), namely U+2024 ONE DOT LEADER, U+2025 TWO DOT LEADER and U+2026 HORIZONTAL ELLIPSIS.
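
To make the dot-leader example concrete, here is a toy Python sketch of just that one difference. It is not ICU's rule engine and does not call ICU. The function name `break_candidates` and the `INSEPARABLE` set are made up for illustration, and the set is assumed to contain only the three code points named above. Every other line-breaking rule is ignored.

```python
# Toy model of the "IN x IN" difference only; NOT ICU's implementation.
# Assumption: the Inseparable class is just the three code points named above.
INSEPARABLE = {"\u2024", "\u2025", "\u2026"}


def break_candidates(text, loose=False):
    """Return each index i where a break is allowed between text[i-1] and text[i].

    Only adjacent pairs of INSEPARABLE characters are considered:
    non-loose modes forbid a break between them, loose mode allows it.
    All other pairs are ignored by this sketch.
    """
    candidates = []
    for i in range(1, len(text)):
        both_inseparable = text[i - 1] in INSEPARABLE and text[i] in INSEPARABLE
        if both_inseparable and loose:
            candidates.append(i)
    return candidates


leaders = "\u2024" * 4  # four adjacent ONE DOT LEADER characters
print(break_candidates(leaders))              # []
print(break_candidates(leaders, loose=True))  # [1, 2, 3]
```

Nothing in this sketch depends on the content language, which is the point: the difference applies to non-CJK text as well.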

My recommendation is to remove the text about "only CJK codepoints are affected".

-- 
GitHub Notification of comment by litherum
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/1252#issuecomment-421839570 using your GitHub account

Received on Sunday, 16 September 2018 21:50:22 UTC