[css-text] word-break for Korean

From: Florian Rivoal <florian@rivoal.net>
Date: Wed, 4 Mar 2015 11:56:32 +0100
Message-Id: <C42719E7-5717-4EA7-891F-4FA21A58F0E7@rivoal.net>
Cc: www International <www-international@w3.org>
To: www-style list <www-style@w3.org>
As the css-text spec acknowledges, there are two common line breaking behaviors for Korean:
- break at spaces (like English)
- break between every syllable (like Chinese)

''word-break: normal'' does the latter, and if you want the former behavior, you are expected to use ''word-break: keep-all''.

If you want the ''normal'' behavior on all languages, and the ''keep-all'' behavior on Korean (which is a pretty reasonable thing to want), you can do:

foo {  word-break: normal; }
foo:lang(ko) {  word-break: keep-all; }

However, this assumes that you know in advance and can properly tag the language used in each element, which is typically not true for form fields or contenteditable elements, where users may input anything they like. Setting those to ''keep-all'' would have the expected behavior for Korean, but not for other languages: Chinese or Japanese text, for instance, would also lose its break opportunities between characters.

Therefore, I suggest that a fourth value is worth having:

''word-break: keep-all-if-korean-otherwise-normal'' (obviously in need of a better name) would behave the same as ''word-break: normal'', but would suppress implicit soft-wrap opportunities between two Korean syllables, except where opportunities exist due to dictionary-based breaking. A Korean syllable is defined to be either a precomposed Hangul Syllable (U+AC00 to U+D7A3) or a syllable composed of Hangul Jamo (U+1100 to U+11FF, U+3130 to U+318F, U+A960 to U+A97F, U+D7B0 to U+D7FF). An alternative definition would be to suppress implicit soft-wrap opportunities between a Korean syllable (as defined above) and any typographic-letter-unit, again except where opportunities exist due to dictionary-based breaking.
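
To make the first definition above concrete, here is a minimal Python sketch of the check, using only the code point ranges quoted in the proposal. The function names are hypothetical, and the sketch simplifies in two ways that should be read as assumptions: it tests individual code points rather than whole syllables (a syllable composed of Jamo spans several code points), and it does not model the dictionary-based breaking exception.

```python
# Code point ranges quoted in the proposal above.
HANGUL_SYLLABLES = (0xAC00, 0xD7A3)  # precomposed Hangul Syllables
HANGUL_JAMO_RANGES = (
    (0x1100, 0x11FF),
    (0x3130, 0x318F),
    (0xA960, 0xA97F),
    (0xD7B0, 0xD7FF),
)


def is_korean_char(ch: str) -> bool:
    """Return True if the single character ch falls in one of the ranges above.

    Simplification (assumption): works per code point, not per syllable.
    """
    cp = ord(ch)
    ranges = (HANGUL_SYLLABLES,) + HANGUL_JAMO_RANGES
    return any(lo <= cp <= hi for lo, hi in ranges)


def suppresses_break(before: str, after: str) -> bool:
    """First definition: suppress the implicit soft-wrap opportunity
    only when the characters on both sides are Korean.

    Dictionary-based breaking is not modeled here.
    """
    return is_korean_char(before) and is_korean_char(after)
```

Under this sketch, the opportunity between "한" and "국" is suppressed, while the one between "中" and "文" is left alone, which is the difference from ''keep-all''.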

A few suggestions on how to name the value (by rough personal order of preference):
word-break: normal keep-hangul? | keep-all | break-all
word-break: normal | keep-hangul | keep-all | break-all
word-break: normal keep-korean? | keep-all | break-all
word-break: normal | keep-korean | keep-all | break-all
word-break: normal | keep-all-if-hangul | keep-all | break-all
word-break: normal | keep-all-if-korean | keep-all | break-all
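
The practical difference between the first two suggestions is whether the new keyword is an optional modifier on ''normal'' or a value of its own. A small Python sketch, with hypothetical function names, that enumerates which declarations each grammar would accept (assuming the usual reading of the value syntax, where juxtaposition binds tighter than the bar, and ASCII case-insensitive keywords):

```python
# Accepted keyword sequences under "normal keep-hangul? | keep-all | break-all".
FIRST_GRAMMAR = {
    ("normal",),
    ("normal", "keep-hangul"),
    ("keep-all",),
    ("break-all",),
}

# Accepted keyword sequences under "normal | keep-hangul | keep-all | break-all".
SECOND_GRAMMAR = {
    ("normal",),
    ("keep-hangul",),
    ("keep-all",),
    ("break-all",),
}


def _keywords(value: str) -> tuple:
    # Split on whitespace and lowercase, since CSS keywords are case-insensitive.
    return tuple(value.lower().split())


def is_valid_first_grammar(value: str) -> bool:
    return _keywords(value) in FIRST_GRAMMAR


def is_valid_second_grammar(value: str) -> bool:
    return _keywords(value) in SECOND_GRAMMAR
```

With the first grammar, "normal keep-hangul" is accepted and a bare "keep-hangul" is not; with the second it is the other way around.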

Alternatively, this could probably be a value of line-break (or even a separate property), but that feels less appropriate, as it would only do anything when word-break computes to normal.

As for real-world evidence that this is needed, Bloomberg has implemented ''word-break: -bb-keep-all-if-korean''[1] in its Chromium fork[2] for this very reason.

 - Florian

[1] https://github.com/bloomberg/chromium.bb/commit/8730bd6b3bf300fd0ab1640cf1636971ca8eda26
[2] https://github.com/bloomberg/chromium.bb
Received on Wednesday, 4 March 2015 10:56:57 UTC
