
Re: [css-text] word-break for Korean

From: Koji Ishii <kojiishi@gmail.com>
Date: Wed, 4 Mar 2015 23:16:02 +0900
Message-ID: <CAN9ydbWUayGSAFRmgsenFbNhgoSz8Tjz3VAqxc2inO0OgkpBbQ@mail.gmail.com>
To: Florian Rivoal <florian@rivoal.net>
Cc: www-style list <www-style@w3.org>, www International <www-international@w3.org>

Very interesting, thank you for the pointer to an implementation.

I'm very curious to know the use case. Don't we have members from
Bloomberg in this WG?

I'm curious because my guess is different: people may want this not
because they want to mix the two line breaking behaviors within one
document, but because they want to apply the style globally from a
single style sheet, without tagging their documents at all, on the
assumption that no ideographic characters appear in their Korean
documents. I'd like to know which guess is correct.

It's also interesting that, from what I've been hearing, the
"keep-korean" style is mostly used in traditional or paper-based
documents, while web and news content (with narrower columns) prefers
to break between syllables. That Bloomberg picked the opposite is
very interesting to me.

/koji

On Wed, Mar 4, 2015 at 7:56 PM, Florian Rivoal <florian@rivoal.net> wrote:
> As the css-text spec acknowledges, there are two common line breaking behaviors for Korean:
> - break at spaces (like English)
> - break between every syllable (like Chinese)
>
> ''word-break: normal'' does the latter, and if you want the former behavior, you are expected to use ''word-break: keep-all''.
>
> If you want the ''normal'' behavior on all languages, and the ''keep-all'' behavior on Korean (which is a pretty reasonable thing to want), you can do:
>
> foo {  word-break: normal; }
> foo:lang(ko) {  word-break: keep-all; }
>
> However, this assumes that you know in advance and can properly tag the language used in each element, which is typically not true for form fields or contenteditable elements, where users may input anything they like. Setting those to ''keep-all'' would have the expected behavior for Korean, but not for other languages.
>
> Therefore, I suggest that a fourth value is worth having:
>
> ''word-break: keep-all-if-korean-otherwise-normal'' (obviously in need of a better name) would behave the same as ''word-break: normal'', but would suppress implicit soft-wrap opportunities between two Korean syllables, except where opportunities exist due to dictionary-based breaking. A Korean syllable is defined to be either a precomposed Hangul Syllable (U+AC00 to U+D7A3) or a syllable composed of Hangul Jamo (U+1100 to U+11FF, U+3130 to U+318F, U+A960 to U+A97F, U+D7B0 to U+D7FF). An alternative definition would be to suppress implicit soft-wrap opportunities between a Korean syllable (as defined above) and any typographic-letter-unit, except where opportunities exist due to dictionary-based breaking.
>
> A few suggestions on how to name the value (by rough personal order of preference):
> word-break: normal keep-hangul? | keep-all | break-all
> word-break: normal | keep-hangul | keep-all | break-all
> word-break: normal keep-korean? | keep-all | break-all
> word-break: normal | keep-korean | keep-all | break-all
> word-break: normal | keep-all-if-hangul | keep-all | break-all
> word-break: normal | keep-all-if-korean | keep-all | break-all
>
> Alternatively, this could probably be a value of line-break (or even a separate property), but that feels less appropriate, as it would only do anything when word-break computes to normal.
>
> As for real-world evidence that this is needed, Bloomberg has implemented ''word-break: -bb-keep-all-if-korean''[1] in its Chromium fork[2] for this very reason.
>
>  - Florian
>
> [1] https://github.com/bloomberg/chromium.bb/commit/8730bd6b3bf300fd0ab1640cf1636971ca8eda26
> [2] https://github.com/bloomberg/chromium.bb
>
>
>
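
For readers who want to experiment with the first definition in the quoted proposal, here is a rough sketch in Python. It is not taken from the css-text spec or from the Bloomberg patch. The function names are hypothetical, the code point ranges are copied from the quoted message, and the sketch simplifies by classifying single code points instead of whole syllables. It also ignores dictionary-based breaking entirely.

```python
# Code point ranges as listed in the quoted proposal.
HANGUL_SYLLABLES = (0xAC00, 0xD7A3)  # precomposed Hangul syllables
HANGUL_JAMO_RANGES = (
    (0x1100, 0x11FF),
    (0x3130, 0x318F),
    (0xA960, 0xA97F),
    (0xD7B0, 0xD7FF),
)


def is_korean_char(ch: str) -> bool:
    """Return True if ch falls in any of the Hangul ranges above.

    Simplification (an assumption of this sketch): a conjoining jamo
    sequence is treated code point by code point, not as one syllable.
    """
    cp = ord(ch)
    if HANGUL_SYLLABLES[0] <= cp <= HANGUL_SYLLABLES[1]:
        return True
    return any(lo <= cp <= hi for lo, hi in HANGUL_JAMO_RANGES)


def suppresses_break(before: str, after: str) -> bool:
    """Return True if the proposed value would suppress the implicit
    soft-wrap opportunity between two adjacent characters, under the
    first definition (Korean on both sides)."""
    return is_korean_char(before) and is_korean_char(after)


def suppressed_positions(text: str) -> list[int]:
    """Indices i such that a break between text[i-1] and text[i]
    would be suppressed under this sketch."""
    return [
        i
        for i in range(1, len(text))
        if suppresses_break(text[i - 1], text[i])
    ]
```

Under this sketch, a boundary between two Hangul characters is reported as suppressed, while a boundary next to a space or an ideograph is left to whatever ''word-break: normal'' would do.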
Received on Wednesday, 4 March 2015 14:16:34 UTC
