Re: [css-text] word-break for Korean

On Wed, Mar 4, 2015 at 6:16 AM, Koji Ishii <kojiishi@gmail.com> wrote:

> Very interesting, thank you for the pointer to an implementation.
>
> I'm very curious to know the use case. Don't we have members from
> Bloomberg in this WG?
>
> I'm curious because my guess is different: people may want this not
> because they want to mix line-breaking behaviors within one
> document, but because they want to apply one line-breaking style
> globally from a single style sheet, without tagging their documents
> at all, assuming no ideographic characters appear within their Korean
> documents. I'd like to know which guess is correct.
>
> It's also interesting to me that what I've been hearing is that the
> "keep-korean" style is mostly used in traditional-style or paper-based
> documents, while web and news (which have narrower columns) prefer to
> break. That Bloomberg picked the opposite is very interesting to me.
>

I wonder where you heard that the 'keep-korean' style is widely used in
traditional or paper-based documents. Among the hundreds of Korean books on
my bookshelves, I can't find a single one that uses "line breaking only at
spaces" for paragraph layout.

Jungshik



>
> /koji
>
> On Wed, Mar 4, 2015 at 7:56 PM, Florian Rivoal <florian@rivoal.net> wrote:
> > As the css-text spec acknowledges, there are two common line breaking
> behaviors for Korean:
> > - break at spaces (like English)
> > - break between every syllable (like Chinese)
> >
> > ''word-break: normal'' does the latter, and if you want the former
> > behavior, you are expected to use ''word-break: keep-all''.
> >
> > If you want the ''normal'' behavior for all languages, and the
> > ''keep-all'' behavior for Korean (which is a pretty reasonable thing to
> > want), you can do:
> >
> > foo {  word-break: normal; }
> > foo:lang(ko) {  word-break: keep-all; }
> >
> > However, this assumes that you know in advance and can properly tag the
> > language used in each element, which is typically not true for form fields
> > or contenteditable elements, where users may input anything they like.
> > Setting those to ''keep-all'' would have the expected behavior for Korean,
> > but not for other languages.
> >
> > Therefore, I suggest that a fourth value is worth having:
> >
> > ''word-break: keep-all-if-korean-otherwise-normal'' (obviously in need
> of a better name) would behave the same as ''word-break: normal'', except
> it would suppress implicit soft-wrap opportunities between two Korean
> syllables, except where opportunities exist due to dictionary-based
> breaking. A Korean syllable is defined to be either a precomposed Hangul
> Syllable (U+AC00 to U+D7A3) or a syllable composed of Hangul Jamo (U+1100
> to U+11FF, U+3130 to U+318F, U+A960 to U+A97F, U+D7B0 to U+D7FF). An
> alternative definition would be to suppress implicit soft-wrap
> opportunities between a Korean syllable (as defined above) and any
> typographic-letter-unit, except where opportunities exist due to
> dictionary-based breaking.
> >
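
To make the first definition quoted above concrete, here is a minimal Python sketch of the "between two Korean syllables" test. It only illustrates the code point ranges listed in the proposal; it is not taken from the Bloomberg patch or from any browser. The names `is_korean_char` and `suppresses_break` are hypothetical, and the check works per code point rather than per composed syllable, which is a simplifying assumption.

```python
# Code point ranges copied from the proposal quoted above.
HANGUL_SYLLABLES = (0xAC00, 0xD7A3)  # precomposed Hangul syllables
HANGUL_JAMO_RANGES = (
    (0x1100, 0x11FF),  # Hangul Jamo
    (0x3130, 0x318F),  # Hangul Compatibility Jamo
    (0xA960, 0xA97F),  # Hangul Jamo Extended-A
    (0xD7B0, 0xD7FF),  # Hangul Jamo Extended-B
)


def is_korean_char(ch):
    """Return True if the single character ch falls in one of the ranges above."""
    cp = ord(ch)
    ranges = (HANGUL_SYLLABLES,) + HANGUL_JAMO_RANGES
    return any(lo <= cp <= hi for lo, hi in ranges)


def suppresses_break(before, after):
    """Hypothetical helper: under the proposed value, the implicit soft-wrap
    opportunity between two adjacent characters is suppressed only when both
    are Korean (first definition in the proposal). Dictionary-based breaking
    is ignored in this sketch."""
    return is_korean_char(before) and is_korean_char(after)
```

Under this sketch, `suppresses_break("한", "국")` is true, while a Hangul syllable next to a Han ideograph or a Latin letter keeps whatever opportunity ''word-break: normal'' would give. The alternative definition in the proposal would instead also suppress the opportunity when only one side is Korean and the other is any typographic letter unit.
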
> > A few suggestions on how to name the value (by rough personal order of
> preference):
> > word-break: normal keep-hangul? | keep-all | break-all
> > word-break: normal | keep-hangul | keep-all | break-all
> > word-break: normal keep-korean? | keep-all | break-all
> > word-break: normal | keep-korean | keep-all | break-all
> > word-break: normal | keep-all-if-hangul | keep-all | break-all
> > word-break: normal | keep-all-if-korean | keep-all | break-all
> >
> > Alternatively, this could probably be a value of line-break (or even a
> separate property), but that feels less appropriate, as it would only do
> anything when word-break computes to normal.
> >
> > As for real-world evidence that this is needed, Bloomberg has
> > implemented ''word-break: -bb-keep-all-if-korean''[1] in its Chromium
> > fork[2] for this very reason.
> >
> >  - Florian
> >
> > [1] https://github.com/bloomberg/chromium.bb/commit/8730bd6b3bf300fd0ab1640cf1636971ca8eda26
> > [2] https://github.com/bloomberg/chromium.bb
> >
> >
> >
>
>

Received on Wednesday, 4 March 2015 21:21:24 UTC