Re: [css-text] word-break for Korean

From: Shezan Baig <shezbaig.wk@gmail.com>
Date: Wed, 04 Mar 2015 14:33:09 +0000
Message-ID: <CANMpiOSm=8k78gixs4hPPhx6xtCT2zmo1UMADdW4gqYdLUqsFw@mail.gmail.com>
To: Koji Ishii <kojiishi@gmail.com>, Florian Rivoal <florian@rivoal.net>
Cc: www-style list <www-style@w3.org>, www International <www-international@w3.org>

On Wed, Mar 4, 2015 at 9:19 AM Koji Ishii <kojiishi@gmail.com> wrote:

> I'm curious because my guess is different; people may want that not
> because they want to mix such line breaking behavior within one
> document, but because they want to apply such line breaking styles
> without tagging the documents at all, using a single CSS rule globally
> and assuming no ideographic characters appear within their Korean
> documents. I'm curious to know which guess is correct.
>

Our use case is within contenteditable, where we have no idea ahead of time
what language the user would type in.  If they type Korean, then we want it
to do "keep-all", otherwise we let it do "normal".  Since this happens
dynamically, we cannot tag the document ahead of time one way or another.
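
For illustration only, here is a rough sketch of the kind of script-side
workaround this forces on authors today. The function names are hypothetical
and not taken from any implementation; the code point ranges are the ones
from Florian's proposal quoted further down. It assumes that "contains any
Hangul" is a good enough signal for "the user is typing Korean".

```typescript
// Hypothetical helpers, not part of any spec or browser API.
// Ranges are the Hangul ranges listed in the quoted proposal.
const HANGUL_RANGES: Array<[number, number]> = [
  [0xac00, 0xd7a3], // precomposed Hangul Syllables
  [0x1100, 0x11ff], // Hangul Jamo
  [0x3130, 0x318f], // Hangul Compatibility Jamo
  [0xa960, 0xa97f], // Hangul Jamo Extended-A
  [0xd7b0, 0xd7ff], // Hangul Jamo Extended-B
];

// True if the code point falls inside one of the ranges above.
function isHangul(codePoint: number): boolean {
  return HANGUL_RANGES.some(([lo, hi]) => codePoint >= lo && codePoint <= hi);
}

// All ranges are in the BMP and below the surrogate block,
// so scanning UTF-16 code units is sufficient here.
function containsHangul(text: string): boolean {
  for (let i = 0; i < text.length; i++) {
    if (isHangul(text.charCodeAt(i))) {
      return true;
    }
  }
  return false;
}

// Picks the existing word-break value to apply to the whole element.
function wordBreakFor(text: string): "keep-all" | "normal" {
  return containsHangul(text) ? "keep-all" : "normal";
}
```

An editor could call wordBreakFor() from an input event handler and assign
the result to the element's style.wordBreak. Note that this switches the
whole element to "keep-all", so it also changes how any Chinese or Japanese
text in the same element wraps, which is one reason a dedicated value would
be preferable.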

> It's also interesting to me that what I've been hearing is that the
> "keep-korean" style is mostly used in traditional style or paper-based
> documents, while web and news (that have narrower columns) prefer to
> break. That Bloomberg picked the opposite is very interesting to me.
>


Note that our use case is for contenteditable, where the user is the author
of the content.  For our actual news articles, I would assume this isn't an
issue, since they are prepared ahead of time and the language is known, so
the content can be tagged.

> /koji
>
> On Wed, Mar 4, 2015 at 7:56 PM, Florian Rivoal <florian@rivoal.net> wrote:
> > As the css-text spec acknowledges, there are two common line breaking
> > behaviors for Korean:
> > - break at spaces (like English)
> > - break between every syllable (like Chinese)
> >
> > ''word-break: normal'' does the latter, and if you want the former
> > behavior, you are expected to use ''word-break: keep-all''.
> >
> > If you want the ''normal'' behavior on all languages, and the
> > ''keep-all'' behavior on Korean (which is a pretty reasonable thing to
> > want), you can do:
> >
> > foo {  word-break: normal; }
> > foo:lang(ko) {  word-break: keep-all; }
> >
> > However, this assumes that you know in advance and can properly tag the
> > language used in each element, which is typically not true for form fields
> > or contenteditable elements, where users may input anything they like.
> > Setting those to ''keep-all'' would have the expected behavior for Korean,
> > but not for other languages.
> >
> > Therefore, I suggest that a fourth value is worth having:
> >
> > ''word-break: keep-all-if-korean-otherwise-normal'' (obviously in need
> > of a better name) would behave the same as ''word-break: normal'', except
> > it would suppress implicit soft-wrap opportunities between two Korean
> > syllables, except where opportunities exist due to dictionary-based
> > breaking. A Korean syllable is defined to be either a precomposed Hangul
> > Syllable (U+AC00 to U+D7A3) or a syllable composed of Hangul Jamo (U+1100
> > to U+11FF, U+3130 to U+318F, U+A960 to U+A97F, U+D7B0 to U+D7FF). An
> > alternative definition would be to suppress implicit soft-wrap
> > opportunities between a Korean syllable (as defined above) and any
> > typographic-letter-unit, except where opportunities exist due to
> > dictionary-based breaking.
> >
> > A few suggestions on how to name the value (by rough personal order of
> > preference):
> > word-break: normal keep-hangul? | keep-all | break-all
> > word-break: normal | keep-hangul | keep-all | break-all
> > word-break: normal keep-korean? | keep-all | break-all
> > word-break: normal | keep-korean | keep-all | break-all
> > word-break: normal | keep-all-if-hangul | keep-all | break-all
> > word-break: normal | keep-all-if-korean | keep-all | break-all
> >
> > Alternatively, this could probably be a value of line-break (or even a
> > separate property), but that feels less appropriate, as it would only do
> > anything when word-break computes to normal.
> >
> > As for real-world evidence that this is needed, Bloomberg has
> > implemented ''word-break: -bb-keep-all-if-korean''[1] in its chromium
> > fork[2] for this very reason.
> >
> >  - Florian
> >
> > [1] https://github.com/bloomberg/chromium.bb/commit/8730bd6b3bf300fd0ab1640cf1636971ca8eda26
> > [2] https://github.com/bloomberg/chromium.bb
Received on Wednesday, 4 March 2015 14:33:39 UTC
