
Re: [css-text] word-break for Korean

From: 신정식 <jshin1987+w3@gmail.com>
Date: Wed, 4 Mar 2015 13:17:15 -0800
Message-ID: <CAE1ONj_YKw6JJC6VYvGrJf0d3DZRvb2_j41vXtE+14szfqsycA@mail.gmail.com>
To: Florian Rivoal <florian@rivoal.net>
Cc: Shezan Baig <shezbaig.wk@gmail.com>, Koji Ishii <kojiishi@gmail.com>, www-style list <www-style@w3.org>, www International <www-international@w3.org>

I have to be rather blunt up front. I've been trying to dispel the
misinformed notion that Korean has two (almost equally used) line-breaking
modes in Unicode TR and CSS3, but perhaps I have not been diligent enough.

In the vast majority of cases, Korean text (long-running paragraphs in *books
in print, newspaper articles in print*, [1] web pages, etc.) allows line
breaks at any syllable boundary. What Bloomberg does is misguided. For
Korean articles and user comments, line breaks should be allowed at any
syllable boundary.

'word-break: keep-all' is useful for short text snippets in places such as
subject lines, titles, advertisements, user interfaces, etc.
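The two behaviors contrasted above can be modeled roughly as follows. This is a simplified Python sketch, not the UAX #14 algorithm or any browser's implementation: it assumes text made only of precomposed Hangul syllables (U+AC00 to U+D7A3) and ASCII spaces, ignores punctuation and every other line-breaking class, and the function name break_opportunities is hypothetical.

```python
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3  # precomposed Hangul Syllables block


def is_hangul_syllable(ch: str) -> bool:
    """True if ch is a precomposed Hangul syllable (U+AC00..U+D7A3)."""
    return HANGUL_FIRST <= ord(ch) <= HANGUL_LAST


def break_opportunities(text: str, word_break: str = "normal") -> list[int]:
    """Return the indices i where a line may break before text[i].

    Simplified model (an assumption, not the full UAX #14 rules):
    - a break is allowed after a space, before the next non-space character;
    - with word_break="normal", a break is also allowed between any two
      adjacent Hangul syllables;
    - with word_break="keep-all", breaks inside a run of syllables are
      suppressed, so only the space-based opportunities remain.
    """
    if word_break not in ("normal", "keep-all"):
        raise ValueError(f"unsupported word_break: {word_break!r}")
    result = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if prev == " " and cur != " ":
            result.append(i)
        elif (word_break == "normal"
              and is_hangul_syllable(prev) and is_hangul_syllable(cur)):
            result.append(i)
    return result


print(break_opportunities("한국어 문장"))              # [1, 2, 4, 5]
print(break_opportunities("한국어 문장", "keep-all"))  # [4]
```

With "normal" a line may break inside both 한국어 and 문장; with "keep-all" the only opportunity left is after the space, which is why keep-all suits short snippets more than long running text.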

In the nascent days of the web (~1994), when Mozilla and Netscape 0.9x or
1.0x did not support Korean line breaking (at any syllable boundary),
some Korean web authors ran a simple script to insert <wbr> between every
pair of adjacent Korean syllables so that each boundary would be treated as
a line-breaking opportunity.
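A script of the kind described above might have looked like the following Python sketch. The original scripts are not shown here, so this is a hypothetical reconstruction: insert_wbr is an illustrative name, and the sketch assumes its input is plain text content rather than raw HTML markup.

```python
import re

# Zero-width match at every position that sits between two precomposed
# Hangul syllables (U+AC00..U+D7A3); the lookarounds consume no characters.
_BETWEEN_SYLLABLES = re.compile(r"(?<=[\uAC00-\uD7A3])(?=[\uAC00-\uD7A3])")


def insert_wbr(text: str) -> str:
    """Insert <wbr> at every boundary between two adjacent Hangul syllables.

    Assumes `text` is plain text content, not raw HTML: running it over
    tags or attribute values could corrupt them.
    """
    return _BETWEEN_SYLLABLES.sub("<wbr>", text)


print(insert_wbr("한국어 문장"))  # 한<wbr>국<wbr>어 문<wbr>장
```

Using lookbehind and lookahead keeps the syllables themselves untouched, so the only change to the text is the inserted <wbr> markers.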

In short, I don't see any strong need for introducing this new property
value for word-break or line-break.


On Wed, Mar 4, 2015 at 6:36 AM, Florian Rivoal <florian@rivoal.net> wrote:

> On 04 Mar 2015, at 15:33, Shezan Baig <shezbaig.wk@gmail.com> wrote:
> On Wed, Mar 4, 2015 at 9:19 AM Koji Ishii <kojiishi@gmail.com> wrote:
>> I'm curious because my guess is different; people may want that not
>> because they want to mix such line breaking behavior within one
>> document, but because they want to apply such line breaking styles
>> without tagging the documents at all, using a single global stylesheet,
>> assuming no ideographic characters appear within their Korean
>> documents. I'm curious to know which guess is correct.
> Our use case is within contenteditable, where we have no idea ahead of
> time what language the user would type in.  If they type Korean, then we
> want it to do "keep-all", otherwise we let it do "normal".  Since this
> happens dynamically, we cannot tag the document ahead of time one way or
> another.
>> It's also interesting to me that, what I've been hearing is that the
>> "keep-korean" style is mostly used in traditional style or paper-based
>> documents, while web and news (that have narrower columns) prefer to
>> break. That Bloomberg picks the opposite is very interesting to me.
> Note that our use case is for contenteditable, where the user is the
> author of the content.  For our actual news articles, I would assume this
> isn't a use-case since this is prepared ahead of time, and the language is
> known, so it can be tagged.
> So in Bloomberg's case, tagging the content is not an option, and this
> value is needed to get the desired result.
> At the same time, I agree with you, Koji, that people will probably also
> want to use it to style documents without proper language tagging. In
> general, we should encourage people to tag their content properly, but it
> is common enough that the people writing the css do not have access to the
> markup-generating part of the system, and cannot fix it if it is deficient.
> As a design principle, we should not prioritize the needs of authors who do
> not language-tag their content over those who do, but since in this case
> there is no negative impact on properly tagged content, I don't think it is
> an issue.
>  - Florian
Received on Wednesday, 4 March 2015 21:17:45 UTC
