Re: Question on Text Justification of Korean from Xidorn Quan on 2014-10-24 (www-style@w3.org from October 2014)

From: Xidorn Quan <quanxunzhen@gmail.com>
Date: Fri, 24 Oct 2014 14:45:22 +1100
To: fantasai <fantasai.lists@inkedblade.net>
Cc: Jungshik SHIN (신정식) <jshin1987@gmail.com>, Koji Ishii <kojiishi@gluesoft.co.jp>, "public-html-ig-ko@w3.org" <public-html-ig-ko@w3.org>, CJK discussion <public-i18n-cjk@w3.org>, hyunyoung kim <corolla.kim@gmail.com>, "www-style@w3.org" <www-style@w3.org>, Daniel Glazman <daniel.glazman@disruptive-innovations.com>
Message-ID: <CAMdq69_GH16K0ApDpsAFphTKSm7X5QZ6vnYNFC9-Bb26ruUoEw@mail.gmail.com>

On Fri, Oct 24, 2014 at 10:43 AM, fantasai <fantasai.lists@inkedblade.net>
wrote:

> On 10/23/2014 05:50 PM, Jungshik SHIN (신정식) wrote:
>
>>
>> Could you explain why treating Hangul and Han identically for the
>> justification hurts the justification quality of Hangul-only
>> documents (and Chinese and Japanese documents) ?
>>
>
> Okay, I will try to explain. :)
>
> The constraint of the situation is that we do not know the primary
> language or writing system because the document is untagged. Given
> this, we must come up with a justification system that is adequate
> for all systems.
>
> In order to adequately handle Japanese and Chinese, we must allow
> expansion between Han and Kana characters.
>
> In order to adequately handle most other writing systems, we must
> allow expansion at spaces.
>
> Korean is kind of a combination of both cases.
>
> At least one implementation has decided to handle this situation by
> expanding at spaces, Han, and Kana, but not Hangul. For Hangul-only
> documents, this will expand only at spaces, and for Chinese/Japanese
> documents, this will expand among all characters. For these documents,
> everyone is happy. But for mixed Han + Hangul documents, this solution
> has the behavior we are discussing. [1]
>

What Gecko currently does is this: for Chinese/Japanese documents, it
expands at spaces, Han, and Kana. For any other document, it expands only
at spaces. The reasoning is that, in non-CJ documents, a Han or Kana word
may be presented as a single word just like any other word, for example:

"Hello" is "你好" in Chinese.

In this case, Han should not be expanded either. I guess this algorithm
should also work fine for Korean documents, since there only spaces
are expanded.
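
To make the rule described above concrete, here is a minimal sketch in
Python. It is not Gecko's actual code: the function names, the use of a
language tag to decide whether a document is Chinese/Japanese, and the
code-point ranges chosen for Han and Kana are all assumptions made for
illustration only.

```python
from typing import List


def is_han_or_kana(ch: str) -> bool:
    """Rough Han/Kana test using a few Unicode blocks (assumed, not exhaustive)."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3400 <= cp <= 0x4DBF   # CJK Unified Ideographs Extension A
        or 0x3040 <= cp <= 0x309F   # Hiragana
        or 0x30A0 <= cp <= 0x30FF   # Katakana
    )


def is_cj_document(lang: str) -> bool:
    """Hypothetical check: treat 'zh' and 'ja' language tags as Chinese/Japanese."""
    primary = lang.lower().split("-")[0]
    return primary in ("zh", "ja")


def expansion_points(text: str, lang: str) -> List[int]:
    """Return indices i where extra space may be inserted after text[i].

    Spaces are always expansion points. Han and Kana are expansion
    points only when the document language is Chinese or Japanese.
    The last character is skipped because nothing follows it.
    """
    cj = is_cj_document(lang)
    points = []
    for i, ch in enumerate(text[:-1]):
        if ch == " " or (cj and is_han_or_kana(ch)):
            points.append(i)
    return points
```

With this sketch, the sentence above only gets expansion points at its
spaces when the language is "en" or "ko", so "你好" stays together as one
word. With "zh" or "ja", the Han characters become expansion points too.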

- Xidorn

Received on Friday, 24 October 2014 03:46:31 UTC