Re: Question on Text Justification of Korean

On Fri, Oct 24, 2014 at 10:43 AM, fantasai <fantasai.lists@inkedblade.net>
wrote:

> On 10/23/2014 05:50 PM, Jungshik SHIN (신정식) wrote:
>
>>
>> Could you explain why treating Hangul and Han identically for the
>> justification hurts the justification quality of Hangul-only
>> documents (and Chinese and Japanese documents) ?
>>
>
> Okay, I will try to explain. :)
>
> The constraint of the situation is that we do not know the primary
> language or writing system because the document is untagged. Given
> this, we must come up with a justification system that is adequate
> for all systems.
>
> In order to adequately handle Japanese and Chinese, we must allow
> expansion between Han and Kana characters.
>
> In order to adequately handle most other writing systems, we must
> allow expansion at spaces.
>
> Korean is kind of a combination of both cases.
>
> At least one implementation has decided to handle this situation by
> expanding at spaces, Han, and Kana, but not Hangul. For Hangul-only
> documents, this will expand only at spaces, and for Chinese/Japanese
> documents, this will expand among all characters. For these documents,
> everyone is happy. But for mixed Han + Hangul documents, this solution
> has the behavior we are discussing. [1]
>

What Gecko currently does is this: for Chinese/Japanese documents, it
expands at spaces, Han, and Kana; for all other documents, it expands
only at spaces. The rationale is that, in non-CJ documents, a Han or Kana
word may appear as a single word just like any other word. For example:

"Hello" is "你好" in Chinese.

In this case, the Han characters should not be expanded either. I guess
this algorithm should also work fine for Korean documents, where only
spaces are expanded.
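
To make the rule concrete, here is a rough Python sketch of it. This is
not Gecko's actual code. The function names, the boolean cj_document
flag (standing in for however the document language is determined), and
the script test based on Unicode character names are all assumptions
made for illustration.

```python
import unicodedata


def is_han_or_kana(ch: str) -> bool:
    """Approximate script test using Unicode character names (an assumption,
    not how a layout engine would classify scripts)."""
    name = unicodedata.name(ch, "")
    return name.startswith(("CJK UNIFIED IDEOGRAPH", "HIRAGANA", "KATAKANA"))


def expansion_opportunities(text: str, cj_document: bool) -> list[int]:
    """Return indices of characters after which extra space may be inserted.

    Spaces are always expansion points. Han and Kana characters are
    expansion points only when the document is Chinese or Japanese.
    Hangul is never an expansion point on its own.
    """
    points = []
    # The last character is skipped: there is nothing after it to push away.
    for i, ch in enumerate(text[:-1]):
        if ch == " " or (cj_document and is_han_or_kana(ch)):
            points.append(i)
    return points


sample = '"Hello" is "你好" in Chinese.'
non_cj = expansion_opportunities(sample, cj_document=False)
cj = expansion_opportunities(sample, cj_document=True)
korean = expansion_opportunities("한국어 문서", cj_document=False)
```

With cj_document=False, the sample sentence gets opportunities only at
its four spaces, so "你好" stays together as one word. With
cj_document=True, the two Han characters become expansion points as
well. The Korean string gets a single opportunity, at its space.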

- Xidorn

Received on Friday, 24 October 2014 03:46:45 UTC