In Korean, this should be treated as broken. It is quite possible that they
change the meaning of a word or sentence altogether.
On 24-Oct-2014 9:18 am, "Xidorn Quan" <quanxunzhen@gmail.com> wrote:
> On Fri, Oct 24, 2014 at 10:43 AM, fantasai <fantasai.lists@inkedblade.net>
> wrote:
>
>> On 10/23/2014 05:50 PM, Jungshik SHIN (신정식) wrote:
>>
>>>
>>> Could you explain why treating Hangul and Han identically for the
>>> justification hurts the justification quality of Hangul-only
>>> documents (and Chinese and Japanese documents)?
>>>
>>
>> Okay, I will try to explain. :)
>>
>> The constraint of the situation is that we do not know the primary
>> language or writing system because the document is untagged. Given
>> this, we must come up with a justification system that is adequate
>> for all systems.
>>
>> In order to adequately handle Japanese and Chinese, we must allow
>> expansion between Han and Kana characters.
>>
>> In order to adequately handle most other writing systems, we must
>> allow expansion at spaces.
>>
>> Korean is kind of a combination of both cases.
>>
>> At least one implementation has decided to handle this situation by
>> expanding at spaces, Han, and Kana, but not Hangul. For Hangul-only
>> documents, this will expand only at spaces, and for Chinese/Japanese
>> documents, this will expand among all characters. For these documents,
>> everyone is happy. But for mixed Han + Hangul documents, this solution
>> has the behavior we are discussing. [1]
>>
>
> What Gecko currently does is this: for Chinese/Japanese documents, it
> expands at spaces, Han, and Kana. For any other document, it expands
> only at spaces.
> The consideration is that, in non-CJ documents, a Han or Kana word
> may be presented as a single word just like other words, for example:
>
> "Hello" is "你好" in Chinese.
>
> In this case, Han should not be expanded either. I guess this algorithm
> should also work fine for Korean documents, in which case, only spaces
> are expanded.
>
> - Xidorn
>
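
To make the two policies in the quoted thread concrete, here is a rough
Python sketch. It is only an illustration of the behavior as described
above, not code from Gecko or from any spec. The function names, the
simplified code point ranges, and the rule "a gap after character i is
expandable if character i is a space, Han, or Kana" are all assumptions
made for this example.

```python
# Simplified script checks by code point range (assumption: real engines
# use full Unicode script data, which covers more blocks than these).
def is_han(ch):
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF

def is_kana(ch):
    # Hiragana and Katakana blocks
    return 0x3040 <= ord(ch) <= 0x30FF

def opportunities_per_character(text):
    """Untagged-text policy: expand at spaces, Han, and Kana, not Hangul.

    Returns the indices i where the gap after text[i] may be expanded.
    The last character is skipped because no gap follows it.
    """
    return [i for i, ch in enumerate(text[:-1])
            if ch == " " or is_han(ch) or is_kana(ch)]

def opportunities_by_language(text, lang):
    """Language-based policy as described for Gecko in the thread:
    Chinese/Japanese documents expand at spaces, Han, and Kana;
    every other document expands only at spaces.
    """
    is_cj = lang.split("-")[0].lower() in ("zh", "ja")
    return [i for i, ch in enumerate(text[:-1])
            if ch == " " or (is_cj and (is_han(ch) or is_kana(ch)))]

if __name__ == "__main__":
    mixed = "大韓民國 만세"  # Han followed by Hangul
    print(opportunities_per_character(mixed))      # Han part stretches
    print(opportunities_by_language(mixed, "ko"))  # only the space
```

With the mixed Han + Hangul string, the per-character policy reports a
gap after every Han character but none inside the Hangul word, which is
the uneven result under discussion. The language-based policy with "ko"
reports only the space.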