Re: [css3-text] line break opportunities are based on *syllable* boundaries?

Hello everybody,

Sorry to be late to this discussion.

On 2011/01/29 0:59, Phillips, Addison wrote:
>>>
>>> I want "ソース" consists of three, so from what you said, it sounds

Whether the above is two or three syllables depends on the definition.
In a detailed linguistic analysis, one ends up with three *morae*
(see http://en.wikipedia.org/wiki/Mora_(linguistics)#Japanese) and two
syllables. But such details are lost on non-experts, Japanese and
non-Japanese alike.
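
To make the mora/syllable distinction concrete, here is a minimal
sketch in Python. It is a deliberate simplification, not a linguistic
tool: the two character sets and both function names are assumptions
made for illustration, and it only looks at plain katakana.

```python
# Small kana that merge with the preceding kana into one mora (e.g. "キャ").
# Illustrative subset, assumed for this sketch.
COMBINING_SMALL = set("ャュョァィゥェォヮ")

# Kana that count as a mora of their own but do not start a new syllable:
# the long vowel mark, the moraic nasal, and the small tsu.
NON_SYLLABIC = set("ーンッ")


def count_morae(word):
    """Count morae: every kana except the merging small kana."""
    return sum(1 for ch in word if ch not in COMBINING_SMALL)


def count_syllables(word):
    """Count syllables: morae minus those that attach to the previous one."""
    return sum(
        1 for ch in word
        if ch not in COMBINING_SMALL and ch not in NON_SYLLABIC
    )


print(count_morae("ソース"), count_syllables("ソース"))  # 3 2
```

Under these assumptions "ソース" comes out as three morae (ソ, ー, ス) and
two syllables (ソー, ス).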

>>> like "grapheme cluster" is the right choice of words to use here.

A grapheme cluster doesn't combine 'ソ' and 'ー', as far as I understand.
"ソー" isn't a "user-perceived character", which is the description given
at the start of http://unicode.org/reports/tr29/. The fact that a line
break between 'ソ' and 'ー' is a bad idea is handled separately, by
disallowing 'ー' at the start of a line.
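
The two mechanisms can be sketched separately in Python. The set
NO_LINE_START and the function break_opportunities are hypothetical
names, and the set is only a small illustrative subset of the
characters that Japanese line breaking keeps off the start of a line.
Real implementations follow much fuller rule sets.

```python
import unicodedata

# Illustrative subset of characters that must not begin a line.
# Assumed for this sketch; not a complete list.
NO_LINE_START = set("ー、。」）")


def break_opportunities(text):
    """Return each index i where a break before text[i] is allowed."""
    return [i for i in range(1, len(text)) if text[i] not in NO_LINE_START]


# 'ー' is a letter (general category Lm) with combining class 0, so it
# does not attach to 'ソ' the way a combining mark would.
print(unicodedata.category("ー"), unicodedata.combining("ー"))  # Lm 0

# The only break allowed in "ソース" is the one before 'ス'.
print(break_opportunities("ソース"))  # [2]
```

So 'ソ' and 'ー' remain two separate units, and the break between them
is suppressed purely by the line-start restriction.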

So the question of whether Japanese typography uses characters or
grapheme clusters for line breaking essentially depends on what it does
for non-Japanese (e.g. Indic, Thai, ...) text. That also includes Ainu,
where decomposed Kana are needed in some cases. For high precision,
grapheme clusters do indeed seem to be the right choice, although I
guess a lot of Japanese layout software wouldn't (yet) be able to handle
Indic grapheme clusters correctly.
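
The Ainu case can be illustrated with a rough Python sketch. The
function simple_clusters is a hypothetical helper that only attaches
nonspacing and enclosing marks to the preceding character. It is a
crude approximation of the rules in UAX #29 and would not segment
Indic text correctly.

```python
import unicodedata


def simple_clusters(text):
    """Group each base character with the marks (Mn, Me) that follow it.

    A simplification of grapheme cluster segmentation, for illustration.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) in ("Mn", "Me"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# Ainu "セ゚": KATAKANA LETTER SE followed by the combining semi-voiced
# sound mark (U+309A). No precomposed character exists for this pair.
ainu_ce = "\u30bb\u309a"
print(len(ainu_ce), len(simple_clusters(ainu_ce)))  # 2 1
print(len(unicodedata.normalize("NFC", ainu_ce)))   # 2

# By contrast, ハ + U+309A composes to the single code point パ under NFC.
print(len(unicodedata.normalize("NFC", "\u30cf\u309a")))  # 1
```

A line breaker that works on code points could split such a sequence
between the kana and its mark. Working on clusters avoids that.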

Regards,   Martin.


>> I agree.
>>
>> Addison

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp

Received on Wednesday, 2 February 2011 02:15:49 UTC