
RE: [css3-text] line break opportunities are based on *syllable* boundaries?

From: Koji Ishii <kojiishi@gluesoft.co.jp>
Date: Sat, 29 Jan 2011 00:48:35 -0500
To: Andrew Cunningham <andrewc@vicnet.net.au>
CC: "Phillips, Addison" <addison@lab126.com>, "Kang-Hao Lu" <kennyluck@w3.org>, WWW Style <www-style@w3.org>, WWW International <www-international@w3.org>
Message-ID: <A592E245B36A8949BDB0A302B375FB4E0AAF00A0BA@MAILR001.mail.lan>

Okay, I have to admit that I know less about "grapheme cluster" than you do, so I'll take Kenny's proposal to use "character" instead.

I received feedback from Japan that "syllable" is incorrect in this context, at least for Japanese, and I agree with that, again at least for Japanese. I don't know Chinese, Yi, or Korean well enough to judge, so I'll defer to you that "grapheme cluster" is also incorrect here.

So if "syllable" is wrong for Japanese, and "grapheme cluster" as defined in UAX #29 is wrong for any of Chinese, Yi, or Korean, then "character" is probably the last resort. The text is non-normative, so I suppose some ambiguity wouldn't be a big problem here.

Thank you for all the help.


-----Original Message-----
From: Andrew Cunningham [mailto:andrewc@vicnet.net.au] 
Sent: Saturday, January 29, 2011 12:14 PM
To: Koji Ishii
Cc: Phillips, Addison; Kang-Hao Lu; WWW Style; WWW International
Subject: RE: [css3-text] line break opportunities are based on *syllable* boundaries?

Syllables and grapheme clusters are quite distinct concepts.

I'd argue that you do want syllable boundaries, rather than grapheme cluster boundaries. But syllable boundaries are per-language constructs, based on the phonological and orthographic properties of that language.

In Unicode terms, by contrast, grapheme clusters have a more generic definition.

But I doubt that grapheme clusters will give you what you want.
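
To make the distinction concrete, here is a minimal sketch in Python. It is an assumption-laden approximation, not the UAX #29 algorithm: the hypothetical helper `approx_grapheme_clusters` only attaches characters with a nonzero canonical combining class to the preceding base character, and ignores Hangul jamo sequences, spacing marks, ZWJ sequences, and the other UAX #29 rules. It shows that a cluster count is a property of the character sequence alone and does not reflect how many syllables a reader hears.

```python
import unicodedata


def approx_grapheme_clusters(text):
    """Group each base character with the combining marks that follow it.

    Simplified stand-in for UAX #29 (assumption: only characters with a
    nonzero canonical combining class are treated as cluster extenders).
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            # Combining mark: attach it to the current cluster.
            clusters[-1] += ch
        else:
            # Any other character starts a new cluster.
            clusters.append(ch)
    return clusters


# "ソース": three clusters, although the thread counts it as two syllables.
print(len(approx_grapheme_clusters("ソース")))
# "the": three clusters, although it is a single syllable.
print(len(approx_grapheme_clusters("the")))
# "e" followed by U+0301 COMBINING ACUTE ACCENT: one cluster, two code points.
print(len(approx_grapheme_clusters("e\u0301")))
```

Nothing in this grouping knows about phonology, which is why a syllable-based rule would need per-language data rather than a generic Unicode property.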

On Fri, January 28, 2011 17:32, Koji Ishii wrote:
> I'm changing back to the original subject as you seem to be talking 
> about the original topic, not the definition of "word".
> What I need here is an appropriate term that represents a 
> single character within this context:
>> In several other writing systems, (including Chinese, Japanese, Yi, 
>> and sometimes also Korean) a line break opportunities are based on
>> *syllable* boundaries, not words.
> I want "ソース" to count as three units, so from what you said, it sounds 
> like "grapheme cluster" is the right choice of words to use here.
> I agree with you that the definition of "word" is different from 
> grapheme cluster, and I guess answering that question is even more difficult.
> Regards,
> Koji
> -----Original Message-----
> From: Phillips, Addison [mailto:addison@lab126.com]
> Sent: Friday, January 28, 2011 2:22 PM
> To: Kang-Hao (Kenny) Lu; Koji Ishii
> Cc: WWW Style; WWW International
> Subject: RE: What's the definition of a word? (was: [css3-text] line 
> break opportunities are based on *syllable* boundaries?)
> The term "grapheme cluster" would be wrong for this context. A 
> grapheme cluster is a sequence of logical characters that form a 
> single visual unit of text (what is sometimes perceived as a 
> "character" or "glyph"). This term is used for cases such as an Indic 
> syllable followed by a combining vowel--in which a base character is 
> combined with additional characters to form a single glyph on screen, 
> rather than cases in which separate visual/logical units form a single 
> "word" or "sound". It also applies to cases such as a base letter followed by a combining accent.
> To help illustrate this, notice that the word "the" is not a grapheme 
> cluster, although it is a single syllable. Notice too that "ソース"
> consists of *three* graphemes (grapheme clusters), but only two 
> syllables.
> The relationship of Han ideographs to both "words" and "syllables" is 
> complex and depends both on the language (it is different for 
> Japanese, for example) and on context. It is sometimes true that 
> "ideograph == syllable" and sometimes also true that "ideograph == word".
> In any case, the concept of "grapheme cluster" should most definitely 
> not be considered synonymous with either "word" or "syllable". It 
> is a distinct unit and may not be *either* in a given context. My 
> understanding was that languages written using Han ideographs could be 
> broken anywhere except for certain prescriptive cases (which differ by 
> language). While this might map to some other concept such as 
> syllables, wouldn't it be better to refer specifically to language-specific rules?
> Unicode Standard Annex #14 [1] provides a useful description of 
> line-breaking properties that may be helpful here.
> Regards,
> Addison
> [1] http://www.unicode.org/reports/tr14/

> Addison Phillips
> Globalization Architect (Lab126)
> Chair (W3C I18N, IETF IRI WGs)
> Internationalization is not a feature.
> It is an architecture.
>> -----Original Message-----
>> From: www-international-request@w3.org [mailto:www-international-request@w3.org] On Behalf Of Kang-Hao (Kenny) Lu
>> Sent: Thursday, January 27, 2011 8:43 PM
>> To: Koji Ishii
>> Cc: WWW Style; WWW International
>> Subject: What's the definition of a word? (was: [css3-text] line 
>> break opportunities are based on *syllable* boundaries?)
>> > In Chinese, Yi, and Hangul, a character represents a syllable as
>> > far as I understand, but in Japanese, Kanji characters could have
>> > more than one syllable, and there are also cases where multiple
>> > characters represent a single syllable (like Kana + prolonged sound mark).
>> >
>> > Although this part is not normative, it looks like we should
>> > replace "syllable" with "grapheme cluster".
>> >
>> > Please let me know if this change would be incorrect for any of the
>> > writing systems listed here other than Japanese.
>> The situation is similar for Chinese as far as I can tell.
>> Speaking about this, this is editorial but the last time I read the 
>> spec, I got a little bit perplexed about the definition of "word". Is 
>> there a plan to briefly mention what a "word" is in the introduction 
>> section? Or perhaps there should be a glossary that puts "word" and 
>> "grapheme cluster" together? I doubt that there would be a consistent 
>> and precise definition throughout the spec but a brief and 
>> non-normative introduction seems helpful.
>> Cheers,
>> Kenny

Andrew Cunningham
Research and Development Coordinator
State Library of Victoria


Received on Saturday, 29 January 2011 05:47:33 UTC
