- From: Ambrose LI <ambrose.li@gmail.com>
- Date: Sat, 29 Jan 2011 00:49:38 -0500
- To: Andrew Cunningham <andrewc@vicnet.net.au>
- Cc: Koji Ishii <kojiishi@gluesoft.co.jp>, "Phillips, Addison" <addison@lab126.com>, Kang-Hao Lu <kennyluck@w3.org>, WWW Style <www-style@w3.org>, WWW International <www-international@w3.org>
But isn't Koji's example showing exactly that you *don't* really want to
arbitrarily break between syllable boundaries? His example has three
syllables according to Japanese rules. Even by English rules it's two
syllables, not one.

2011/1/28 Andrew Cunningham <andrewc@vicnet.net.au>:
> syllable and grapheme clusters are quite distinct and separate concepts.
>
> I'd argue that you do want syllable boundaries, rather than grapheme
> cluster boundaries. But syllable boundaries are per language constructs,
> based on the phonological and orthographic properties of that language.
>
> While in unicode terms, grapheme clusters have a more generic definition.
>
> But I doubt that grapheme clusters will give you what you want.
>
> On Fri, January 28, 2011 17:32, Koji Ishii wrote:
>> I'm changing back to the original subject as you seem to be talking about
>> the original topic, not the definition of "word".
>>
>> What I needed here is an appropriate terminology that represents single
>> character within this context:
>>
>>> In several other writing systems, (including Chinese, Japanese, Yi,
>>> and sometimes also Korean) a line break opportunities are based on
>>> *syllable* boundaries, not words.
>>
>> I want "ソース" to consist of three, so from what you said, it sounds
>> like "grapheme cluster" is the right choice of words to use here.
>>
>> I agree with you that the definition of "word" is different from grapheme
>> cluster, and I guess answering that question is even more difficult.
>>
>>
>> Regards,
>> Koji
>>
>> -----Original Message-----
>> From: Phillips, Addison [mailto:addison@lab126.com]
>> Sent: Friday, January 28, 2011 2:22 PM
>> To: Kang-Hao (Kenny) Lu; Koji Ishii
>> Cc: WWW Style; WWW International
>> Subject: RE: What's the definition of a word? (was: [css3-text] line break
>> opportunities are based on *syllable* boundaries?)
>>
>> The term "grapheme cluster" would be wrong for this context. A grapheme
>> cluster is a sequence of logical characters that form a single visual unit
>> of text (what is sometimes perceived as a "character" or "glyph"). This
>> term is used for cases such as an Indic syllable followed by a combining
>> vowel--in which a base character is combined with additional characters to
>> form a single glyph on screen, rather than cases in which separate
>> visual/logical units form a single "word" or "sound". It also applies to
>> cases such as a base letter followed by a combining accent.
>>
>> To help illustrate this, notice that the word "the" is not a grapheme
>> cluster, although it is a single syllable. Notice too that "ソース"
>> consists of *three* graphemes (grapheme clusters), but only two
>> syllables.
>>
>> The relationship of Han ideographs to both "words" and "syllables" is
>> complex and depends both on the language (it is different for Japanese,
>> for example) and on context. It is sometimes true that "ideograph ==
>> syllable" and sometimes also true that "ideograph == word".
>>
>> In any case, the concept of "grapheme cluster" should most definitely not
>> be considered to be synonymous with either "word" or "syllable". It is a
>> distinct unit and may not be *either* in a given context. My understanding
>> was that languages written using Han ideographs could be broken anywhere
>> except for certain prescriptive cases (which differ by language). While
>> this might map to some other concept such as syllables, wouldn't it be
>> better to refer specifically to language specific rules?
>>
>> Unicode Standard Annex #14 [1] provides a useful description of
>> line-breaking properties that may be helpful here.
>>
>> Regards,
>>
>> Addison
>>
>> [1] http://www.unicode.org/reports/tr14/
>>
>> Addison Phillips
>> Globalization Architect (Lab126)
>> Chair (W3C I18N, IETF IRI WGs)
>>
>> Internationalization is not a feature.
>> It is an architecture.
>>
>>> -----Original Message-----
>>> From: www-international-request@w3.org
>>> [mailto:www-international-request@w3.org] On Behalf Of Kang-Hao (Kenny) Lu
>>> Sent: Thursday, January 27, 2011 8:43 PM
>>> To: Koji Ishii
>>> Cc: WWW Style; WWW International
>>> Subject: What's the definition of a word? (was: [css3-text] line break
>>> opportunities are based on *syllable* boundaries?)
>>>
>>> > In Chinese, Yi, and Hangul, a character represents a syllable as
>>> > far as I understand, but in Japanese, Kanji characters could have more
>>> > than one syllable, and also there are cases where multiple characters
>>> > represent single syllable (like Kana + prolonged sound mark).
>>> >
>>> > Although this part is not normative, it looks like we should
>>> > replace "syllable" with "grapheme cluster".
>>> >
>>> > Please let me know if this change can be incorrect to any other
>>> > writing systems listed here than Japanese.
>>>
>>> The situation is similar for Chinese as far as I can tell.
>>>
>>> Speaking about this, this is editorial but the last time I read the
>>> spec, I got a little bit perplexed about the definition of "word".
>>> Is there a plan to briefly mention what a "word" is in the introduction
>>> section? Or perhaps there should be a glossary that puts "word" and
>>> "grapheme cluster" together? I doubt that there would be a consistent
>>> and precise definition throughout the spec but a brief and
>>> non-normative introduction seems helpful.
>>>
>>>
>>> Cheers,
>>> Kenny
>>
>>
>
>
>
> --
> Andrew Cunningham
> Research and Development Coordinator
> Vicnet
> State Library of Victoria
> Australia
>
> andrewc@vicnet.net.au
>
>

--
cheers,
-ambrose

www.xanga.com/little_potato | twitter.com/little_potato
Received on Saturday, 29 January 2011 05:50:13 UTC
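
As a rough illustration of the grapheme-cluster counts discussed in this thread ("ソース" and "the" each come out as three units, while a base letter plus a combining accent or an Indic consonant plus a vowel sign comes out as one), here is a minimal Python sketch. It assumes a deliberately simplified rule: any character in a Unicode Mark category (Mn, Mc, Me) is attached to the character before it. That is only an approximation of the grapheme cluster boundaries defined in Unicode Standard Annex #29, not an implementation of them, and the function name approx_grapheme_clusters is hypothetical. Note that it says nothing about syllables, which, as the thread points out, are language-specific.

```python
import unicodedata


def approx_grapheme_clusters(text):
    """Group each base character with the combining marks that follow it.

    Simplified assumption for illustration only: a character whose general
    category starts with "M" (Mn, Mc, Me) joins the preceding cluster.
    The full UAX #29 rules (Hangul jamo, ZWJ sequences, etc.) are not handled.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # attach the mark to the previous base character
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


print(len(approx_grapheme_clusters("ソース")))        # 3: the prolonged sound mark is a letter (Lm), not a mark
print(len(approx_grapheme_clusters("the")))           # 3: one syllable, three clusters
print(len(approx_grapheme_clusters("e\u0301")))       # 1: base letter + combining acute accent
print(len(approx_grapheme_clusters("\u0915\u093F")))  # 1: Devanagari KA + vowel sign I
```

For real segmentation work, a library that implements UAX #29 (and, for line breaking, UAX #14) would be the safer choice than this sketch.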