- From: Richard Ishida <ishida@w3.org>
- Date: Fri, 7 Mar 2008 14:19:22 -0000
- To: <public-i18n-core@w3.org>
The added explanation about why conjunct clusters are not included is very useful. I gather from the text that aksaras can be split after a virama if the conjunct glyphs do not interact visually (although that is not explicitly described).

I still feel that the current definition may stop short of being generally useful for some scripts. For example, Khmer subjoined consonants are always treated as subscripts, as far as I am aware. The grapheme cluster concept doesn't seem to be very useful for Khmer as it stands, but I think it could be extended for this script, as it was for Thai and Lao, and become more useful. I suspect this may also be the case for Myanmar.

RI

============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

http://www.w3.org/International/
http://rishida.net/blog/
http://rishida.net/

> -----Original Message-----
> From: public-i18n-core-request@w3.org [mailto:public-i18n-core-request@w3.org] On Behalf Of ishida@w3.org
> Sent: 07 March 2008 11:34
> To: public-i18n-core@w3.org
> Subject: [UAX29] i18n comment 8: Conjunct clusters
>
> Comment from the i18n review of:
> http://www.unicode.org/reports/tr29/tr29-12.html
>
> Comment 8
> At http://www.w3.org/International/reviews/0801-uax29/
> Editorial/substantive: S
> Tracked by: RI
>
> Location in reviewed document:
> 3 [http://www.unicode.org/reports/tr29/tr29-12.html#Grapheme_Cluster_Boundaries]
>
> Comment:
> We don't think extending default grapheme clusters to just incorporate
> spacing marks goes far enough toward actually providing better results for
> a very large proportion of the world's population. We feel that the Unicode
> TC should conduct further research on how to extend default grapheme
> clusters so that they incorporate the majority of Indic and South-East
> Asian syllables.
>
> Example: It is very common to have a sequence such as
> consonant+virama+consonant+vowel_sign, e.g.
>
> 0938: स DEVANAGARI LETTER SA
> 094D: ् DEVANAGARI SIGN VIRAMA
> 0925: थ DEVANAGARI LETTER THA
> 093F: ि DEVANAGARI VOWEL SIGN I
>
> See this as it would be rendered
> [http://www.w3.org/International/reviews/0601-css3-selectors/sthiti.gif].
>
> Without tailoring, the current rules would result in text wrapping the THA
> to the next line, or attempting to highlight only part of the conjunct.
> The basic unit for grapheme clusters for Indic and South-East Asian
> scripts is the syllable, and just addressing spacing marks will still
> leave you short of a useful solution.
>
> We would like the Unicode TC to investigate the possibility of adding a
> rule to say that a vowel killer character extends the grapheme cluster to
> any immediately adjacent base character and all its combining characters.
>
> We feel that introducing a definition of default grapheme clusters that
> addresses this issue will go a long way to helping ensure that
> implementers provide applications that can handle South Asian and
> South-East Asian scripts much better than now.
>
> We feel that extending default grapheme clusters to include only spacing
> marks may only complicate things further. We do not, however, feel that
> the extension of grapheme clusters should be abandoned.
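To make the difference concrete, here is a rough sketch of the three behaviours discussed in the quoted comment, applied to the SA + VIRAMA + THA + VOWEL SIGN I example. It is not the UAX #29 algorithm: it is a deliberately simplified approximation that assumes a cluster is a base character plus following marks, and it models the proposed "vowel killer" rule by testing for canonical combining class 9. The function names and the two flags are invented for illustration only.

```python
import unicodedata

# Assumption: virama-type characters are identified by canonical combining
# class 9. This is a simplification for illustration, not the UAX #29 rule set.
VIRAMA_CCC = 9


def is_mark(ch, include_spacing):
    """True if ch should attach to the preceding base in this simplified model."""
    cat = unicodedata.category(ch)
    return cat in ("Mn", "Me") or (include_spacing and cat == "Mc")


def clusters(text, include_spacing=True, join_after_virama=False):
    """Split text into approximate grapheme clusters (hypothetical helper).

    include_spacing:   also attach spacing combining marks (category Mc).
    join_after_virama: sketch of the proposed rule, where a virama pulls the
                       next base character into the same cluster.
    """
    out = []
    for ch in text:
        if out and is_mark(ch, include_spacing):
            out[-1] += ch
        elif (out and join_after_virama
              and unicodedata.combining(out[-1][-1]) == VIRAMA_CCC):
            out[-1] += ch
        else:
            out.append(ch)
    return out


word = "\u0938\u094D\u0925\u093F"  # SA, VIRAMA, THA, VOWEL SIGN I

# Number of clusters under each model.
print(len(clusters(word, include_spacing=False)))      # 3
print(len(clusters(word)))                             # 2
print(len(clusters(word, join_after_virama=True)))     # 1
```

With only non-spacing marks attached, the vowel sign is left on its own; attaching spacing marks still leaves a boundary between the virama and THA; only the virama rule keeps the whole syllable together.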
Received on Friday, 7 March 2008 14:16:06 UTC