- From: Mark Davis <mark.davis@icu-project.org>
- Date: Fri, 7 Mar 2008 08:52:06 -0800
- To: "Richard Ishida" <ishida@w3.org>
- Cc: public-i18n-core@w3.org
- Message-ID: <30b660a20803070852x7bfe054bl5cd7da8c8a76a740@mail.gmail.com>
Yes, we can refine those in the future.

On Fri, Mar 7, 2008 at 8:46 AM, Richard Ishida <ishida@w3.org> wrote:

> I don't think we can fix this with wording in the UAX.
>
> It seems we would need to investigate whether it makes sense to treat Khmer and Myanmar as a script (like Thai and Lao) that merits exceptional rules for certain character combinations. We'd also need to check whether other scripts can be addressed in a similar way.
>
> Would it make sense to expand the remit of extended grapheme clusters in a future version of this document (since I guess it may be a little late to get such work done for this iteration)?
>
> RI
>
> ============
> Richard Ishida
> Internationalization Lead
> W3C (World Wide Web Consortium)
>
> http://www.w3.org/International/
> http://rishida.net/blog/
> http://rishida.net/
>
> > -----Original Message-----
> > From: public-i18n-core-request@w3.org [mailto:public-i18n-core-request@w3.org] On Behalf Of Richard Ishida
> > Sent: 07 March 2008 14:19
> > To: public-i18n-core@w3.org
> > Subject: RE: [UAX29] i18n comment 8: Conjunct clusters
> >
> > The added explanation about why conjunct clusters are not included is very useful. I gather from the text that aksaras can be split after a virama if the conjunct glyphs do not interact visually (although that's not actually explicitly described).
> >
> > I still feel that the current definition may stop short of being generally useful for some scripts. For example, Khmer subjoined consonants are always treated as subscripts, as far as I am aware. The grapheme cluster concept doesn't seem to be very useful for Khmer as it stands, but I think could be extended for this script as it was for Thai and Lao and become more useful. I suspect this may also be the case for Myanmar.
> >
> > RI
> >
> > ============
> > Richard Ishida
> > Internationalization Lead
> > W3C (World Wide Web Consortium)
> >
> > http://www.w3.org/International/
> > http://rishida.net/blog/
> > http://rishida.net/
> >
> > > -----Original Message-----
> > > From: public-i18n-core-request@w3.org [mailto:public-i18n-core-request@w3.org] On Behalf Of ishida@w3.org
> > > Sent: 07 March 2008 11:34
> > > To: public-i18n-core@w3.org
> > > Subject: [UAX29] i18n comment 8: Conjunct clusters
> > >
> > > Comment from the i18n review of:
> > > http://www.unicode.org/reports/tr29/tr29-12.html
> > >
> > > Comment 8
> > > At http://www.w3.org/International/reviews/0801-uax29/
> > > Editorial/substantive: S
> > > Tracked by: RI
> > >
> > > Location in reviewed document:
> > > 3 [http://www.unicode.org/reports/tr29/tr29-12.html#Grapheme_Cluster_Boundaries]
> > >
> > > Comment:
> > > We don't think extending default grapheme clusters to just incorporate spacing marks goes far enough to actually providing better results for a very large proportion of the world's population. We feel that the Unicode TC should conduct further research on how to extend default grapheme clusters so that they incorporate the majority of indic and south-east asian syllables.
> > >
> > > Example: It is very common to have a sequence such as consonant+virama+consonant+vowel_sign, eg.
> > >
> > > 0938: स DEVANAGARI LETTER SA
> > > 094D: ् DEVANAGARI SIGN VIRAMA
> > > 0925: थ DEVANAGARI LETTER THA
> > > 093F: ि DEVANAGARI VOWEL SIGN I
> > >
> > > See this as it would be rendered [http://www.w3.org/International/reviews/0601-css3-selectors/sthiti.gif].
> > >
> > > Without tailoring, the current rules would result in text wrapping the THA to the next line, or attempting to highlight only part of the conjunct. The basic unit for grapheme clusters for indic and south-east asian scripts is the syllable, and just addressing spacing marks will still leave you short of a useful solution.
> > >
> > > We would like the Unicode TC to investigate the possibility of adding a rule to say that a vowel killer character extends the grapheme cluster to any immediately adjacent base character and all its combining characters.
> > >
> > > We feel that introducing a definition of default grapheme clusters that addresses this issue will go a long way to helping ensure that implementers provide applications that can handle South Asian and South-East Asian scripts much better than now.
> > >
> > > We feel that extending default grapheme clusters to include only spacing marks may only complicate things further. We do not, however, feel that the extension of grapheme clusters should be abandoned.

--
Mark
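For readers following the thread, the rule requested in comment 8 can be illustrated with a small Python sketch. This is only a toy segmenter, not the UAX #29 algorithm: the names `clusters`, `is_extender`, and `join_after_virama` are hypothetical, only the Devanagari virama is handled, and "combining character" is approximated by the general categories Mn, Mc, and Me. It contrasts splitting the example sequence after the virama with keeping the whole syllable together.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (assumption: only this virama is handled)


def is_extender(ch):
    # Rough stand-in for "combining character": nonspacing (Mn),
    # spacing (Mc), and enclosing (Me) marks.
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def clusters(text, join_after_virama):
    """Toy segmentation. With join_after_virama=True, a character that
    follows a virama stays in the current cluster (the rule proposed in
    the comment); with False, a new base character always starts a new
    cluster."""
    out = []
    for ch in text:
        joins = is_extender(ch) or (
            join_after_virama and bool(out) and out[-1].endswith(VIRAMA)
        )
        if out and joins:
            out[-1] += ch
        else:
            out.append(ch)
    return out


# SA + VIRAMA + THA + VOWEL SIGN I, the sequence from the comment
word = "\u0938\u094D\u0925\u093F"

for cluster in clusters(word, join_after_virama=False):
    print([f"{ord(c):04X}" for c in cluster])
# ['0938', '094D']
# ['0925', '093F']

for cluster in clusters(word, join_after_virama=True):
    print([f"{ord(c):04X}" for c in cluster])
# ['0938', '094D', '0925', '093F']
```

In the first case a line break or selection boundary could fall between the virama and THA, which is the behaviour the comment objects to; in the second the syllable is treated as one unit.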
Received on Friday, 7 March 2008 16:52:17 UTC