- From: Andj via GitHub <sysbot+gh@w3.org>
- Date: Wed, 29 Jan 2025 10:14:46 +0000
- To: public-i18n-archive@w3.org
andjc has just created a new issue for https://github.com/w3c/bp-i18n-specdev:

== Example of Bengali grapheme clusters out of date ==

The current editor's draft has the following text:

> For example, the Bangla user-perceived character kshī ক্ষী is composed of four characters: U+0995 BENGALI LETTER KA + U+09CD BENGALI SIGN VIRAMA + U+09B7 BENGALI LETTER SSA + U+09C0 BENGALI VOWEL SIGN II.
>
> Unicode splits these into two grapheme clusters, unless language-specific tailoring is applied. For more information, see our article [Character encodings: Essential concepts](https://www.w3.org/International/articles/definitions-characters/index.en.html#characters).

This describes the behavior prior to Unicode 15.1. UAX29 was updated in the Unicode 15.1 release to add a new rule, [GB9c](https://www.unicode.org/reports/tr29/tr29-43.html#GB9c):

> Do not break within certain combinations with Indic_Conjunct_Break (InCB)=Linker

For the example 'ক্ষী', UAX29 revision 41 and earlier produce two extended grapheme clusters ('ক্', 'ষী'), while revision 43 onwards produces a single extended grapheme cluster ('ক্ষী'). The behaviour therefore depends on the version of UAX29 (i.e., the version of Unicode) that an implementation supports.

Please view or discuss this issue at https://github.com/w3c/bp-i18n-specdev/issues/150 using your GitHub account

-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config
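To make the version difference concrete, here is a minimal Python sketch of a grapheme segmenter that implements only the rules this example needs (GB1, GB9, GB9a, GB9c and GB999) and can be run with or without GB9c. The names `PROPS`, `props`, `segment` and `apply_gb9c` are hypothetical, and the property table is hand-copied for just the four code points above; a real implementation would load Grapheme_Cluster_Break and Indic_Conjunct_Break values from the Unicode Character Database for the Unicode version it targets, and would also implement the rules omitted here.

```python
# Hypothetical, hand-copied property table covering ONLY the code points in
# the example: (Grapheme_Cluster_Break, Indic_Conjunct_Break).
# A real segmenter would load these from the Unicode Character Database.
PROPS = {
    "\u0995": ("Other", "Consonant"),  # BENGALI LETTER KA
    "\u09CD": ("Extend", "Linker"),    # BENGALI SIGN VIRAMA
    "\u09B7": ("Other", "Consonant"),  # BENGALI LETTER SSA
    "\u09C0": ("SpacingMark", None),   # BENGALI VOWEL SIGN II
}


def props(ch):
    """Look up (GCB, InCB); unknown characters default to Other/None."""
    return PROPS.get(ch, ("Other", None))


def segment(text, apply_gb9c):
    """Split text into grapheme clusters using a simplified subset of UAX29.

    Only GB1, GB9, GB9a, GB9c and GB999 are implemented; CR/LF, control,
    Hangul, emoji and regional-indicator rules are omitted.
    """
    clusters = []
    current = ""
    # Progress through the left-hand side of GB9c:
    # None -> "consonant" (Consonant seen) -> "linker" (a Linker followed it).
    gb9c_state = None
    for ch in text:
        gcb, incb = props(ch)
        if gcb in ("Extend", "ZWJ", "SpacingMark"):
            join = True  # GB9 / GB9a: do not break before these
        elif apply_gb9c and incb == "Consonant" and gb9c_state == "linker":
            join = True  # GB9c: Consonant [Extend Linker]* Linker [Extend Linker]* x Consonant
        else:
            join = False  # GB999: otherwise break
        if current and join:
            current += ch
        else:
            if current:
                clusters.append(current)
            current = ch  # GB1: the first character always starts a cluster
        # Update the GB9c state for the next boundary.
        if incb == "Consonant":
            gb9c_state = "consonant"
        elif incb in ("Linker", "Extend") and gb9c_state is not None:
            if incb == "Linker":
                gb9c_state = "linker"
        else:
            gb9c_state = None
    if current:
        clusters.append(current)
    return clusters


kshii = "\u0995\u09CD\u09B7\u09C0"  # ক্ষী
print(len(segment(kshii, apply_gb9c=False)))  # 2: KA+VIRAMA, SSA+II
print(len(segment(kshii, apply_gb9c=True)))   # 1: the whole conjunct
```

With GB9c disabled, the sketch reproduces the pre-15.1 split ('ক্', 'ষী'); with it enabled, the conjunct stays together as 'ক্ষী'. Any text that cites a cluster count for this example therefore needs to say which Unicode version it assumes.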
Received on Wednesday, 29 January 2025 10:14:47 UTC