Re: [bp-i18n-specdev] legacy grapheme clusters vs extended grapheme clusters (#1) from Fuqiao Xue via GitHub on 2023-12-19 (public-i18n-archive@w3.org from October to December 2023)

From: Fuqiao Xue via GitHub <sysbot+gh@w3.org>
Date: Tue, 19 Dec 2023 07:09:48 +0000
To: public-i18n-archive@w3.org
Message-ID: <issue_comment.created-1862232885-1702969785-sysbot+gh@w3.org>

There is currently no mention of legacy grapheme clusters in specdev, and I think this paragraph in `UAX #29` answers Florian's question:

> An ***extended grapheme cluster*** is the same as a legacy grapheme cluster, with the addition of some other characters. The continuing characters are extended to include all spacing combining marks, such as the spacing (but dependent) vowel signs in Indic scripts. For example, this includes U+093F ( ि ) DEVANAGARI VOWEL SIGN I. The extended grapheme clusters should be used in implementations in preference to legacy grapheme clusters, because they provide better results for Indic scripts such as Tamil or Devanagari in which editing by orthographic syllable is typically preferred. For scripts such as Thai, Lao, and certain other Southeast Asian scripts, editing by visual unit is typically preferred, so for those scripts the behavior of extended grapheme clusters is similar to (but not identical to) the behavior of legacy grapheme clusters.

IMHO this kind of detail belongs in charmod, not in specdev.
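
To make the difference in the quoted UAX #29 paragraph concrete, here is a rough sketch in Python. It is not the UAX #29 algorithm: it assumes that Unicode general categories can stand in for the real `Grapheme_Cluster_Break` property (`Mn`/`Me` for Extend, `Mc` for SpacingMark), which is only an approximation, and `split_clusters` is a hypothetical helper written for this illustration.

```python
import unicodedata

# Assumption: general categories approximate the Grapheme_Cluster_Break
# property values used by UAX #29. Real implementations use the property data.
LEGACY_CONTINUE = {"Mn", "Me"}                # roughly: Extend
EXTENDED_CONTINUE = LEGACY_CONTINUE | {"Mc"}  # roughly: Extend + SpacingMark


def split_clusters(text, continuing):
    """Attach each continuing character to the preceding cluster.

    Hypothetical helper: ignores Hangul, emoji, Prepend, CR/LF and the
    other rules of the full UAX #29 algorithm.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) in continuing:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# DEVANAGARI LETTER KA followed by U+093F DEVANAGARI VOWEL SIGN I
word = "\u0915\u093F"
legacy = split_clusters(word, LEGACY_CONTINUE)
extended = split_clusters(word, EXTENDED_CONTINUE)
print(len(legacy), len(extended))
```

Under these assumptions the spacing vowel sign starts its own cluster in the legacy-style split but joins the preceding consonant in the extended-style split, which is the behavior the quoted paragraph describes for Indic scripts.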

-- 
GitHub Notification of comment by xfq
Please view or discuss this issue at https://github.com/w3c/bp-i18n-specdev/issues/1#issuecomment-1862232885 using your GitHub account



Received on Tuesday, 19 December 2023 07:09:50 UTC