Re: [string-search] Requirements for Indian languages (#10)

> @vermaprashant1 The section about Encoding variations lists for Bengali the circumgraph vowel signs which are canonically equivalent in Unicode. It doesn't however mention combinations which are not recommended, such as অা [U+0985 BENGALI LETTER A + U+09BE BENGALI VOWEL SIGN AA] instead of আ [U+0986 BENGALI LETTER AA]
> 
> There are a fair number of these in Indian scripts, esp. for letters with nukta. Is it something you think should be in the document? (I haven't looked closely yet at all the language sections.) This is a misspelling rather than an alternative spelling.

Here is the feedback received from a Bengali expert:

1. These circumgraph vowel signs are typically known as vowel allographs. In Bengali, they are called 'svarachinha' ("vowel signs").
2. In total, nine (9) vowel graphemes have these allographs: ā-kār, i-kār, ī-kār, u-kār, ū-kār, e-kār, ai-kār, o-kār, and au-kār.
3. Each vowel allograph must be assigned a unique Unicode value.
4. Vowel allographs are never combined with vowel graphemes. They can only be combined with consonants and clusters (conjuncts).
5. আ (ā) is not a combination of অ (a) and া-কার (ā-kār). আ (ā) is a completely separate character with a unique Unicode value. Similarly, অ (a) is a separate character with another unique Unicode value. There should be no confusion regarding this.

We have not included non-recommended combinations in the document. It covers only alternative spellings/encodings and forms that are actually used by a particular community.
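
To make the distinction concrete, here is a minimal sketch using Python's standard `unicodedata` module. It shows that the two-part vowel sign o-kār is canonically equivalent to its precomposed form, while অ + া is not equivalent to আ, so normalization alone will not match them. The `NOT_RECOMMENDED` table and the `fold` helper are hypothetical names used for illustration only; they are not part of the document or of any library.

```python
import unicodedata

# Code points discussed in this thread.
LETTER_A = "\u0985"       # অ BENGALI LETTER A
LETTER_AA = "\u0986"      # আ BENGALI LETTER AA
VOWEL_SIGN_AA = "\u09BE"  # া BENGALI VOWEL SIGN AA
VOWEL_SIGN_E = "\u09C7"   # ে BENGALI VOWEL SIGN E
VOWEL_SIGN_O = "\u09CB"   # ো BENGALI VOWEL SIGN O


def nfc_equal(a: str, b: str) -> bool:
    """Return True if the two strings are identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Hypothetical folding table for not-recommended sequences (assumption:
# a search implementation chooses to treat these as misspellings to repair).
NOT_RECOMMENDED = {
    LETTER_A + VOWEL_SIGN_AA: LETTER_AA,
}


def fold(text: str) -> str:
    """Normalize to NFC, then rewrite not-recommended sequences."""
    text = unicodedata.normalize("NFC", text)
    for bad, good in NOT_RECOMMENDED.items():
        text = text.replace(bad, good)
    return text


# Canonically equivalent: e-kar + aa-kar composes to o-kar under NFC.
print(nfc_equal(VOWEL_SIGN_E + VOWEL_SIGN_AA, VOWEL_SIGN_O))

# Not canonically equivalent: normalization leaves A + aa-kar untouched.
print(nfc_equal(LETTER_A + VOWEL_SIGN_AA, LETTER_AA))

# Only the extra folding step makes the two spellings compare equal.
print(fold(LETTER_A + VOWEL_SIGN_AA) == fold(LETTER_AA))
```

Whether a search implementation should apply such folding at all is a separate question from what the document covers.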

-- 
GitHub Notification of comment by vermaprashant1
Please view or discuss this issue at https://github.com/w3c/string-search/issues/10#issuecomment-1206147495 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Friday, 5 August 2022 07:43:14 UTC