Re: [string-search] Requirements for Indian languages (#10)

@vermaprashant1 The section about Encoding variations lists, for Bengali, the circumgraph vowel signs that are canonically equivalent in Unicode. It doesn't, however, mention combinations that are not recommended, such as 
<span class="codepoint" translate="no"><bdi lang="bn">&#x0985;&#x09BE;</bdi> [<span class="uname">U+0985 BENGALI LETTER A</span> + <span class="uname">U+09BE BENGALI VOWEL SIGN AA</span>]</span> instead of 
<span class="codepoint" translate="no"><bdi lang="bn">&#x0986;</bdi> [<span class="uname">U+0986 BENGALI LETTER AA</span>]</span>

There are a fair number of these in Indian scripts, especially for letters with nukta. Do you think this should be covered in the document? (I haven't yet looked closely at all the language sections.) Note that this is a misspelling rather than an alternative spelling.
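
If the document does cover these, one possible approach for search is to fold discouraged sequences to their recommended form before normalizing and matching. This is a minimal sketch, assuming a hypothetical `DISCOURAGED` table that here holds only the Bengali pair mentioned above (a real table would need entries for other scripts and for nukta letters). Whether a search should match such misspellings at all is a policy decision the document would need to make.

```python
import unicodedata

# Hypothetical fold table: not-recommended sequence -> recommended character.
# Only the pair from this comment is included; it is NOT a complete list.
DISCOURAGED = {
    "\u0985\u09BE": "\u0986",  # BENGALI LETTER A + VOWEL SIGN AA -> LETTER AA
}

def fold_discouraged(text: str) -> str:
    """Replace each discouraged sequence with its recommended form."""
    for bad, good in DISCOURAGED.items():
        text = text.replace(bad, good)
    return text

def search_key(text: str) -> str:
    """Fold discouraged sequences, then apply NFC (illustrative matching key)."""
    return unicodedata.normalize("NFC", fold_discouraged(text))

def matches(needle: str, haystack: str) -> bool:
    """Substring match on folded, normalized keys."""
    return search_key(needle) in search_key(haystack)

print(matches("\u0986", "\u0985\u09BE\u09AE"))  # True: misspelled form is found
print(matches("\u0986", "\u0985\u09AE"))        # False: plain LETTER A is not AA
```

Folding before normalizing keeps the table small, since entries only need to be written in one (unnormalized) form for these sequences.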

-- 
GitHub Notification of comment by r12a
Please view or discuss this issue at https://github.com/w3c/string-search/issues/10#issuecomment-1183140196 using your GitHub account

Received on Wednesday, 13 July 2022 12:08:06 UTC