[iip] Text units for letter-spacing are incorrect (#67)

r12a has just created a new issue for https://github.com/w3c/iip:

== Text units for letter-spacing are incorrect ==
For various reasons, wherever a Latin-script word would be broken into its constituent characters, Indian-language words can and should be broken on Akshara boundaries instead.
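To illustrate the difference between breaking on characters and breaking on Akshara, here is a minimal Python sketch. It is not the UAX #29 algorithm and not what any browser implements; the function name `split_aksharas` and its rules are assumptions made for illustration. It attaches combining marks to the preceding base and joins a letter that follows a hasant, and it deliberately ignores ZWJ/ZWNJ and other edge cases.

```python
import unicodedata

VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA (hasant)

def split_aksharas(word):
    """Split a word into rough akshara-like units (simplified, hypothetical rules)."""
    clusters = []
    for ch in word:
        # Combining marks (matras, anusvara, hasant, ...) attach to the previous unit.
        is_mark = unicodedata.category(ch) in ("Mn", "Mc")
        # A character right after a hasant is treated as part of the same conjunct.
        joins_conjunct = bool(clusters) and clusters[-1].endswith(VIRAMA)
        if clusters and (is_mark or joins_conjunct):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

bangla = "\u09AC\u09BE\u0982\u09B2\u09BE"  # the word "Bangla" in Bengali script
print(list(bangla))            # five separate code points
print(split_aksharas(bangla))  # two units under these simplified rules
print(split_aksharas("word"))  # Latin letters stay one per unit
```

Under these assumptions, letter-spacing would be inserted between the two units of the Bengali word rather than between its five code points.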

The W3C specification points to Unicode Text Segmentation (TR 29). Some browsers (e.g. Chrome and Firefox) support it, whereas Microsoft Edge and Internet Explorer seem to break words into individual characters.

It has been marked as basic because the Unicode Text Segmentation rules themselves need to mature enough to cater to the nuances of the many languages written in the Bangla script.

There are two instances in Bengali where hasant (virama) is preceded by a full vowel (U+0985 অ - BENGALI LETTER A and U+098F এ - BENGALI LETTER E). To render Ya-Phalaa (Ja-Phalaa) after অ or এ, it is necessary to type the vowel followed by U+09CD hasant (virama) plus U+09AF ja. This is a purely ligatural entity: the addition of Ya-Phalaa and the ā matra is used to elicit the /æ/ sound, as in English 'application', 'administration' etc. The Brahmi script, by nature, does not have a halant after a vowel. Halant is the 'vowel killer', so only consonants take a halant. Bengali has a deviant feature in its orthography here, where the ligatures অ্যা and এ্যা call for a halant after a vowel. Also, in Bengali there are some conjuncts using ZWJ and ZWNJ which can be problematic.
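As a small illustration, the following Python sketch spells out the two sequences as code points and prints their Unicode names using the standard `unicodedata` module. The constant and function names are made up for this example. Note that the Unicode name of U+09AF is BENGALI LETTER YA, which is the letter referred to as "ja" above.

```python
import unicodedata

# The two sequences discussed above: full vowel + hasant + U+09AF + aa matra.
A_YA_AA = "\u0985\u09CD\u09AF\u09BE"
E_YA_AA = "\u098F\u09CD\u09AF\u09BE"

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

for seq in (A_YA_AA, E_YA_AA):
    print(seq)
    for code_point, name in describe(seq):
        print("   ", code_point, name)
```

In both cases the second code point is the virama, sitting directly after a vowel letter rather than after a consonant.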

Also, where the Akshara is wrongly formed, e.g. Consonant+Matra+Matra, the breaking seems to group the ill-formed akshara into a single unit instead of cleanly splitting it apart. This breaking behaviour needs to improve.
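One way to spot such input is sketched below in Python. This is only an illustration under stated assumptions: the helper name `has_stacked_matras` is hypothetical, "matra" is approximated as any character whose Unicode name contains "VOWEL SIGN", and the text is first normalized to NFC so that legitimately decomposed two-part vowel signs (such as U+09C7 followed by U+09BE) are not flagged.

```python
import unicodedata

def is_matra(ch):
    """Approximate test for a dependent vowel sign, based on its Unicode name."""
    return "VOWEL SIGN" in unicodedata.name(ch, "")

def has_stacked_matras(text):
    """Return True if two dependent vowel signs appear back to back."""
    # NFC composes canonically equivalent pairs into a single vowel sign first.
    text = unicodedata.normalize("NFC", text)
    return any(is_matra(a) and is_matra(b) for a, b in zip(text, text[1:]))

ill_formed = "\u0995\u09BF\u09C1"   # KA + VOWEL SIGN I + VOWEL SIGN U
well_formed = "\u0995\u09BF"        # KA + VOWEL SIGN I
two_part_o = "\u0995\u09C7\u09BE"   # KA + VOWEL SIGN E + VOWEL SIGN AA (decomposed O)

print(has_stacked_matras(ill_formed))
print(has_stacked_matras(well_formed))
print(has_stacked_matras(two_part_o))
```

A segmenter could use a check like this to decide where to split an ill-formed sequence, though where exactly the break should fall is a design decision the issue leaves open.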

Please view or discuss this issue at https://github.com/w3c/iip/issues/67 using your GitHub account

Received on Tuesday, 4 February 2020 18:05:59 UTC