
[iip] Independent vowels are confusing (#95)

From: r12a via GitHub <sysbot+gh@w3.org>
Date: Wed, 05 Feb 2020 06:07:47 +0000
To: public-i18n-archive@w3.org
Message-ID: <issues.opened-560161460-1580882866-sysbot+gh@w3.org>
r12a has just created a new issue for https://github.com/w3c/iip:

== Independent vowels are confusing ==
Gurmukhi differs from other Indic scripts in that it has independent vowels, namely ੳ U+0A73, ਅ U+0A05 and ੲ U+0A72. ੳ U+0A73 and ੲ U+0A72 have no inherent sound and must be combined with a dependent vowel (e.g. ੁ U+0A41).

For compatibility with other Indic scripts, ਉ U+0A09 exists as a single code point in Unicode. That is, ਉ U+0A09 occupies the same position in the Gurmukhi block as उ U+0909 does in the Devanagari block (Devanagari does not have independent vowels).

This causes great confusion for Punjabi / Gurmukhi users, who would expect ਉ U+0A09 and the combination of ੳ U+0A73 with ੁ U+0A41 to be equivalent, but they are not.
We should get some sort of compatibility equivalence for these characters, or at least treat them as equivalent for the purposes of sorting, searching, collation, etc.

- ਉ U+0A09 = ੳ U+0A73 + ੁ U+0A41
- ਊ U+0A0A = ੳ U+0A73 + ੂ U+0A42
- ਓ U+0A13 = ੳ U+0A73 + ੋ U+0A4B
- ਆ U+0A06 = ਅ U+0A05 + ਾ U+0A3E
- ਐ U+0A10 = ਅ U+0A05 + ੈ U+0A48
- ਔ U+0A14 = ਅ U+0A05 + ੌ U+0A4C
- ਇ U+0A07 = ੲ U+0A72 + ਿ U+0A3F
- ਈ U+0A08 = ੲ U+0A72 + ੀ U+0A40
- ਏ U+0A0F = ੲ U+0A72 + ੇ U+0A47
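
To illustrate the pairs listed above, here is a minimal Python sketch. It first checks that none of the standard Unicode normalization forms treats a pair as equal, and then shows one possible way an application could fold the two-code-point sequences to the single code points before searching or comparing. The names `GURMUKHI_VOWEL_FOLDS`, `normalization_equivalent` and `fold_gurmukhi_vowels` are hypothetical and not part of any library; the folding is an application-level workaround under these assumptions, not something Unicode defines.

```python
import unicodedata

# Hypothetical folding table built from the nine pairs in this issue:
# (bearer + dependent vowel sign) -> single code point.
GURMUKHI_VOWEL_FOLDS = {
    "\u0A73\u0A41": "\u0A09",  # U+0A73 + U+0A41 -> U+0A09
    "\u0A73\u0A42": "\u0A0A",  # U+0A73 + U+0A42 -> U+0A0A
    "\u0A73\u0A4B": "\u0A13",  # U+0A73 + U+0A4B -> U+0A13
    "\u0A05\u0A3E": "\u0A06",  # U+0A05 + U+0A3E -> U+0A06
    "\u0A05\u0A48": "\u0A10",  # U+0A05 + U+0A48 -> U+0A10
    "\u0A05\u0A4C": "\u0A14",  # U+0A05 + U+0A4C -> U+0A14
    "\u0A72\u0A3F": "\u0A07",  # U+0A72 + U+0A3F -> U+0A07
    "\u0A72\u0A40": "\u0A08",  # U+0A72 + U+0A40 -> U+0A08
    "\u0A72\u0A47": "\u0A0F",  # U+0A72 + U+0A47 -> U+0A0F
}


def normalization_equivalent(a: str, b: str) -> bool:
    """Return True if any standard normalization form maps a and b to the same string."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in forms
    )


def fold_gurmukhi_vowels(text: str) -> str:
    """Replace each bearer + vowel sign sequence with its single code point."""
    for sequence, single in GURMUKHI_VOWEL_FOLDS.items():
        text = text.replace(sequence, single)
    return text


if __name__ == "__main__":
    # Normalization alone does not make the two spellings equal.
    print(normalization_equivalent("\u0A09", "\u0A73\u0A41"))
    # After folding, the two spellings of the same word compare equal.
    print(fold_gurmukhi_vowels("\u0A05\u0A3E\u0A32\u0A42") == "\u0A06\u0A32\u0A42")
```

Folding both the indexed text and the query with the same function would let the two spellings match each other in a search, without changing how either is stored.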

There seems to be a decent amount of data generated using the incorrect sequence. For example, searching ਅਾਲੂ returns 2,450 results on Google, while searching ਆਲੂ returns 131,000 results (the incorrect sequence gets about 2% as many).

Please view or discuss this issue at https://github.com/w3c/iip/issues/95 using your GitHub account
Received on Wednesday, 5 February 2020 06:07:49 UTC
