- From: Richard57 via GitHub <sysbot+gh@w3.org>
- Date: Fri, 11 May 2018 19:30:38 +0000
- To: public-i18n-archive@w3.org
Richard57 has just created a new issue for https://github.com/w3c/charmod-norm:

== Normalisation for Case-Insensitive Comparison ==

In Section 3.1 Step 3 'Normalisation', do we really want the case-insensitive Unicode full case folding comparison of "ᾳ͙" <U+03B1 GREEK SMALL LETTER ALPHA, U+0359 COMBINING ASTERISK BELOW, U+0345 COMBINING GREEK YPOGEGRAMMENI> and "α͙ι" <U+03B1 GREEK SMALL LETTER ALPHA, U+0359 COMBINING ASTERISK BELOW, U+03B9 GREEK SMALL LETTER IOTA> to depend on the choice of normalisation? NF(K)C yields 'different', while NF(K)D yields 'identical'. (The combining mark U+0359 was added to support the retranscription of deteriorated Greek manuscripts.)

The sequence of Step 4, "case-folding" and Step 6 "compare code points" does not work properly. For example, in the comparison of the NFC strings "sś" <U+0073, U+015B> and "ß́" <U+00DF LATIN SMALL LETTER SHARP S, U+0301>, default case-folding yields the strings <U+0073, U+015B> and <U+0073, U+0073, U+0301>. However, converting to NFD and then case-folding would yield <U+0073, U+0073, U+0301> for both strings.

Normalisation is required after case-folding. By contrast, apart from strings containing U+0345 when fully decomposed, normalisation (i.e. NFC/NFD) is not required before case-folding. However, compatibility decomposition, if applied, would be required before case-folding.

Please view or discuss this issue at https://github.com/w3c/charmod-norm/issues/172 using your GitHub account
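
The two examples in the issue can be reproduced with a short Python sketch. It is illustrative only: it assumes that Python's str.casefold() stands in for Unicode full case folding, and the helper names normalise_fold_compare and canonical_caseless_match are hypothetical, not taken from charmod-norm. The second helper follows the decompose, fold, decompose again pattern as one possible way to normalise after case-folding.

```python
import unicodedata


def normalise_fold_compare(a: str, b: str, form: str) -> bool:
    """Normalise to `form`, case-fold, then compare code points.

    Mirrors the step order questioned in the issue: nothing is
    renormalised after folding. (Hypothetical helper.)
    """
    fa = unicodedata.normalize(form, a).casefold()
    fb = unicodedata.normalize(form, b).casefold()
    return fa == fb


def canonical_caseless_match(a: str, b: str) -> bool:
    """Decompose, case-fold, decompose again, then compare.

    One possible fix: normalisation is applied after folding as well
    as before it. (Hypothetical helper.)
    """
    def key(s: str) -> str:
        folded = unicodedata.normalize("NFD", s).casefold()
        return unicodedata.normalize("NFD", folded)

    return key(a) == key(b)


# The strings from the issue, written as explicit code points.
alpha_ypogegrammeni = "\u03b1\u0359\u0345"  # alpha, asterisk below, ypogegrammeni
alpha_iota = "\u03b1\u0359\u03b9"           # alpha, asterisk below, iota
s_sacute = "\u0073\u015b"                   # s, s with acute (NFC)
sharp_s_acute = "\u00df\u0301"              # sharp s, combining acute (NFC)

for form in ("NFC", "NFD"):
    print(form,
          normalise_fold_compare(alpha_ypogegrammeni, alpha_iota, form),
          normalise_fold_compare(s_sacute, sharp_s_acute, form))

print("renormalised",
      canonical_caseless_match(alpha_ypogegrammeni, alpha_iota),
      canonical_caseless_match(s_sacute, sharp_s_acute))
```

Under these assumptions, the NFC run reports both pairs as different and the NFD run reports both as identical, while the variant that normalises again after folding reports both pairs as matching.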
Received on Friday, 11 May 2018 19:30:41 UTC