- From: Andrew Cunningham <andrewc@vicnet.net.au>
- Date: Wed, 04 Feb 2009 10:15:10 +1100
- To: Richard Ishida <ishida@w3.org>
- CC: 'Martin Duerst' <duerst@it.aoyama.ac.jp>, "'Phillips, Addison'" <addison@amazon.com>, public-i18n-core@w3.org, 'fantasai' <fantasai.lists@inkedblade.net>, 'Lachlan Hunt' <lachlan.hunt@lachy.id.au>, www-style@w3.org
- Message-ID: <4988CFFE.7020208@vicnet.net.au>
For languages that must use combining diacritics and are diacritic-heavy, it is impossible with most keyboard and input frameworks to build a keyboard layout that generates NFC or NFD output. Doing so requires a certain degree of sequence checking, reordering and other processing; in essence, a smart input system. Most keyboard frameworks are not smart input systems. The general rule of thumb is that if the letters required by a language cannot all be represented by precomposed characters alone, the keyboard will most likely generate unnormalised text.

Richard Ishida wrote:
>> -----Original Message-----
>> From: Martin Duerst [mailto:duerst@it.aoyama.ac.jp]
>> Sent: 31 January 2009 07:11
>>
> ...
>
>>> And the 83 million inhabitants of Vietnam are not the only
>>> people who face this issue. There are many languages that use combining
>>> characters, including the Latin script based languages of Africa and
>>> aboriginal North America, most scripts of Asia, etc., and one can't always
>>> guarantee that the input methods used for those languages will always
>>> create text in one given form vis a vis normalization.
>>>
>> Using combining characters isn't what's important. If no precombined
>> variant is available, and there is only one combining character, there
>> are no problems,... It would be very good to see some actual examples,
>> rather than roundaboutly mentioning whole continents.
>>
>
> Andrew cited some examples of African languages where denormalisation or different normalised forms can appear in text. Jonathan mentioned some Arabic. One or two more examples, then...
>
> I recently came across http://languagegeek.com/ which provides a fair number of keyboards (and other things) to support aboriginal (mostly North American) languages. I didn't have to look hard for a problem.
> If you install the Tlicho (Tłįchǫ or Dogrib) keyboard on Windows (see a picture at http://rishida.net/scripts/pickers/tlicho/) and type the name of the language itself, it comes out in NFD. It is also possible to incorrectly order multiple diacritics (ie. not even NFD). You could say that the keyboard *ought* to churn out NFC, but it's too late. People using those keyboards will be producing content that may look different to that created by people using other input methods. For example the following was generated by typing the first two accented letters using the Tlicho keyboard then the same two using the US International keyboard:
>
> e U+0065: LATIN SMALL LETTER E
> ̀ U+0300: COMBINING GRAVE ACCENT
> o U+006F: LATIN SMALL LETTER O
> ̀ U+0300: COMBINING GRAVE ACCENT
> é U+00E9: LATIN SMALL LETTER E WITH ACUTE
> ò U+00F2: LATIN SMALL LETTER O WITH GRAVE
>
> Another example would be using a standard Tamil Windows keyboard, where *on the same keyboard* it is just as easy to produce a result that looks the same using
>
> 0B95: க TAMIL LETTER KA
> 0BCB: ோ TAMIL VOWEL SIGN OO
>
> as
>
> 0B95: க TAMIL LETTER KA
> 0BC7: ே TAMIL VOWEL SIGN EE
> 0BBE: ா TAMIL VOWEL SIGN AA
>
> The font I use doesn't complain about it. The single cc is NFC, the two is NFD. This of course applies to a number of indic scripts.
>
> Another two examples, Khmer (http://rishida.net/scripts/khmer/#cporder) and Myanmar (http://rishida.net/scripts/myanmar/#cporder) are almost as sensitive to combining character order as Vietnamese. Some fonts only work with one order for multiple diacritics, other fonts allow different ordering. Again, a single keyboard tends to allow a user to input the same text in different ways.
>
> These are just a few examples, and by no means exhaustive.
>
> RI

-- 
Andrew Cunningham
Senior Manager, Research and Development
Vicnet
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000

Ph: +61-3-8664-7430
Fax: +61-3-9639-2175

Email: andrewc@vicnet.net.au
Alt email: lang.support@gmail.com

http://home.vicnet.net.au/~andrewc/
http://www.openroad.net.au
http://www.vicnet.net.au
http://www.slv.vic.gov.au
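The decomposed and precomposed sequences discussed in this thread can be compared with Python's standard unicodedata module. What follows is a minimal sketch, assuming Python 3. The helper same_text and the sample strings are hypothetical and were chosen only for illustration. The sketch uses è (U+00E8) as the precomposed counterpart of e + U+0300, and a Vietnamese-style e with circumflex and dot below to show two diacritic orders that normalization brings together.

```python
import unicodedata


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two strings after normalizing both to `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Tlicho-style keyboard output: each base letter followed by COMBINING GRAVE ACCENT.
tlicho_typed = "e\u0300o\u0300"
# US International-style output: precomposed è (U+00E8) and ò (U+00F2).
us_intl_typed = "\u00e8\u00f2"

# Tamil KA + VOWEL SIGN OO (one code point) vs KA + VOWEL SIGN EE + VOWEL SIGN AA.
tamil_single = "\u0b95\u0bcb"
tamil_pair = "\u0b95\u0bc7\u0bbe"

# The same two diacritics typed in different orders:
# circumflex (combining class 230) and dot below (combining class 220).
order_a = "e\u0302\u0323"
order_b = "e\u0323\u0302"

if __name__ == "__main__":
    pairs = [
        ("Tlicho vs US Intl", tlicho_typed, us_intl_typed),
        ("Tamil OO", tamil_single, tamil_pair),
        ("diacritic order", order_a, order_b),
    ]
    for label, a, b in pairs:
        # Raw comparison looks at code points; same_text compares NFC forms.
        print(f"{label}: raw equal={a == b}, NFC equal={same_text(a, b)}")
```

Each printed line should report raw equal=False and NFC equal=True. Normalization only reconciles sequences that Unicode defines as canonically equivalent. Marks with canonical combining class 0 are never reordered, so two orders of such marks that a forgiving font renders identically will remain distinct after normalization.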
Received on Tuesday, 3 February 2009 23:16:42 UTC