Re: Unicode Normalization thread should slow down; summary needed

On 10 Feb 2009, at 12:44, Henri Sivonen wrote:

> (It seems that the Vietnamese input mode on Mac OS X normalizes to  
> NFC, by the way. In fact, I wouldn't be at all surprised if Mac OS X  
> already had solution #1 covered and this was just an issue of other  
> systems catching up.)


It's true that the Vietnamese keyboard layout Apple ships is designed  
to generate precomposed accented letters, using a dead-key approach.  
Text typed using this layout will therefore be in NFC. However, this  
does not mean that other keyboard layouts that can generate Vietnamese  
text -- for example, a general-purpose "Latin and diacritics" layout  
for linguistic/technical use -- will do the same, whether on Mac OS X  
or other platforms.
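
To illustrate the difference between the two kinds of layout, here is a
minimal Python sketch using the standard unicodedata module. The letter
chosen and the variable names are assumptions made for the example; they
are not taken from any particular keyboard layout.

```python
import unicodedata

# Hypothetical example letter: "e with circumflex and acute".
precomposed = "\u1EBF"        # one code point, as a dead-key layout might emit
decomposed = "e\u0302\u0301"  # e + combining circumflex + combining acute

# The raw strings differ, so a plain comparison reports them unequal.
raw_equal = precomposed == decomposed

# NFC composes the combining sequence into the precomposed letter.
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed

print(raw_equal, nfc_equal)
```

A layout that emits the second form produces text that displays the same
but is not in NFC until something normalizes it.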

As for other scripts and languages, there are plenty of mainstream  
shipping keyboard layouts that do not necessarily generate normalized  
text. For example, staying on Mac OS X, I used the OS's Arabic  
keyboard layout to type the word مُحَبَّتْ into TextEdit.app.  
First, I typed it in what most users would consider "natural" or  
"logical" order, <meem damma hah fatha beh shadda fatha teh sukun>.  
Then I retyped it with the diacritics in canonical order, <meem damma  
hah fatha beh fatha shadda teh sukun>. The result is a file where the  
two "spellings" are preserved, and so a bytewise comparison will find  
them unequal, even though they look identical (at least with the  
Unicode-compliant font I'm using) and are defined by Unicode to be  
canonically equivalent.
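
The comparison described above can be sketched in Python with the
standard unicodedata module. This is only an illustration: the variable
names and the helper canonically_equivalent are hypothetical, and the
code points are the standard ones for the letters and marks named in the
two sequences.

```python
import unicodedata

# Base letters and combining marks named in the two sequences.
MEEM, HAH, BEH, TEH = "\u0645", "\u062D", "\u0628", "\u062A"
FATHA, DAMMA, SHADDA, SUKUN = "\u064E", "\u064F", "\u0651", "\u0652"

# "Logical" typing order: shadda before fatha on the beh.
typed_order = MEEM + DAMMA + HAH + FATHA + BEH + SHADDA + FATHA + TEH + SUKUN
# Canonical order: fatha before shadda on the beh.
canonical_order = MEEM + DAMMA + HAH + FATHA + BEH + FATHA + SHADDA + TEH + SUKUN

# A bytewise comparison sees two different files.
bytes_equal = typed_order.encode("utf-8") == canonical_order.encode("utf-8")

def canonically_equivalent(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

equivalent = canonically_equivalent(typed_order, canonical_order)

# Canonical combining classes decide the order of adjacent marks.
classes = (unicodedata.combining(FATHA), unicodedata.combining(SHADDA))

print(bytes_equal, equivalent, classes)
```

Normalization reorders the marks by combining class, and fatha has a
lower class than shadda, which is why the second spelling is the one in
canonical order.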

JK

Received on Tuesday, 10 February 2009 18:39:04 UTC