Re: Unicode Normalization thread should slow down; summary needed

From: Jonathan Kew <jonathan@jfkew.plus.com>
Date: Tue, 10 Feb 2009 18:37:59 +0000
Cc: Robert J Burns <rob@robburns.com>, public-i18n-core@w3.org, W3C Style List <www-style@w3.org>
Message-Id: <15AC45D2-A08A-42F9-9487-ACA9071591A1@jfkew.plus.com>
To: Henri Sivonen <hsivonen@iki.fi>

On 10 Feb 2009, at 12:44, Henri Sivonen wrote:

> (It seems that the Vietnamese input mode on Mac OS X normalizes to  
> NFC, by the way. In fact, I wouldn't be at all surprised if Mac OS X  
> already had solution #1 covered and this was just an issue of other  
> systems catching up.)

It's true that the Vietnamese keyboard layout Apple ships is designed  
to generate precomposed accented letters, using a dead-key approach.  
Text typed using this layout will therefore be in NFC. However, this  
does not mean that other keyboard layouts that can generate Vietnamese  
text -- for example, a general-purpose "Latin and diacritics" layout  
for linguistic/technical use -- will do the same, whether on Mac OS X  
or other platforms.
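
To make the distinction concrete, here is a minimal Python sketch using the standard unicodedata module. The sample letter and the variable names are chosen purely for illustration and are not tied to any particular keyboard layout. The decomposed form is only an assumption about what a layout that emits base letters plus combining marks might produce.

```python
import unicodedata

# Hypothetical sample: the Vietnamese letter "e with circumflex and acute"
# as two different keyboard layouts might emit it.
precomposed = "\u1EBF"        # LATIN SMALL LETTER E WITH CIRCUMFLEX AND ACUTE
decomposed = "e\u0302\u0301"  # e + COMBINING CIRCUMFLEX ACCENT + COMBINING ACUTE ACCENT

# The code point sequences differ, so a plain comparison treats them as unequal.
same_code_points = precomposed == decomposed

# Normalizing both sides to NFC makes them compare equal.
same_after_nfc = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)

# Only the precomposed string is already in NFC.
precomposed_is_nfc = unicodedata.normalize("NFC", precomposed) == precomposed
decomposed_is_nfc = unicodedata.normalize("NFC", decomposed) == decomposed

print(same_code_points, same_after_nfc, precomposed_is_nfc, decomposed_is_nfc)
```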

As for other scripts and languages, there are plenty of mainstream  
shipping keyboard layouts that do not necessarily generate normalized  
text. For example, staying on Mac OS X, I used the OS's Arabic  
keyboard layout to type the word مُحَبَّتْ into TextEdit.app.  
First, I typed it in what most users would consider "natural" or  
"logical" order, <meem damma hah fatha beh shadda fatha teh sukun>.  
Then I retyped it with the diacritics in canonical order, <meem damma  
hah fatha beh fatha shadda teh sukun>. The result is a file where the  
two "spellings" are preserved, and so a bytewise comparison will find  
them unequal, even though they look identical (at least with the  
Unicode-compliant font I'm using) and are defined by Unicode to be  
canonically equivalent.
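
The comparison described above can be sketched in Python with the standard unicodedata module. The constant and variable names are illustrative only. The code points are the standard Unicode ones for the letters and marks named in the two sequences, and NFC is used here as one possible normalization form for the comparison.

```python
import unicodedata

# Base letters and combining marks named in the two sequences.
MEEM, HAH, BEH, TEH = "\u0645", "\u062D", "\u0628", "\u062A"
FATHA, DAMMA, SHADDA, SUKUN = "\u064E", "\u064F", "\u0651", "\u0652"

# "Logical" typing order: shadda before fatha on the beh.
typed_order = MEEM + DAMMA + HAH + FATHA + BEH + SHADDA + FATHA + TEH + SUKUN
# Canonical order: fatha before shadda on the beh.
canonical_order = MEEM + DAMMA + HAH + FATHA + BEH + FATHA + SHADDA + TEH + SUKUN

# A bytewise comparison of the encoded text finds the two spellings unequal.
bytes_equal = typed_order.encode("utf-8") == canonical_order.encode("utf-8")

# Canonical ordering sorts adjacent marks by combining class;
# fatha has a lower class than shadda, so it comes first.
fatha_class = unicodedata.combining(FATHA)
shadda_class = unicodedata.combining(SHADDA)

# After normalization, the two spellings are identical.
equal_after_nfc = (
    unicodedata.normalize("NFC", typed_order)
    == unicodedata.normalize("NFC", canonical_order)
)
typed_normalizes_to_canonical = (
    unicodedata.normalize("NFC", typed_order) == canonical_order
)

print(bytes_equal, fatha_class, shadda_class, equal_after_nfc)
```

No precomposed characters are involved here: normalization only reorders the two marks on the beh, which is why the unnormalized forms differ even though nothing was "composed" or "decomposed" by the keyboard layout.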

Received on Tuesday, 10 February 2009 18:39:07 UTC
