Re: FW: [WebCGM2.1][LC Review] i18n comment 6: Unicode normalization from Martin Duerst on 2008-12-04 (public-i18n-core@w3.org from October to December 2008)

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Thu, 04 Dec 2008 13:13:16 +0900
To: "Phillips, Addison" <addison@amazon.com>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-Id: <6.0.0.20.2.20081204131305.075c9210@localhost>

+1

At 05:06 08/12/04, Phillips, Addison wrote:
>(proposed response follows to be discussed as an agenda+ today)
>
>Addison Phillips
>Globalization Architect -- Lab126
>
>Internationalization is not a feature.
>It is an architecture.
>
>--
>
>Hello Lofton,
>
>Thanks for the note on WebCGM 2.1. 
>
>The I18N Core WG generally recommends using Unicode Normalization Form C 
>(NFC) for normalization-sensitive operations such as string comparison. 
>While this isn't always the right choice, it appears to us that it makes 
>the most sense for font name matching for these reasons:
>
> - Most files and font names will probably already use NFC, so the need to 
>actually normalize strings will be reduced. (Checking whether a string is 
>already normalized is faster and easier than normalizing it.) 
> - Any file that uses ISO 8859-1 (Latin-1) as its encoding, for example, is 
>already in NFC.
> - NFC is generally considered a non-destructive normalization, unlike the 
>compatibility forms NFKC and NFKD.
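
As a rough illustration of the three points above, here is a minimal Python sketch using the standard unicodedata module. The function names to_nfc and font_names_match are made up for this example and are not taken from WebCGM or from any I18N Core document; is_normalized needs Python 3.8 or later.

```python
import unicodedata


def to_nfc(name: str) -> str:
    """Return name in NFC, skipping the conversion when it is already NFC."""
    # The check is usually cheaper than a full normalization pass.
    if unicodedata.is_normalized("NFC", name):  # Python 3.8+
        return name
    return unicodedata.normalize("NFC", name)


def font_names_match(cgm_name: str, acl_name: str) -> bool:
    """Hypothetical comparison of two font names after NFC normalization."""
    return to_nfc(cgm_name) == to_nfc(acl_name)


# Any text decoded from ISO 8859-1 is already in NFC.
latin1_text = bytes(range(256)).decode("latin-1")
latin1_is_nfc = unicodedata.is_normalized("NFC", latin1_text)

# NFC keeps a compatibility character such as U+FB01 (fi ligature);
# NFKC replaces it with the two letters "f" and "i".
ligature = "\ufb01"
kept_by_nfc = unicodedata.normalize("NFC", ligature) == ligature
changed_by_nfkc = unicodedata.normalize("NFKC", ligature) == "fi"

# A decomposed and a precomposed spelling of the same name compare equal.
same_name = font_names_match("Ame\u0301lie", "Am\u00e9lie")
```
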
>
>Please note that case-insensitive comparison is not addressed by Unicode 
>normalization.
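
To make the point about case concrete, the sketch below shows that NFC leaves case differences in place. The second helper adds a case-folding step via str.casefold purely as an assumption about what a caseless match might look like; nothing in this thread says WebCGM wants one, and full Unicode caseless matching involves more than this simplification. Both function names are hypothetical.

```python
import unicodedata


def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization only."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def nfc_caseless_equal(a: str, b: str) -> bool:
    """Assumed extra step: normalize to NFC, then case-fold, then compare."""
    def fold(s: str) -> str:
        return unicodedata.normalize("NFC", s).casefold()
    return fold(a) == fold(b)


# NFC alone still treats these two names as different.
case_differs = not nfc_equal("Helvetica", "HELVETICA")

# With the separate folding step they match.
case_folded_match = nfc_caseless_equal("Helvetica", "HELVETICA")
```
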
>
>For specific information on normalization, you can reference both Unicode 
>Standard Annex #15 (Unicode Normalization Forms) and the W3C Character 
>Model, Part 2 (Normalization). The latter is still a working draft and is 
>currently being revised.
>
>Best Regards (for I18N Core),
>
>Addison
>
>// etc.
>
>
>-----Original Message-----
>From: public-i18n-core-request@w3.org 
>[mailto:public-i18n-core-request@w3.org] On Behalf Of Lofton Henderson
>Sent: Wednesday, December 03, 2008 10:58 AM
>To: ishida@w3.org
>Cc: public-webcgm-wg@w3.org; public-i18n-core@w3.org
>Subject: Re: [WebCGM2.1][LC Review] i18n comment 6: Unicode normalization
>
>
>Hello, and thanks for the helpful I18N comments on the WebCGM 2.1 Last Call 
>review.
>
>After some research into the details of Comment #6 -- that WebCGM should 
>use a Unicode normalization form for font-name-string comparisons -- we see 
>the wisdom of it for reliable matching.  But lacking deep expertise on the 
>topic, we'd welcome further advice.
>
>Question:  Do you have a recommendation for which of the four normalization 
>forms would be best?
>
>For background, recall that the subject string comparison is seeking a 
>match between:  on the one hand, a font-name-string as extracted from a 
>WebCGM instance; and on the other hand, a font-name-string from the ACL 
>file (a separate XML file) that specifies the font-name to be matched.
>
>We would expect Unicode normalization to potentially make a difference in 
>those cases wherein the first string (font-name from WebCGM instance) is 
>outside the well-defined core set of thirteen (13) fixed names that are 
>required by the WebCGM standard.  The character encoding in the WebCGM 
>instance will be either ISOLatin1, or Unicode UTF8 or UTF16.
>
>If the answer is not simple enough for efficient email resolution, we would 
>welcome your participation in our teleconference of Thursday, 04-dec, 11am 
>EST.  (Or alternatively two weeks later if you can't make tomorrow.)  Please 
>let me know, and I will send telecon logistics.
>
>Thanks,
>-Lofton Henderson
>(Chair WebCGM WG)
>
>
>At 10:29 AM 11/11/2008 +0000, ishida@w3.org wrote:
>
>>Comment from the i18n review of:
>>http://www.w3.org/TR/2008/WD-webcgm21-20080917/WebCGM21-Config.html#ACI-fontmap
>>
>>Comment 6
>>At http://www.w3.org/International/reviews/0811-webcgm/
>>Editorial/substantive: S
>>Tracked by: RI
>>
>>Location in reviewed document:
>>9.3.2.2 
>>[http://www.w3.org/TR/2008/WD-webcgm21-20080917/WebCGM21-Config.html#ACI-maplist]
>>
>>Comment:
>>Normalization for string comparison should include conversion to a Unicode 
>>normalization form, to eliminate issues related to precomposed vs. 
>>decomposed characters and issues related to ordering of multiple combining 
>>characters.
>>
>>
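
The two issues named in Comment 6 can be seen in a small Python sketch with the standard unicodedata module. The sample strings are invented for illustration; they are not font names from the WebCGM specification.

```python
import unicodedata


def nfc(s: str) -> str:
    """Normalize a string to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", s)


# Issue 1: precomposed vs. decomposed characters.
precomposed = "\u00e9"   # e with acute as a single code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT
raw_forms_differ = precomposed != decomposed
forms_match_after_nfc = nfc(precomposed) == nfc(decomposed)

# Issue 2: ordering of multiple combining characters.
# U+0307 (dot above) has combining class 230, U+0323 (dot below) has 220,
# so normalization reorders them into one canonical sequence.
above_then_below = "q\u0307\u0323"
below_then_above = "q\u0323\u0307"
raw_orders_differ = above_then_below != below_then_above
orders_match_after_nfc = nfc(above_then_below) == nfc(below_then_above)
```
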


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp

Received on Thursday, 4 December 2008 08:02:16 UTC