- From: Phillips, Addison <addison@amazon.com>
- Date: Wed, 17 Dec 2008 15:26:39 -0800
- To: "public-webcgm-wg@w3.org" <public-webcgm-wg@w3.org>
- CC: "public-i18n-core@w3.org" <public-i18n-core@w3.org>, "lofton@rockynet.com" <lofton@rockynet.com>

Hello Lofton,

Thanks for the note on WebCGM 2.1. This response is on behalf of the Internationalization Core WG [0].

The Internationalization Core WG generally recommends using Unicode Normalization Form C (NFC) for normalization-sensitive operations such as string comparison. While this isn't always the right choice, it appears to us that it makes the most sense for font name matching for these reasons:

- Most files and font names will probably already use NFC, so the need to actually normalize strings will be reduced. (Checking normalization is faster and easier than performing it)
- Any file that uses ISO 8859-1 (Latin-1) as its encoding, for example, is already in NFC.
- NFC is generally considered a non-destructive normalization, unlike the compatibility forms NFKC and NFKD.

Please note that case-insensitive comparison is not addressed by Unicode normalization.

For specific information on normalization, you can reference both the Unicode Standard Annex #15 [1] and the W3C Character Model, Part 2 (Normalization) [2]. The latter is still a working draft and is being revised currently.

Please contact us on public-i18n-core@ if you have additional questions or concerns. We'd be happy to work with you to resolve this issue appropriately.

Best Regards (for I18N Core),

Addison

[0] http://www.w3.org/2008/12/17-core-minutes.html
[1] http://www.unicode.org/reports/tr15/
[2] http://www.w3.org/TR/charmod-norm/

Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.

===
-----Original Message-----
From: public-i18n-core-request@w3.org [mailto:public-i18n-core-request@w3.org] On Behalf Of Lofton Henderson
Sent: Wednesday, December 03, 2008 10:58 AM
To: ishida@w3.org
Cc: public-webcgm-wg@w3.org; public-i18n-core@w3.org
Subject: Re: [WebCGM2.1][LC Review] i18n comment 6: Unicode normalization

Hello, and thanks for the helpful I18N comments on the WebCGM 2.1 Last Call review.

After some research into the details of Comment #6 -- that WebCGM should use a Unicode normalization form for font-name-string comparisons -- we see the wisdom of it for reliable matching. But lacking deep expertise on the topic, we'd welcome further advice.

Question: Do you have a recommendation for which of the four normalization forms would be best?

For background, recall that the subject string comparison is seeking a match between: on the one hand, a font-name-string as extracted from a WebCGM instance; and on the other hand, a font-name-string from the ACL file (a separate XML file) that specifies the font-name to be matched.

We would expect Unicode normalization to potentially make a difference in those cases wherein the first string (font-name from WebCGM instance) is outside the well-defined core set of thirteen (13) fixed names that are required by the WebCGM standard. The character encoding in the WebCGM instance will be either ISOLatin1, or Unicode UTF8 or UTF16.

If the answer is not simple enough for efficient email resolution, we would welcome your participation in our teleconference of Thursday, 04-dec, 11am EST. (Or alternately two weeks later if you can't make tomorrow.) Please let me know, and I will send telecon logistics.

Thanks,
-Lofton Henderson
(Chair WebCGM WG)

At 10:29 AM 11/11/2008 +0000, ishida@w3.org wrote:
>Comment from the i18n review of:
>http://www.w3.org/TR/2008/WD-webcgm21-20080917/WebCGM21-Config.html#ACI-fontmap
>
>Comment 6
>At http://www.w3.org/International/reviews/0811-webcgm/
>Editorial/substantive: S
>Tracked by: RI
>
>Location in reviewed document:
>9.3.2.2
>[http://www.w3.org/TR/2008/WD-webcgm21-20080917/WebCGM21-Config.html#ACI-maplist]
>
>Comment:
>Normalization for string comparison should include conversion to a Unicode
>normalization form, to eliminate issues related to precomposed vs.
>decomposed characters and issues related to ordering of multiple combining
>characters.
>
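
For illustration only, the precomposed vs. decomposed and combining-order issues named in the comment above, and the NFC comparison recommended in the reply, can be sketched in Python with the standard `unicodedata` module. The helper name `font_names_match` and the font name "Café Sans" are hypothetical and do not come from WebCGM or from this thread; this is a minimal sketch, not a description of how any WebCGM viewer actually matches font names.

```python
import unicodedata


def font_names_match(name_from_cgm: str, name_from_acl: str) -> bool:
    """Hypothetical helper: compare two font names after converting both to NFC."""
    return (unicodedata.normalize("NFC", name_from_cgm)
            == unicodedata.normalize("NFC", name_from_acl))


# Precomposed "é" (U+00E9) vs. "e" followed by a combining acute accent (U+0301).
precomposed = "Caf\u00e9 Sans"
decomposed = "Cafe\u0301 Sans"

# The same two combining marks in different order:
# dot below (U+0323) and dot above (U+0307).
order_a = "q\u0323\u0307"
order_b = "q\u0307\u0323"

# The raw code point sequences differ, but the NFC forms compare equal.
print(precomposed == decomposed, font_names_match(precomposed, decomposed))
print(order_a == order_b, font_names_match(order_a, order_b))

# Normalization does not fold case, so this comparison still fails.
print(font_names_match("Caf\u00e9 Sans", "caf\u00e9 sans"))
```

As the reply notes, case-insensitive matching would need a separate step on top of normalization.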
Received on Wednesday, 17 December 2008 23:27:19 UTC