- From: Michel Suignard <michelsu@windows.microsoft.com>
- Date: Wed, 16 Apr 2003 14:01:54 -0700
- To: "Paul Hoffman / IMC" <phoffman@imc.org>, "Martin Duerst" <duerst@w3.org>
- Cc: <public-iri@w3.org>
| From: Paul Hoffman / IMC [mailto:phoffman@imc.org]
|
| At 5:22 PM -0400 4/15/03, Martin Duerst wrote:
| >Overall, the normalization strategy on IRIs varies according to the
| >place in the URI:
| >
| >- For domain name part: use NFKC or more (i.e. nameprep), but
| >  gets normalized again (with nameprep) when doing dns lookup.
| >- For the path part: preferably NFKC, but NFC is okay when needed.
| >- For the query part: There may be cases where you on purpose
| >  want to use something totally unnormalized (e.g. when submitting
| >  unnormalized data to a CGI script that normalizes).
| >
| >Does that sound reasonable? Do you think it needs any changes in the
| >draft, and if yes, what would be those changes?
|
| It doesn't sound reasonable if you intend IRI comparison to be
| interoperable. If you don't intend IRI comparison to be
| interoperable, I still would pick one normalization for each of the
| three parts, and I would pick NFKC, but you don't have to be
| consistent if interoperability isn't important.
|
| Am I the only person who worries about IRI comparison being
| interoperable?

I really think it is a bad idea to try to enforce NFKC on all components of an IRI string. What is tolerable for a host/domain name is not tolerable for many other components. NFKC removes many subtleties from the character repertoire that may have to be preserved for some schemes. For all issues/concerns with NFKC, you can check Unicode TR15 (http://www.unicode.org/reports/tr15/). In general, NFC is much better. Furthermore, there are even some components like the query fragment where you may want to transmit a non-normalized text string.

Interoperability is obviously important, but it just means that some scheme awareness is required for comparison. Martin has already answered most of that part so I won't go there.

Michel
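
To make the NFC/NFKC distinction above concrete, here is a minimal Python sketch using the standard `unicodedata` module. The `POLICY` table and the `normalize_component` helper are hypothetical names introduced only for illustration; they mirror the per-component strategy quoted above and are not taken from the IRI draft. Note also that nameprep does more than NFKC, so the "host" entry is only an approximation.

```python
import unicodedata
from typing import Optional

# Hypothetical per-component policy mirroring the strategy discussed above.
# Assumption: plain NFKC stands in for nameprep, which actually does more.
POLICY = {"host": "NFKC", "path": "NFC", "query": None}


def normalize_component(text: str, form: Optional[str]) -> str:
    """Apply the given Unicode normalization form, or return text unchanged."""
    if form is None:
        return text
    return unicodedata.normalize(form, text)


decomposed = "re\u0301sume\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
ligature = "\ufb01le"              # LATIN SMALL LIGATURE FI + "le"
superscript = "x\u00b2"            # 'x' + SUPERSCRIPT TWO

# NFC only composes canonical equivalents; compatibility characters survive.
print(normalize_component(decomposed, "NFC") == "r\u00e9sum\u00e9")
print(normalize_component(ligature, "NFC") == ligature)
print(normalize_component(superscript, "NFC") == superscript)

# NFKC additionally folds compatibility characters, which loses distinctions.
print(normalize_component(ligature, "NFKC") == "file")
print(normalize_component(superscript, "NFKC") == "x2")

# A query component with no policy is passed through untouched.
print(normalize_component(decomposed, POLICY["query"]) == decomposed)
```

Each `print` above should show `True`. The ligature and superscript cases illustrate the kind of subtlety that NFKC removes and NFC preserves.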
Received on Wednesday, 16 April 2003 17:04:40 UTC