- From: Simon Josefsson <jas@extundo.com>
- Date: Wed, 16 Apr 2003 16:51:22 +0200
- To: public-iri@w3.org
Paul Hoffman / IMC <phoffman@imc.org> writes: > At 5:22 PM -0400 4/15/03, Martin Duerst wrote: >>Overall, the normalization strategy on IRIs varies according to the >>place in the URI: >> >>- For domain name part: use NFKC or more (i.e. nameprep), but >> gets normalized again (with nameprep) when doing dns lookup. >>- For the path part: preferably NFKC, but NFC is okay when needed. >>- For the query part: There may be cases where you on purpose >> want to use something totally unnormalized (e.g. when submitting >> unnormalized data to a CGI script that normalizes). >> >>Does that sound reasonable? Do you think it needs any changes in the >>draft, and if yes, what would be those changes? > > It doesn't sound reasonable if you intend IRI comparison to be > interoperable. If you don't intend IRI comparison to be interoperable, > I still would pick one normalization for each of the three parts, and > I would pick NFKC, but you don't have to be consistent if > interoperability isn't important. A tangental observation on using different normalization strategies on different parts of the URI: If, say, a username (iuserinfo) within a IRI is normalized into something different than, say, a security protocol such as SASL or Kerberos (which uses different normalization strategies, both with regards to each other and to the ones discussed here) would normalize the username into, there are potential security consequences. To examplify, consider if IRI adopted a nameprep style normalization scheme that translates ß into ss, and either of SASL or Kerberos did not but instead chosed to maintain the difference between ß and ss, encoding a username containing ß into an IRI for use with SASL or Kerberos would denote a different username. I have not studied the IRI document closely, so this may have already been solved in the proper way, if so I'm sorry to drag up these old issues again.
Received on Wednesday, 16 April 2003 10:54:33 UTC