Re: Some issues with the IRI document [nfcnfkc-04]

Paul Hoffman / IMC <phoffman@imc.org> writes:

> At 5:22 PM -0400 4/15/03, Martin Duerst wrote:
>>Overall, the normalization strategy on IRIs varies according to the
>>place in the URI:
>>
>>- For domain name part: use NFKC or more (i.e. nameprep), but
>>   gets normalized again (with nameprep) when doing dns lookup.
>>- For the path part: preferably NFKC, but NFC is okay when needed.
>>- For the query part: There may be cases where you on purpose
>>   want to use something totally unnormalized (e.g. when submitting
>>   unnormalized data to a CGI script that normalizes).
>>
>>Does that sound reasonable? Do you think it needs any changes in the
>>draft, and if yes, what would be those changes?
>
> It doesn't sound reasonable if you intend IRI comparison to be
> interoperable. If you don't intend IRI comparison to be interoperable,
> I still would pick one normalization for each of the three parts, and
> I would pick NFKC, but you don't have to be consistent if
> interoperability isn't important.

A tangental observation on using different normalization strategies on
different parts of the URI:

If, say, a username (iuserinfo) within a IRI is normalized into
something different than, say, a security protocol such as SASL or
Kerberos (which uses different normalization strategies, both with
regards to each other and to the ones discussed here) would normalize
the username into, there are potential security consequences.

To examplify, consider if IRI adopted a nameprep style normalization
scheme that translates ß into ss, and either of SASL or Kerberos did
not but instead chosed to maintain the difference between ß and ss,
encoding a username containing ß into an IRI for use with SASL or
Kerberos would denote a different username.

I have not studied the IRI document closely, so this may have already
been solved in the proper way, if so I'm sorry to drag up these old
issues again.

Received on Wednesday, 16 April 2003 10:54:33 UTC