Re: RFC 2617 Authentication and character sets revisited

On Wed, 26 Nov 2003 15:02:41 -0600, Adam Roach <adam@dynamicsoft.com> 
wrote:
> For example, let's consider a username like "Åke". If you simply
> specify UTF-8 as the encoding, you can still run into problems.
> If the client represents the initial character as U+00C5, but the
> server has it stored as U+0041 U+030A (both valid Unicode
> representations of "Å"), then you'll end up hashing differently.
> The same, of course, applies to passwords.
>
> Fortunately, Unicode also defines normalization techniques that
> allow one to ensure a consistent representation; see annex 15
> (http://www.unicode.org/reports/tr15/). I think it's pretty clear
> that, for the purposes of calculating authentication, we'll want
> to use one of the compatibility normalizations (KC or KD). I
> believe that KD requires less processing, so I would tend to
> favor it over KC.

That was indeed a point I had not considered.
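
To make the mismatch concrete, here is a minimal Python sketch. It is an
illustration only, not taken from RFC 2617 or from any implementation: the
helper name ha1, the realm "example" and the password "secret" are made up,
and UTF-8 is assumed as the encoding. It hashes "username:realm:password"
with MD5 for both representations of "Åke", first as-is and then after
normalizing the username to form KC.

```python
import hashlib
import unicodedata


def ha1(username: str, realm: str, password: str) -> str:
    """MD5 over "username:realm:password", assuming UTF-8 on the wire."""
    a1 = f"{username}:{realm}:{password}".encode("utf-8")
    return hashlib.md5(a1).hexdigest()


# The two representations of "Åke" from the quoted message.
precomposed = "\u00c5ke"   # U+00C5, k, e
decomposed = "A\u030ake"   # U+0041 U+030A, k, e

# Hypothetical realm and password, used only for this illustration.
REALM, PASSWORD = "example", "secret"

# Without normalization, client and server hash different byte strings.
raw_match = ha1(precomposed, REALM, PASSWORD) == ha1(decomposed, REALM, PASSWORD)

# With both sides normalizing to NFKC first, the hashes agree.
nfkc_match = (
    ha1(unicodedata.normalize("NFKC", precomposed), REALM, PASSWORD)
    == ha1(unicodedata.normalize("NFKC", decomposed), REALM, PASSWORD)
)

print("match without normalization:", raw_match)
print("match with NFKC:", nfkc_match)
```

The same reasoning would apply to the password, which this sketch leaves
as plain ASCII for brevity.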

Regarding Normalization Form KC versus KD, one thing to consider before
selecting either is that the IDNA RFCs (3454, 3490, 3491, 3492) already
use KC.
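
Whichever form is selected, both ends have to use the same one, because KC
and KD yield different code point sequences for the same text. The short
Python sketch below illustrates this under the assumption that the
normalized string is then encoded as UTF-8; the variable names are only for
illustration.

```python
import unicodedata

name = "\u00c5ke"  # "Åke" with a precomposed U+00C5

# Form KC composes where possible; form KD leaves the text decomposed.
kc = unicodedata.normalize("NFKC", name)
kd = unicodedata.normalize("NFKD", name)

kc_bytes = kc.encode("utf-8")  # bytes: C3 85 6B 65
kd_bytes = kd.encode("utf-8")  # bytes: 41 CC 8A 6B 65

print([f"U+{ord(c):04X}" for c in kc], len(kc_bytes))
print([f"U+{ord(c):04X}" for c in kd], len(kd_bytes))
```

Because the byte strings differ, a hash computed over the KC form will not
match one computed over the KD form, so the choice has to be fixed by the
specification rather than left to each implementation.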

Implementation-wise, I would prefer to use a single normalization form in
the network-related code. Of course, the actual code overhead depends on
how much code is needed to implement the different normalizations, and
whether other parts of the code also need the other forms.


-- 
Sincerely,
Yngve N. Pettersen

********************************************************************
Senior Developer		             Email: yngve@opera.com
Opera Software ASA                   http://www.opera.com/
Phone:  +47 24 16 42 60              Fax:    +47 24 16 40 01
********************************************************************

Received on Monday, 1 December 2003 11:10:46 UTC