- From: Yngve Nysaeter Pettersen <yngve@opera.com>
- Date: Mon, 01 Dec 2003 17:10:43 +0100
- To: Adam Roach <adam@dynamicsoft.com>, ietf-http-wg@w3.org
- Cc: Scott Lawrence <scott-http@skrb.org>
On Wed, 26 Nov 2003 15:02:41 -0600, Adam Roach <adam@dynamicsoft.com> wrote:

> For example, let's consider a username like "Åke". If you simply
> specify UTF-8 as the encoding, you can still run into problems.
> If the client represents the initial character as U+00C5, but the
> server has it stored as U+0041 U+030A (both valid Unicode
> representations of "Å"), then you'll end up hashing differently.
> The same, of course, applies to passwords.
>
> Fortunately, Unicode also defines normalization techniques that
> allow one to ensure a consistent representation; see annex 15
> (http://www.unicode.org/reports/tr15/). I think it's pretty clear
> that, for the purposes of calculating authentication, we'll want
> to use one of the compatibility normalizations (KC or KD). I
> believe that KD requires less processing, so I would tend to
> favor it over KC.

That was indeed a point I had not considered.

Regarding Normalization Form KC versus KD, one thing that should be
considered before either is selected is that the IDNA RFCs (3454,
3490, 3491, 3492) already use KC.

Implementation-wise, I would prefer to use a single normalization form
in the network-related code. Of course, the actual code overhead
depends on how much code is needed to implement the different
normalizations, and on whether other parts of the code also need the
other forms.

--
Sincerely,
Yngve N. Pettersen

********************************************************************
Senior Developer                     Email: yngve@opera.com
Opera Software ASA                   http://www.opera.com/
Phone:  +47 24 16 42 60              Fax:    +47 24 16 40 01
********************************************************************
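
For illustration, the mismatch described in the quoted text can be
reproduced with a short Python sketch. The function name
`credential_digest` is hypothetical, and SHA-256 is only a stand-in
for whatever hash the authentication scheme actually uses; the point
is just that the two spellings of "Åke" hash differently until both
sides apply the same normalization form.

```python
import hashlib
import unicodedata
from typing import Optional


def credential_digest(text: str, form: Optional[str] = None) -> str:
    """Hash the UTF-8 bytes of text, optionally normalizing first.

    form is a Unicode normalization form name such as "NFKC" or "NFKD".
    SHA-256 is an assumption here, not the hash any given scheme mandates.
    """
    if form is not None:
        text = unicodedata.normalize(form, text)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# The same visible username in two valid Unicode representations.
precomposed = "\u00c5ke"   # U+00C5 followed by "ke"
decomposed = "A\u030ake"   # U+0041 U+030A followed by "ke"

# Without normalization the byte sequences, and so the digests, differ.
print(credential_digest(precomposed) == credential_digest(decomposed))

# With either compatibility form applied on both sides, they agree.
for form in ("NFKC", "NFKD"):
    same = credential_digest(precomposed, form) == credential_digest(decomposed, form)
    print(form, same)
```

Either form removes the mismatch, so the choice between KC and KD
comes down to considerations such as sharing one normalization routine
with the IDNA code.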
Received on Monday, 1 December 2003 11:10:46 UTC