- From: Adam Roach <adam@dynamicsoft.com>
- Date: Wed, 26 Nov 2003 15:02:41 -0600
- To: "'yngve@opera.com'" <yngve@opera.com>, ietf-http-wg@w3.org
- Cc: Scott Lawrence <scott-http@skrb.org>
Yngve Nysaeter Pettersen [mailto:yngve@opera.com] wrote:

> I think a clear specification is needed, and I also think we need to
> define the input values of both authentications methods such that the
> process is unambiguous. That means that either the client must be
> able to tell the server which character set and encoding it is using
> (RFC 2047 or a charset attribute), or the character set and encoding
> have to be fixed by the protocol.

In this case, Unicode is the character set, and UTF-8 is the encoding.

But your earlier comments reminded me of something: it can be more complicated than that. For example, let's consider a username like "Åke". If you simply specify UTF-8 as the encoding, you can still run into problems. If the client represents the initial character as U+00C5, but the server has it stored as U+0041 U+030A (both valid Unicode representations of "Å"), then the two sides will feed different byte sequences to the hash and end up with different results. The same, of course, applies to passwords.

Fortunately, Unicode also defines normalization forms that allow one to ensure a consistent representation; see Unicode Standard Annex #15 (http://www.unicode.org/reports/tr15/). I think it's pretty clear that, for the purposes of calculating authentication, we'll want to use one of the compatibility normalizations (KC or KD). I believe that KD requires less processing, so I would tend to favor it over KC.

So, in the spirit of sending text:

    The passwd value SHOULD be normalized according to Unicode
    Normalization Form KD [ref], and encoded using UTF-8 [ref] for
    input to the hash. (Note that characters in the range of U+0000
    to U+007F are left unaffected by Unicode normalization.)

Presumably, the same text (with a tweak or two) can be used to specify username handling.

/a
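
For illustration, the "Åke" case can be sketched in Python roughly as follows. This is only a sketch under assumptions that are not part of the proposed text: the helper names prepare and digest are made up for the example, and SHA-256 stands in for whatever hash the authentication scheme actually uses.

```python
import hashlib
import unicodedata


def prepare(value: str) -> bytes:
    """Hypothetical helper: normalize to Form KD, then encode as UTF-8."""
    return unicodedata.normalize("NFKD", value).encode("utf-8")


def digest(data: bytes) -> str:
    """Stand-in hash (an assumption); the real scheme defines its own."""
    return hashlib.sha256(data).hexdigest()


precomposed = "\u00c5ke"   # "Å" as the single code point U+00C5
decomposed = "A\u030ake"   # "Å" as U+0041 followed by U+030A

# UTF-8 alone is not enough: the two spellings give different bytes,
# so they hash differently.
raw_match = digest(precomposed.encode("utf-8")) == digest(decomposed.encode("utf-8"))

# With Form KD applied first, both spellings give the same bytes
# and therefore the same hash.
normalized_match = digest(prepare(precomposed)) == digest(prepare(decomposed))

# Characters in the range U+0000 to U+007F pass through unchanged.
ascii_unchanged = prepare("secret") == b"secret"
```

Here raw_match comes out False while normalized_match and ascii_unchanged come out True, which is the behaviour the proposed wording is meant to guarantee.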
Received on Monday, 1 December 2003 09:34:40 UTC