W3C home > Mailing lists > Public > ietf-http-wg@w3.org > October to December 2003

RE: RFC 2617 Authentication and character sets revisited

From: Adam Roach <adam@dynamicsoft.com>
Date: Wed, 26 Nov 2003 15:02:41 -0600
Message-ID: <9BF66EBF6BEFD942915B4D4D45C051F3E8666E@dyn-tx-exch-001.dynamicsoft.com>
To: "'yngve@opera.com'" <yngve@opera.com>, ietf-http-wg@w3.org
Cc: Scott Lawrence <scott-http@skrb.org>

Yngve Nysaeter Pettersen [mailto:yngve@opera.com] wrote:

> I think a clear specification is needed, and I also think we need to 
> define the input values of both authentications methods such that the 
> process is unambiguous. That means that either the client must be
> able to tell the server which character set and encoding it is using 
> (RFC 2047 or a charset attribute), or the character set and encoding
> have to be fixed by the protocol.

In this case, Unicode is the character set, and UTF-8 is the encoding.

But your earlier comments reminded me of something: it can be
more complicated than that.

For example, let's consider a username like "┼ke". If you simply
specify UTF-8 as the encoding, you can still run into problems.
If the client represents the initial character as U+00C5, but the
server has it stored as U+0041 U+030A (both valid unicode
representations of "┼"), then you'll end up hashing differently.
The same, of course, applies to passwords.

Fortunately, Unicode also defines normalization techniques that
allow one to ensure a consisitant representation; see annex 15
(http://www.unicode.org/reports/tr15/). I think it's pretty clear
that, for the purposes of calculating authentication, we'll want
to use one of the compatibility normalizations (KC or KD). I
beleive that KD requires less processing, so I would tend to
favor it over KC.

So, in the spirit of sending text:

   The passwd value SHOULD be normalized according to Unicode
   Normalization Form KD [ref], and encoded using UTF-8 [ref]
   for input to the hash. (Note that characters in the range
   of U+0000 to U+007F are left unaffected by Unicode
   normalization.)

Presumably, the same text (with a tweak or two) can be used to
specify username handling.

/a
Received on Monday, 1 December 2003 09:34:40 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Friday, 27 April 2012 06:49:25 GMT