Re: RFC 2617 Authentication and character sets revisited

On Wed, 26 Nov 2003 13:57:15 -0500, Scott Lawrence <scott-http@skrb.org> 
wrote:

>
> Yngve Nysaeter Pettersen <yngve@opera.com> writes:
>
>> The server and client must *also* agree about the binary representation
>> (character set and encoding) of the username, as the username is used
>> as an index into the password database.
>
> The difference is that the username is also passed in clear, so the
> encoding used on the wire for that attribute can be used (as is the
> case for all the other inputs to the hash).

But will the binary representation of the username in the authentication 
header match the binary representation in the server's database?

The account registration can be done through web forms, which may or may 
not have a character set defined, or through a Telnet/SSH connection to 
the server, just to mention two possibilities.

And there is no way to tell the client which character set and encoding 
were used during the registration process.

An example: The Norwegian spelling of my middle name contains the letter 
"æ" (ae, 0xE6). In iso-8859-1 and iso-8859-15 the binary representation is 
the same, but in UTF-8 it is encoded differently (0xC3 0xA6), and the 
character does not exist in most other character sets. Depending on the 
server's registration procedures, I may be able to register a username 
containing that character even if the server does not know the character 
set, but unless the registration process and the actual login somehow 
result in the same binary representation I will probably not be able to 
log in.
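
To make the mismatch concrete, here is a minimal sketch in Python. It 
assumes the Digest A1 hash is computed over the raw username bytes; the 
realm and password values are made up for illustration, and the ha1() 
helper is a hypothetical name, not part of any library.

```python
import hashlib

# Illustrative values only; the realm and password are invented.
USERNAME = "Nys\u00e6ter"   # contains U+00E6 ("ae" ligature)
REALM = b"example"
PASSWORD = b"secret"

# The same username in three encodings.
latin1 = USERNAME.encode("iso-8859-1")
latin9 = USERNAME.encode("iso-8859-15")
utf8 = USERNAME.encode("utf-8")


def ha1(user_bytes: bytes) -> str:
    """MD5 over username ":" realm ":" password, as in RFC 2617 Digest A1."""
    return hashlib.md5(user_bytes + b":" + REALM + b":" + PASSWORD).hexdigest()


print("iso-8859-1 :", latin1.hex())
print("iso-8859-15:", latin9.hex())
print("utf-8      :", utf8.hex())
print("same bytes in both iso-8859 variants:", latin1 == latin9)
print("same A1 hash for latin1 and utf-8   :", ha1(latin1) == ha1(utf8))
```

The two iso-8859 variants agree on this character, so a client and server 
using either one would interoperate, while a UTF-8 client would produce a 
different hash input and a different lookup key.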

I might luck out with my special character, but I suspect that the 
situation will be much worse in languages like Japanese and Chinese, which 
appear to use several coexisting character sets. It is not necessarily the 
case that two different computers are using the same default character set.

> Perhaps we need a sentence to make that explicit?

I think a clear specification is needed, and I also think we need to 
define the input values of both authentication methods (Basic and Digest) 
such that the process is unambiguous. That means that either the client 
must be able to 
tell the server which character set and encoding it is using (RFC 2047 or 
a charset attribute), or the character set and encoding have to be fixed 
by the protocol.
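
As a rough sketch of what either option could look like on the server 
side (Python, purely illustrative): the canonical_username() function and 
its charset argument are hypothetical, standing in for whatever mechanism 
the specification would define. If the client declares a charset, the 
server decodes with it; otherwise it assumes a protocol-fixed UTF-8. 
Either way the lookup key ends up as UTF-8 bytes.

```python
from typing import Optional


def canonical_username(raw: bytes, charset: Optional[str] = None) -> bytes:
    """Return the username as UTF-8 bytes for use as a database key.

    charset is a hypothetical client-declared label; None means the
    encoding is assumed to be fixed to UTF-8 by the protocol.
    """
    # Raises UnicodeDecodeError if the bytes do not match the encoding,
    # and LookupError if the charset label is unknown.
    text = raw.decode(charset or "utf-8")
    return text.encode("utf-8")


# Client declares iso-8859-1: 0xE6 is mapped to the UTF-8 form 0xC3 0xA6.
print(canonical_username(b"Nys\xe6ter", "iso-8859-1").hex())

# No declaration: the bytes must already be valid UTF-8.
print(canonical_username(b"Nys\xc3\xa6ter").hex())
```

With a fixed encoding, undeclared iso-8859-1 bytes such as b"Nys\xe6ter" 
are rejected rather than silently mapped to the wrong account.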


-- 
Sincerely,
Yngve N. Pettersen

********************************************************************
Senior Developer		             Email: yngve@opera.com
Opera Software ASA                   http://www.opera.com/
Phone:  +47 24 16 42 60              Fax:    +47 24 16 40 01
********************************************************************

Received on Wednesday, 26 November 2003 15:05:26 UTC