RFC 2617 Authentication and character sets revisited


Given Scott Lawrence's mention of upcoming drafts updating RFC 2616 and 
2617, I thought I should raise this issue again for consideration. I 
previously raised it in April 2003.


RFC 2617 does not mention which character set should be used when 
generating the authentication credentials. This has led to a situation 
where (at the very least) various clients have made different design 
choices, which have resulted in some interoperability problems.

When I implemented Unicode support for authentication in Opera I chose 
UTF-8 as the encoding for the credential input data, based, among other 
reasons, on the recommendations of BCP 18/RFC 2277. We have received a 
couple of reports about problems caused by this.

New information:

Limited testing indicates that MSIE uses the character set selected in the 
View->Encoding menu when encoding Basic Authentication credentials. As the 
intention was only to get an idea of what kinds of character sets were 
in use, Mozilla/Netscape and other clients were not tested.

If correct, this means that MSIE users can authenticate correctly as long 
as the server is using the same character set as they do, but as soon as 
the server uses a different character set they have to change the 
Encoding setting manually before they can continue. And AFAIK (disclaimer: 
I am not a character set expert), many Asian, Eastern European and Middle 
Eastern countries have more than one character set to choose from, which 
means that you could run into problems even within a single country.
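
To make the mismatch concrete, here is a minimal Python sketch of how the 
same username and password produce different Basic credentials depending 
on the character set the client happens to use. The function name and the 
sample username/password are made up for illustration; only the 
"user:password" plus Base64 construction comes from RFC 2617.

```python
import base64

def basic_credentials(username: str, password: str, charset: str) -> str:
    """Return the Base64 token for 'username:password' encoded in charset."""
    raw = f"{username}:{password}".encode(charset)
    return base64.b64encode(raw).decode("ascii")

# Hypothetical credentials containing non-ASCII characters.
user, password = "bjørn", "blåbær"

utf8_token = basic_credentials(user, password, "utf-8")
latin1_token = basic_credentials(user, password, "iso-8859-1")

# The two header values differ, so a server that compares bytes
# (or decodes with the other charset) will reject one of them.
print("Authorization: Basic " + utf8_token)
print("Authorization: Basic " + latin1_token)
```

Pure ASCII credentials encode identically in both character sets, which 
is probably why the problem only shows up for some users.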

Other items:

From Paul Leach (my summary + extra info):
Basic Authentication's username and password attributes are defined as 
"*TEXT", Digest Authentication's username parameter is a quoted string 
(essentially *TEXT) and passwd has no real definition, but is probably 
*TEXT or *OCTET.

RFC 2616 does say that if a *TEXT word contains non-ISO-8859-1 characters 
they should be represented using the RFC 2047 rules (e.g., 
=?charset?Q?text?= ).
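
For reference, a minimal Python sketch of what such an RFC 2047 "Q" 
encoded-word would look like for a username. The helper name and sample 
value are hypothetical, and the sketch is deliberately conservative: it 
escapes every byte that is not an ASCII letter or digit, which RFC 2047 
permits but does not require.

```python
def q_encoded_word(text: str, charset: str) -> str:
    """Build an RFC 2047 'Q' encoded-word, escaping all non-alphanumeric bytes."""
    pieces = []
    for byte in text.encode(charset):
        ch = chr(byte)
        if ch.isascii() and ch.isalnum():
            pieces.append(ch)             # safe to send literally
        elif ch == " ":
            pieces.append("_")            # Q encoding represents space as "_"
        else:
            pieces.append(f"={byte:02X}")  # "=" followed by two hex digits
    return f"=?{charset}?Q?{''.join(pieces)}?="

# Hypothetical username; the encoded form depends on the chosen charset.
latin1_word = q_encoded_word("bjørn", "iso-8859-1")
utf8_word = q_encoded_word("bjørn", "utf-8")
print(latin1_word)
print(utf8_word)
```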

I have, however, never seen the RFC 2047 encoded-word syntax used by an 
HTTP client or server, and we could encounter problems regarding the 
Digest processing of passwords (Which charset is used? The same as in the 
username? Which encoding? Q or B? And what about precalculated A1 values?)
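
The precalculated A1 concern can be illustrated with a short Python 
sketch. It assumes the plain MD5 form of A1 from RFC 2617 
(username ":" realm ":" passwd); the function name, realm and credentials 
are invented for the example.

```python
import hashlib

def digest_ha1(username: str, realm: str, password: str, charset: str) -> str:
    """Return H(A1) = MD5(username:realm:password) for a given charset."""
    a1 = f"{username}:{realm}:{password}".encode(charset)
    return hashlib.md5(a1).hexdigest()

# Hypothetical account with non-ASCII characters in name and password.
realm = "example realm"
stored = digest_ha1("bjørn", realm, "blåbær", "iso-8859-1")   # server side
computed = digest_ha1("bjørn", realm, "blåbær", "utf-8")      # client side

# The hashes differ, and because only the hash is stored, the server
# cannot re-encode the password to recover from the mismatch.
print(stored)
print(computed)
print("match:", stored == computed)
```

A server that stores only H(A1) is therefore locked into whatever 
character set was in effect when the password was set.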

From Alexey Melnikov (my summary):
RFC 2831 (Digest as SASL) introduced a charset attribute and complex rules 
to handle the situation.

What can be done?

Short term: At the very least, I think that the updated RFC 2617 draft 
should address the issue.

Personally, I would prefer a permanent solution (one character set, or a 
negotiation method), but given the current implementation of clients and 
servers I suspect we may have to make do with something that effectively 
says "The client and server must be configured to use the same character 
set. How the configured character set is agreed upon is not defined by 
this specification".

Long term: A permanent solution (besides leaving the situation as it is) 
can take several forms:

1) Updating the current methods by doing either of these:

   A) Define a standard character set to be used.

   B) Define a negotiation method, either client only, or client selects 
      from the server's list. E.g., the client adds a charset attribute 
      in the challenge response.

Problem: How to solve backwards compatibility?

2) Define new internationalized authentication methods, at least for 

All of these will require that both servers and clients are updated with 
new functionality, which will cause transition problems.
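
As a rough illustration of option 1B (client selects from the server's 
list), here is a small Python sketch. Everything in it is hypothetical: 
RFC 2617 defines no charset list in the challenge, and the fallback to 
ISO-8859-1 for servers that offer nothing is only one possible answer to 
the backwards compatibility question, not an established rule.

```python
from typing import Optional, Sequence

# Assumption: what a client falls back to when talking to a legacy server.
LEGACY_CHARSET = "iso-8859-1"

def choose_charset(server_offered: Optional[Sequence[str]],
                   client_supported: Sequence[str]) -> str:
    """Pick the first server-offered charset that the client also supports."""
    if not server_offered:
        # Legacy server: no list in the challenge, so nothing to negotiate.
        return LEGACY_CHARSET
    supported = {name.lower() for name in client_supported}
    for name in server_offered:
        if name.lower() in supported:
            return name.lower()
    # No overlap: behave as if talking to a legacy server.
    return LEGACY_CHARSET

# The client would then echo the chosen value in its challenge response.
print(choose_charset(["utf-8", "iso-8859-1"], ["ISO-8859-1", "UTF-8"]))
print(choose_charset(None, ["utf-8"]))
```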

Personally, of course, I'd prefer that UTF-8 is endorsed as the character 

Yngve N. Pettersen
Senior Developer                     Email: yngve@opera.com
Opera Software ASA                   http://www.opera.com/

Received on Tuesday, 25 November 2003 21:22:53 UTC