RFC 2617: Which character set should be used?

Hi,

My name is Yngve N. Pettersen, I am a developer at Opera Software ASA, the 
company producing the Opera browser. One of my areas of responsibility is 
our HTTP protocol support.

Some time ago, while implementing Opera's support for international 
character sets, I discovered that RFC 2617 does not specify the character 
set to be used when encoding the username and password arguments for Basic 
and Digest authentication.

Given that BCP 18/RFC 2277 strongly encourages UTF-8 support in protocols, 
and that it may be impossible to determine the server's preferred 
character set, among other reasons, I decided to use UTF-8 as the 
character set when encoding the username and password before generating the 
authentication strings.

Recently we received a report concerning problems with this way of 
generating authentication strings. Apparently other clients do not 
convert national characters, at least in Western European languages (I 
don't know how they treat Asian languages). While researching the 
current state of the protocol, I noticed that the current errata do not 
address this point.
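
To illustrate the kind of mismatch involved, here is a minimal sketch in 
Python of how the Basic "user-pass" string comes out differently depending 
on the character set chosen by the client. The function name 
basic_credentials and the sample username and password are hypothetical, 
and ISO-8859-1 is only an assumption about what a non-converting Western 
European client might send.

```python
import base64


def basic_credentials(username: str, password: str, charset: str) -> str:
    """Build the base64 credentials for a Basic Authorization header.

    Hypothetical helper: 'charset' is the encoding applied to the
    "user-pass" string (userid ":" password) before base64 encoding.
    """
    user_pass = f"{username}:{password}".encode(charset)
    return base64.b64encode(user_pass).decode("ascii")


# Same username and password, two different client-side character sets.
as_utf8 = basic_credentials("yngve", "blåbær", "utf-8")
as_latin1 = basic_credentials("yngve", "blåbær", "iso-8859-1")

print("Authorization: Basic " + as_utf8)
print("Authorization: Basic " + as_latin1)
print("identical:", as_utf8 == as_latin1)
```

For pure ASCII credentials the two results are identical, which is 
presumably why the problem only shows up with national characters.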

I would therefore like to suggest that an item specifying which character 
set should be used when generating Basic and Digest authentication strings 
be added to the errata.

My suggestion is that UTF-8 be selected as the character set used to encode 
the username and password values when creating the "user-pass" string (sec. 
2) and the "username-value" and "passwd" strings in sec. 3.2.2. It might 
also be an idea to specify the same for the other text attributes.
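
For Digest the effect is the same, only harder to see, because the encoded 
bytes are hashed. A minimal sketch in Python, assuming the default MD5 
algorithm and the A1 construction from sec. 3.2.2.2 (username, realm and 
password joined by colons); the function name digest_ha1 and the sample 
values are hypothetical:

```python
import hashlib


def digest_ha1(username: str, realm: str, password: str, charset: str) -> str:
    """Compute H(A1) for Digest authentication with the MD5 algorithm.

    Hypothetical helper: 'charset' is the encoding applied to the
    A1 string (username ":" realm ":" password) before hashing.
    """
    a1 = f"{username}:{realm}:{password}".encode(charset)
    return hashlib.md5(a1).hexdigest()


# The same credentials hash to different values under different charsets,
# so the server cannot verify the response unless both sides agree.
ha1_utf8 = digest_ha1("yngve", "example", "blåbær", "utf-8")
ha1_latin1 = digest_ha1("yngve", "example", "blåbær", "iso-8859-1")

print(ha1_utf8)
print(ha1_latin1)
print("identical:", ha1_utf8 == ha1_latin1)
```

Unlike Basic, the server cannot decode the value and guess the character 
set afterwards; it can only recompute the hash under each candidate 
encoding and compare.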

As mentioned above, BCP 18 indicates that UTF-8 is the preferred charset 
for protocols.

Additionally, I believe it would be very difficult to create a foolproof 
guessing method that would decide the charset based on such things as the 
charset of the authentication challenge response body, the top-level domain 
of the server, or the same from the referrer (if any), or the character set 
used on the client's computer (which may not match what is used on the 
server). As an example, the challenge may use a default message in English, 
while passwords and documents are encoded in a Japanese character set.

I think the best way of avoiding (any further) ambiguities is to specify a 
single character set that MUST be used, and UTF-8 is the character set 
recommended by BCP 18.


-- 
Sincerely,
Yngve N. Pettersen

********************************************************************
Senior Developer                     Email: yngve@opera.com
Opera Software ASA                   http://www.opera.com/
Phone:  +47 24 16 42 51              Fax:    +47 24 16 40 01
********************************************************************

Received on Tuesday, 15 April 2003 17:21:55 UTC