- From: Alexey Melnikov <mel@messagingdirect.com>
- Date: Tue, 15 Apr 2003 18:11:45 -0600
- To: yngve@opera.com
- CC: ietf-http-wg@w3.org
Yngve Nysaeter Pettersen wrote:

> Hi,

Hi,

> My name is Yngve N. Pettersen, I am a developer at Opera Software ASA, the
> company producing the Opera browser. One of my areas of responsibility is
> our HTTP protocol support.
>
> Some time ago, while implementing Opera's support for international
> character sets I discovered that RFC 2617 did not specify the character set
> to be used when encoding the username and password arguments for Basic and
> Digest authentication.
>
> Given that BCP 18/RFC 2277 strongly encouraged UTF-8 support in protocols,
> and that it may be impossible to determine the server's preferred
> character set, among other reasons, I decided to use UTF-8 as the
> character set when encoding the username and password before generating the
> authentication strings.
>
> Recently we received a report concerning problems with this way of
> generating authentication strings (apparently other clients do not
> convert national characters in Western European languages, at least; I
> don't know how they treat Asian languages), and while researching the
> current state of the protocol, I noticed that the current errata does not
> address this point.
>
> I would therefore like to suggest that an item specifying which character
> set should be used when generating Basic and Digest authentication strings
> is added to the errata.
>
> My suggestion is that UTF-8 is selected as the character set used to encode
> the username and password values when creating the "user-pass" string (sec.
> 2) and the "username-value" and "passwd" strings in sec. 3.2.2. It might
> also be an idea to specify the same for other text attributes as well.
>
> As mentioned above BCP 18 indicates UTF-8 is the preferred charset for
> protocols.
>
> Additionally, I believe it would be very difficult to create a foolproof
> guessing method that would decide the charset based on such things as the
> charset of the authentication challenge response body, toplevel domain of
> the server, or the same from the referrer (if any), or the character set
> used on the client's computer (which may not match what is used on the
> server). As an example, the challenge may use a default message in English,
> while passwords and documents are encoded in a Japanese character set.
>
> I think the best way of avoiding (any further) ambiguities is to specify a
> single character set that MUST be used, and UTF-8 is the character set
> recommended by BCP 18.

I am not a big expert on this, but here is some information for you.

You are right, RFC 2617 doesn't specify any character set. RFC 2617 is a
revision of RFC 2069, which predates RFC 2277. So I suspect that when
RFC 2069 was revised, nobody noticed this issue.

RFC 2831 (Using Digest Authentication as a SASL Mechanism), which is based on
RFC 2617, had to deal with this as well. RFC 2831 has the phrase "The directive
is needed for backwards compatibility with HTTP Digest, which only supports
ISO 8859-1.", which suggests that ISO 8859-1 is the default for HTTP.

RFC 2831 had to add a new "charset" directive and a complex rule to convert
UTF-8 usernames/passwords [that can be fully expressed as ISO 8859-1] to
ISO 8859-1. This is a mess :-(.

So, although I tend to agree with your choice to use UTF-8, it seems that the
reality is a bit more complicated than that.

Regards,
Alexey Melnikov
__________________________________________
R & D, ACI Worldwide/MessagingDirect
Watford, UK

Work Phone: +44 1923 81 2877
Home Page: http://orthanc.ab.ca/mel
IETF standard related pages: http://orthanc.ab.ca/mel/devel/Links.html

I speak for myself only, not for my employer.
__________________________________________
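
To make the interoperability problem discussed in this message concrete, here is a minimal Python sketch of how the choice of character set changes the bytes a client sends. It is an illustration only, not code from either correspondent: the helper names `basic_credentials` and `digest_ha1`, the realm, and the sample password are all assumptions. The Basic helper follows the "user-pass" construction (userid ":" password, then base64) and the Digest helper follows the plain MD5 form of A1 (username ":" realm ":" password) from RFC 2617.

```python
import base64
import hashlib


def basic_credentials(username: str, password: str, charset: str) -> str:
    """Build a Basic Authorization value, encoding user-pass in `charset`."""
    user_pass = f"{username}:{password}".encode(charset)
    return "Basic " + base64.b64encode(user_pass).decode("ascii")


def digest_ha1(username: str, realm: str, password: str, charset: str) -> str:
    """Compute H(A1) for Digest with algorithm=MD5, encoding A1 in `charset`."""
    a1 = f"{username}:{realm}:{password}".encode(charset)
    return hashlib.md5(a1).hexdigest()


# Hypothetical credentials; the password contains non-ASCII letters.
user, realm, password = "yngve", "example", "bl\u00e5b\u00e6r"

# The same credentials produce different octets under the two charsets,
# so a server expecting one encoding rejects a client using the other.
utf8_header = basic_credentials(user, password, "utf-8")
latin1_header = basic_credentials(user, password, "iso-8859-1")

utf8_ha1 = digest_ha1(user, realm, password, "utf-8")
latin1_ha1 = digest_ha1(user, realm, password, "iso-8859-1")

print(utf8_header == latin1_header)  # the two header values differ
print(utf8_ha1 == latin1_ha1)        # the two hashes differ as well
```

For pure ASCII credentials both encodings yield identical output, which is probably why the ambiguity went unnoticed until clients started sending national characters.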
Received on Tuesday, 15 April 2003 20:08:26 UTC