- From: Larry Masinter <LMM@acm.org>
- Date: Wed, 16 Apr 2003 21:56:13 -0700
- To: "'Alexey Melnikov'" <mel@messagingdirect.com>, <yngve@opera.com>
- Cc: <ietf-http-wg@w3.org>
The topic was discussed in the HTTP working group in September 1998. http://lists.w3.org/Archives/Public/ietf-http-wg-old/1998SepDec/0040.htm l and the end of the dialog was: How about we restrict user-ids and typed-in passwords to US-ASCII for now, declare encoding of non-ASCII characters in those fields undefined but explicitly forbid use of localized charsets (e.g., ISO-8859-1)? Then we can amend it to use UTF-8 later with a spec that progresses separately on the standards track. However, this apparently didn't make it into the spec. I think that as far as 'errata', what's missing is a pointer to the discussion and this particular 'resolution'. I admit that it's not a great resolution of the issue (we basically just avoided it), but that's at least a baseline: HTTP isn't as functional as one would like in this area. I'm less sanguine about mandating new behavior at this point, for the same reasons given in 1998. So I'm not sure there is a "fix", unfortunately. Clients can do things that servers won't understand, and servers can expect clients to do things that they can't. Non-ASCII user names will be unreliable until new behavior is proposed and accepted by the community. The notion (in 1998) is that someone would write a spec that would progress independently on standards track (or at least thru Proposed and Draft) so that we can update HTTP to full standard some day. Larry -- http://larry.masinter.net -----Original Message----- From: ietf-http-wg-request@w3.org [mailto:ietf-http-wg-request@w3.org] On Behalf Of Alexey Melnikov Sent: Tuesday, April 15, 2003 5:12 PM To: yngve@opera.com Cc: ietf-http-wg@w3.org Subject: Re: RFC 2617: Which character should be used? Yngve Nysaeter Pettersen wrote: > Hi, Hi, > My name is Yngve N. Pettersen, I am a developer at Opera Software ASA, the > company producing the Opera browser. One of my areas of responsibility is > our HTTP protocol support. > > Some time ago, while implementing Opera's support for international > character sets I discovered that RFC 2617 did not specify the character set > to be used when encoding the username and password arguments for Basic and > Digest authentication. > > Given that BCP 18/RFC 2277 strongly encouraged UTF-8 support in protocols, > and that it may be impossible to determine the server's preferred > characterset, among other reasons, I decided to use UTF-8 as the > characterset when encoding the username and password before generating the > authentication strings. > > Recently we received a report concerning problems with this way of > generating authentication strings (apparantly other clients does not > convert national characters in Western European languages, at least, I > don't know how they treat Asian languages), and while researching the > current state of the protocol, I noticed that the current errata does not > address this point. > > I would therefore like to suggest that an item specifying which character > set should be used when generating Basic and Digest authentication strings > is added to the errata. > > My suggestion is that UTF-8 is selected as the character set used to encode > the username and password values when creating the "user-pass" string (sec. > 2) and the "username-value" and "passwd" strings in sec. 3.2.2. It might > also be an idea to specify the same for other text attributes as well. > > As mentioned above BCP 18 indicates UTF-8 is the preferred charset for > protocols. > > Additionally, I believe it would be very difficult to create a foolproof > guessing method that would decide the charset based on such things as the > charset of the authentication challenge response body, toplevel domain of > the server, or the same from the referrer (if any), or the character set > used on the client's computer (which may not match what is used on the > server). As an example, the challenge may use a default message in English, > while passwords and documents are encoded in a Japanese character set. > > I think the best way of avoiding (any further) ambiguities is to specify a > single character set that MUST be used, and UTF-8 is the character set > recommended by BCP 18. Although I am not big expert on this, but here is some information for you. You are right, RFC 2617 doesn't specify any character set. RFC 2617 is a revision of RFC 2069 which predates RFC 2277. So I suspect that when the RFC 2069 was revised, nobody noticed this issue. RFC 2831 (Using Digest Authentication as a SASL Mechanism) which is based on RFC 2617 had to deal with this as well. RFC 2831 has a phrase "The directive is needed for backwards compatibility with HTTP Digest, which only supports ISO 8859-1." which suggests that ISO 8859-1 is the default for HTTP. RFC 2831 had to add a new "charset" directive and a complex rule to convert UTF-8 usernames/passwords [that can be fully expressed as ISO 8859-1] to ISO 8859-1. This is a mess :-(. So, although I tend to agree with your choice to use UTF-8, however it seems that the reality is a bit more complicated than that. Regards, Alexey Melnikov __________________________________________ R & D, ACI Worldwide/MessagingDirect Watford, UK Work Phone: +44 1923 81 2877 Home Page: http://orthanc.ab.ca/mel IETF standard related pages: http://orthanc.ab.ca/mel/devel/Links.html I speak for myself only, not for my employer. __________________________________________
Received on Thursday, 17 April 2003 00:56:28 UTC