- From: Yngve N. Pettersen (Developer Opera Software ASA) <yngve@opera.com>
- Date: Sat, 29 Nov 2003 17:31:57 +0100
- To: Scott Lawrence <scott-http@skrb.org>, ietf-http-wg@w3.org
On Wed, 26 Nov 2003 15:23:17 -0500, Scott Lawrence <scott-http@skrb.org> wrote:

> I don't think I understand your example. If my server gets a
> user="foo" and 'foo' does not appear in my database of valid users,
> then authentication has failed.

I'm not a character set expert, so I don't have any Japanese or Chinese examples handy, but I know that Japanese systems use several different character sets.

But let us use an extreme (and unrealistic) example:

Let's assume that the client is using US-ASCII as the default character set, while the server is using EBCDIC.

The username "foo" and the associated password are entered on the console of the server machine. This means that the username and password are represented to the server using EBCDIC character codes, not US-ASCII.

When the client creates the credentials, it will use US-ASCII as the character set instead of EBCDIC.

The binary representation (in C-style hex) of "foo" in US-ASCII is <0x66 0x6F 0x6F>, while it is <0x86 0x96 0x96> in EBCDIC.

Unless the server explicitly converts the received username from US-ASCII to EBCDIC (or converts its stored EBCDIC version to US-ASCII) before using it, the server will not be able to get a match, despite the fact that the user entered "foo" both when registering and when authenticating.

That was phase 1; now for phase 2:

Replace "US-ASCII" with one of the Japanese character sets, e.g. Shift-JIS, and "EBCDIC" with one of the other Japanese character sets, e.g. EUC-JP, use a Japanese username, and repeat the above procedure.

My point is that you cannot guarantee that all steps of the authentication process, including the registration process, on both the client and the server side, result in the *same* binary representation of a national character unless the specification clearly states which binary representation is to be used.
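For anyone who wants to see the two phases concretely, here is a minimal Python sketch. It is only an illustration under some assumptions of mine: "cp037" stands in for "EBCDIC" (it is just one EBCDIC code page), the Japanese username is made up, and the helper names `naive_match` and `converting_match` are hypothetical, not taken from any server.

```python
# Phase 1: the same username under US-ASCII (client) and EBCDIC (server).
# Assumption: Python's "cp037" codec is used as a stand-in for EBCDIC.
ascii_foo = "foo".encode("ascii")
ebcdic_foo = "foo".encode("cp037")

print(ascii_foo.hex())   # 666f6f
print(ebcdic_foo.hex())  # 869696


def naive_match(received: bytes, stored: bytes) -> bool:
    """Hypothetical server check that compares raw bytes without conversion."""
    return received == stored


def converting_match(received: bytes, received_codec: str,
                     stored: bytes, stored_codec: str) -> bool:
    """Hypothetical check that decodes both sides to characters first."""
    return received.decode(received_codec) == stored.decode(stored_codec)


# Phase 2: a Japanese username (made up for this example) under two
# Japanese character sets, plus UTF-8 as the single agreed representation.
jp_name = "\u5c71\u7530"
sjis_name = jp_name.encode("shift_jis")
eucjp_name = jp_name.encode("euc_jp")
utf8_client = jp_name.encode("utf-8")
utf8_server = jp_name.encode("utf-8")

print(naive_match(ascii_foo, ebcdic_foo))                        # False
print(converting_match(ascii_foo, "ascii", ebcdic_foo, "cp037")) # True
print(naive_match(sjis_name, eucjp_name))                        # False
print(naive_match(utf8_client, utf8_server))                     # True
```

The raw byte comparison fails in both phases even though the user typed the same characters each time. It only succeeds when both sides know which character set the other used, or when both use one agreed representation.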
And in an international environment like the one HTTP is used in, the best binary representation of a string of national characters is the 8-bit encoding of Unicode, UTF-8.

> The username value is already covered by the existing rule for TEXT:

AFAICT (from a quick look) Apache 2.0 is not able to parse an RFC 2047 encoded parameter. (Oh, and BTW: the RFC 2047 encoding does not have a very good syntax for parameters, e.g. name==?a?Q?value?= ; it is not without reason that it has been updated by RFC 2231.)

AFAIK nobody is using the RFC 2047 encoding, especially not for authentication. Feel free to correct me if I am wrong.

Assuming that UTF-8 is not mandated for the Basic username and password and the Digest username, I would suggest that RFC 2231 encoding, rather than RFC 2047, be recommended for the Digest username, as RFC 2231 is better suited for encoding parameters, and that this be clearly stated in the RFC.

However, the question of which binary representation is used in the calculations MUST also be addressed (should the encoded or the decoded version of the credentials be used, and should they be converted to a common character set, if possible?). Not mandating UTF-8 will just move the problem around.

Come to think of it: perhaps the *TEXT rule in RFC 2616 sec 2.2 should be updated to mandate UTF-8 instead of iso-8859-1? But that is probably too big a change to make at this time.

-- 
Sincerely,
Yngve N. Pettersen

********************************************************************
Senior Developer                     Email: yngve@opera.com
Opera Software ASA                   http://www.opera.com/
Phone:  +47 24 16 42 60              Fax:    +47 24 16 40 01
********************************************************************
Received on Saturday, 29 November 2003 11:27:32 UTC