- From: Yngve N. Pettersen (Developer Opera Software ASA) <yngve@opera.com>
- Date: Mon, 21 Apr 2003 05:05:51 +0200
- To: ietf-http-wg@w3.org
Hi Scott,

[Note: I corrected the subject.]

On 16 Apr 2003 08:20:15 -0400, Scott Lawrence <scott-http@skrb.org> wrote:

> Yngve Nysaeter Pettersen <yngve@opera.com> writes:
>
>> My suggestion is that UTF-8 is selected as the character set used to
>> encode the username and password values when creating the "user-pass"
>> string (sec. 2) and the "username-value" and "passwd" strings in
>> sec. 3.2.2. It might also be an idea to specify the same for other
>> text attributes as well.
>
> I just took a look at the spec to try to come up with specific
> language for this.
>
> Section 3.2.2.2 A1 add:
>
>     The passwd value used should be encoded using UTF-8.
>
> I don't think it's an issue for the user-pass string or
> username-value, since these are just literals that are passed in the
> clear to the server anyway. Can't the server just use them as is?

I'm afraid not. Remember, the server must not only be able to perform calculations using the password, it must also be able to look up the appropriate username entry in its database. If the client and the server use different character sets in any phase of creating, updating and referencing this database, there will be no match.

However, I've just noticed that RFC 2616 actually does comment on this in section 2.2, and requires RFC 2047 encoding for any TEXT not using iso-8859-1 encoding.

The question then becomes: Should the errata of RFC 2617 override that requirement and mandate UTF-8, or should an extension to the current header methods be formulated, or should completely new authentication methods be formulated that will handle UTF-8 usernames/passwords?

One way of overriding that section would be to change the definitions of usernames and passwords to use OCTET (minus control characters and special characters) instead of TEXT. Something similar was proposed in the thread referenced by Larry Masinter.
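To make the lookup and calculation problem concrete, here is a minimal Python sketch. It is an illustration only, not text from either RFC: the function names, the sample username, password and realm are made up, and MD5 is assumed as the Digest algorithm. The A1 layout follows the username ":" realm ":" passwd form of RFC 2617 sec. 3.2.2.2.

```python
import base64
import hashlib


def basic_credentials(username: str, password: str, charset: str) -> str:
    """Base64 of the "user-pass" string, encoded with the given charset."""
    user_pass = f"{username}:{password}".encode(charset)
    return base64.b64encode(user_pass).decode("ascii")


def digest_a1_hash(username: str, realm: str, password: str, charset: str) -> str:
    """H(A1) with MD5, where A1 = username ":" realm ":" passwd."""
    a1 = f"{username}:{realm}:{password}".encode(charset)
    return hashlib.md5(a1).hexdigest()


# Hypothetical non-ASCII credentials (written with escapes: o-slash, a-ring, ae).
user = "bj\u00f8rn"
password = "bl\u00e5b\u00e6r"
realm = "realm"

for charset in ("iso-8859-1", "utf-8"):
    print(charset)
    print("  username octets:", user.encode(charset).hex())
    print("  Basic token:    ", basic_credentials(user, password, charset))
    print("  Digest H(A1):   ", digest_a1_hash(user, realm, password, charset))
```

The two charsets yield different username octets, different Basic tokens and different H(A1) values for the same typed credentials, so a server that stored one form cannot match a client that sends the other. For pure US-ASCII credentials the two encodings coincide, which is why the problem goes unnoticed in many deployments.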
Personally I'd prefer to override RFC 2616 sec. 2.2 for RFC 2617 credentials, as I think BCP 18 should be the guideline. If that is not possible I'd like to avoid RFC 2047 syntax (e.g., which charset should be used for the password in digest authentication, and how do we tell the server?).

I can think of several alternatives if UTF-8 cannot be made mandatory:

Alternative 1: Specify that if all characters in both the username and password are in the iso-8859-1 charset, then iso-8859-1 can be used; in all other cases utf-8 is used. This will probably lead to some username/password collisions; I do not know how serious this will be.

Alternative 2: Extend RFC 2617 with two new methods (e.g. "Basic8" and "Digest8"), with mostly the same syntax as the present "Basic" and "Digest" methods, but with the requirement that the username and password are encoded in UTF-8. However, this will require the server to send one extra header for each method it supports, and would probably need to be specified separately as an RFC.

Alternative 3: Extend the syntax of "Basic" and "Digest" authentication headers with a "utf-8" parameter which, when included in the server's challenge, indicates that the server understands UTF-8 encoded usernames and passwords. When a UTF-8-enabled client sees this parameter it can then encode the username and password in UTF-8 and add a utf-8 parameter to the authorization header it sends to the server, to indicate that the authorization is in UTF-8.

Examples:

    WWW-Authenticate: Basic realm="realm", utf-8
    WWW-Authenticate: Digest realm="realm", <digest parameters>, utf-8

    Authorization: Basic <basic-credentials>, utf-8
    Authorization: Digest <digest-response>, utf-8

By using a utf-8 parameter instead of a charset parameter, it's possible to limit the charset permutations the client and the server have to be able to handle.
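As a rough sketch of how a client might act on alternatives 1 and 3, consider the Python below. Everything in it is hypothetical: the function names are invented, the "utf-8" parameter is only the proposal described above and not part of RFC 2617, and the challenge parsing is deliberately simplified (it only blanks out quoted strings before looking for the bare token).

```python
import base64
import re

# Matches an HTTP quoted-string, including backslash-escaped characters.
_QUOTED = re.compile(r'"(?:[^"\\]|\\.)*"')


def pick_charset_alt1(username: str, password: str) -> str:
    """Alternative 1: iso-8859-1 if every character fits, otherwise utf-8."""
    try:
        f"{username}:{password}".encode("iso-8859-1")
        return "iso-8859-1"
    except UnicodeEncodeError:
        return "utf-8"


def challenge_offers_utf8(challenge: str) -> bool:
    """Alternative 3: does the challenge carry a bare "utf-8" parameter?"""
    _scheme, _, params = challenge.partition(" ")
    # Blank out quoted values so a realm containing "utf-8" is not mistaken
    # for the parameter itself.
    params = _QUOTED.sub('""', params)
    return any(p.strip().lower() == "utf-8" for p in params.split(","))


def basic_authorization(username: str, password: str, challenge: str) -> str:
    """Build a Basic Authorization value, echoing utf-8 when it was offered."""
    if challenge_offers_utf8(challenge):
        octets = f"{username}:{password}".encode("utf-8")
        return f"Basic {base64.b64encode(octets).decode('ascii')}, utf-8"
    # Legacy server: fall back to iso-8859-1. This raises UnicodeEncodeError
    # for characters outside that charset; a real client would need a policy.
    octets = f"{username}:{password}".encode("iso-8859-1")
    return f"Basic {base64.b64encode(octets).decode('ascii')}"


print(pick_charset_alt1("bj\u00f8rn", "secret"))
print(basic_authorization("user", "pass", 'Basic realm="realm", utf-8'))
print(basic_authorization("user", "pass", 'Basic realm="realm"'))
```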
However, given that the A1 value for Digest authentication may be calculated in advance, and by a third-party server, two A1 values must be prepared and distributed when non-US-ASCII usernames/passwords are used. (Come to think of it, this will also be the case if utf-8 is mandated, at least in a transition phase.)

Personally, as mentioned, I prefer making utf-8 mandatory, but if alternative specifications are needed I think alternatives 1 and 3 are the most acceptable of the alternatives above.

-- 
Sincerely,
Yngve N. Pettersen

********************************************************************
Senior Developer                     Email: yngve@opera.com
Opera Software ASA                   http://www.opera.com/
Phone:  +47 24 16 42 51              Fax:    +47 24 16 40 01
********************************************************************
Received on Sunday, 20 April 2003 23:03:52 UTC