- From: Honza Bambas <hbambas@mozilla.com>
- Date: Mon, 03 May 2010 21:46:04 +0000
- To: ietf-http-wg@w3.org
- Message-ID: <4BDF43F1.7070708@mozilla.com>
It has been observed that many server appliances allow user names and passwords to be set up with characters that cannot be represented in ISO-8859-1. Client implementations then have trouble communicating with such servers while obeying the RFCs, because the character encoding of the user name and password is not specified for either the 'basic' or the 'digest' authentication scheme. This particularly affects building the Authorization header with its username= directive value, and building the A1 string.

As for the username= directive value: it is by definition a 'quoted-string', which cannot carry any information about its character encoding. I have not found any explicit statement in RFC 2617 about a required character encoding for it. RFC 2047 encoding cannot be used because "an 'encoded-word' MUST NOT appear within a 'quoted-string'" per RFC 2047, while on the other hand, per RFC 2616, "words of *TEXT (which 'quoted-string' consists of) MAY contain characters from character sets other than ISO-8859-1 only when encoded according to the rules of RFC 2047". A dead end.

As for the A1 value: nowhere is it stated in what byte representation the user name, password and realm should be read when establishing the A1 octet array. The same problem applies to basic authentication, where a base64 string is built to carry the username:password pair, but nowhere is it stated in what character encoding (or, more generally, in what encoding) the input to base64 has to be.

As an example of a violation: the Apache utility program for configuring the digest authentication database takes the user name and the password directly as arguments from a terminal, often but not always in UTF-8 encoding, and pushes them to the hash function directly without any further translation. The client then has to build the A1 string on its side from UTF-8 encoded octet arrays.
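
To make the ambiguity concrete, here is a minimal Python sketch. It is not taken from any of the implementations mentioned, and the user name, realm and password are made-up values. It builds the basic credentials and the MD5 hash of A1 (user:realm:password, as in RFC 2617) from the same strings, once as ISO-8859-1 octets and once as UTF-8 octets. The function names are hypothetical.

```python
import base64
import hashlib


def basic_credentials(user: str, password: str, encoding: str) -> str:
    """Base64 of "user:password", with the octets produced by `encoding`."""
    raw = f"{user}:{password}".encode(encoding)
    return base64.b64encode(raw).decode("ascii")


def digest_ha1(user: str, realm: str, password: str, encoding: str) -> str:
    """MD5 hex digest of A1 = user ":" realm ":" password (RFC 2617, MD5 case)."""
    a1 = f"{user}:{realm}:{password}".encode(encoding)
    return hashlib.md5(a1).hexdigest()


# Hypothetical credentials; "\u00e9" (e with acute accent) is representable in
# ISO-8859-1 but has a different octet sequence in UTF-8.
user, realm, password = "jos\u00e9", "example", "secret"

for enc in ("iso-8859-1", "utf-8"):
    print(enc)
    print("  basic:", basic_credentials(user, password, enc))
    print("  H(A1):", digest_ha1(user, realm, password, enc))
```

The two runs print different values, so a server that stored H(A1) computed from UTF-8 octets will reject a client that computes it from ISO-8859-1 octets, and vice versa. A character outside ISO-8859-1 (for example the euro sign) cannot be encoded in the first variant at all.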

The authentication module seems to take the username= directive value, which is used to create the A1 string, "as is", in the byte representation sent by the client. But there is no way for the client to know what encoding should be used when generating the headers.

My question is: should we disallow accepting user name or password input in an encoding other than ISO-8859-1 on the client side (regardless of whether a server is set up for it in any way), or should an extension to RFC 2617 be defined that allows the encoding to be communicated between the client and the server?

For reference, there are Mozilla platform bugs https://bugzilla.mozilla.org/show_bug.cgi?id=546330 and https://bugzilla.mozilla.org/show_bug.cgi?id=41489.

Sincerely,
Honza Bambas
Received on Sunday, 16 May 2010 08:15:11 UTC