Missing specification in RFC 2617, cannot use a user name nor a password in encoding different from ISO-8859-1 from Honza Bambas on 2010-05-03 (ietf-http-wg@w3.org from April to June 2010)

From: Honza Bambas <hbambas@mozilla.com>
Date: Mon, 03 May 2010 21:46:04 +0000
To: ietf-http-wg@w3.org
Message-ID: <4BDF43F1.7070708@mozilla.com>

It has been observed that many server appliances allow user names and 
passwords to be set up with characters that cannot be represented in 
ISO-8859-1.  Client implementations then have trouble communicating 
with such servers properly while obeying the RFCs, because the 
character encoding of the user name and password is not specified for 
either the 'basic' or the 'digest' authentication scheme.  This 
particularly affects building the Authorization header, its username= 
directive value, and the A1 string.

As for the username= directive value: it is by definition a 
'quoted-string', which cannot carry any information about its 
character encoding.  I have not found any explicit statement in RFC 
2617 about a required character encoding for it.  RFC 2047 encoding 
cannot be used because "an 'encoded-word' MUST NOT appear within a 
'quoted-string'" per RFC 2047; on the other hand, per RFC 2616, 
"words of *TEXT (which 'quoted-string' consists of) MAY contain 
characters from character sets other than ISO-8859-1 only when encoded 
according to the rules of RFC 2047".  A dead end.

As for the A1 value: nowhere is it stated which byte representation of 
the user name, password and realm should be used when building the A1 
octet array.  The same problem applies to basic authentication, where a 
base64 string is built to carry the username:password pair, but nowhere 
is it stated which character encoding (or, more generally, which 
encoding) the input to base64 has to be in.
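
As a rough illustration of the basic case (a sketch only; the 
credentials and the helper name basic_token are made up for this 
example), the same user name and password yield different base64 
tokens depending on the charset used to serialize them:

```python
import base64


def basic_token(user: str, password: str, charset: str) -> str:
    """Return the base64 part of a Basic Authorization header.

    The charset argument is exactly the piece RFC 2617 leaves open.
    """
    raw = f"{user}:{password}".encode(charset)
    return base64.b64encode(raw).decode("ascii")


# Hypothetical credentials containing U+00FC and U+00E4, which exist in
# both ISO-8859-1 (one octet each) and UTF-8 (two octets each).
user = "j\u00fcrgen"
password = "p\u00e4ssword"

token_latin1 = basic_token(user, password, "iso-8859-1")
token_utf8 = basic_token(user, password, "utf-8")

print("Authorization: Basic " + token_latin1)
print("Authorization: Basic " + token_utf8)
print("tokens identical:", token_latin1 == token_utf8)
```

A server that compares the decoded octets against credentials stored 
in the other encoding will reject the request, even though the user 
typed the correct password.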

An example of a violation: the Apache utility program for configuring 
the digest authentication database takes the user name and the 
password directly from the terminal, often (but not always) in UTF-8 
encoding, and pushes them to the hash function without any further 
translation.  The client then has to build the A1 string on its side 
from UTF-8 encoded octet arrays.  The authentication module seems to 
take the username= directive value, used to create the A1 string, "as 
is", in the byte representation sent by the client.  But there is no 
way for the client to know which encoding should be used when 
generating the headers.
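
The digest case can be sketched the same way.  This is only an 
illustration, assuming the default MD5 algorithm of RFC 2617 and 
A1 = username ":" realm ":" password; the helper names, realm and 
credentials are invented for the example:

```python
import hashlib


def ha1(user: str, realm: str, password: str, charset: str) -> str:
    """Return H(A1) as hex, with A1 serialized in the given charset."""
    a1 = f"{user}:{realm}:{password}".encode(charset)
    return hashlib.md5(a1).hexdigest()


def latin1_representable(text: str) -> bool:
    """True if every character of text can be encoded as ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical values; "j\u00fcrgen" fits ISO-8859-1, "Ji\u0159\u00ed" does not
# (U+0159 has no ISO-8859-1 code point).
realm = "example"
ha1_utf8 = ha1("j\u00fcrgen", realm, "secret", "utf-8")
ha1_latin1 = ha1("j\u00fcrgen", realm, "secret", "iso-8859-1")

print("H(A1) from UTF-8 octets:     ", ha1_utf8)
print("H(A1) from ISO-8859-1 octets:", ha1_latin1)
print("representable in ISO-8859-1:", latin1_representable("Ji\u0159\u00ed"))
```

If the server stored the hash computed from UTF-8 octets and the 
client hashes ISO-8859-1 octets, the response digest can never match; 
and for a name like the second one, an ISO-8859-1 client cannot build 
A1 at all.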


My question is: should clients refuse user name or password input that 
cannot be encoded in ISO-8859-1 (regardless of how the server is set 
up), or should an extension to RFC 2617 be defined that allows the 
client and the server to communicate the encoding?

For reference there are Mozilla platform bugs 
https://bugzilla.mozilla.org/show_bug.cgi?id=546330 and 
https://bugzilla.mozilla.org/show_bug.cgi?id=41489.

Sincerely,
Honza Bambas

Received on Sunday, 16 May 2010 08:15:11 UTC