RE: non-ascii user name & password from Chris Newman on 1998-09-22 (ietf-http-wg@w3.org from October to December 1998)

From: Chris Newman <Chris.Newman@innosoft.com>
Date: Tue, 22 Sep 1998 01:20:28 +0100 (BST)
To: Paul Leach <paulle@microsoft.com>
Cc: http-wg@hplb.hpl.hp.com
Message-Id: <Pine.SOL.3.95.980921161313.14158P-100000@elwood.innosoft.com>

On Mon, 21 Sep 1998, Paul Leach wrote:
> Or, I could say:
> ------------------
> 	passwd   = *OCTET
> 
> It is the responsibility of the client implementation to make sure that the
> user can generate any octet string when providing the password. A protocol
> for setting and changing passwords (which is beyond the scope of this
> document) must specify how what the user provides maps to the actual octet
> string "on the wire".
> -------------------

That's the right formal syntax (although a recommendation to exclude NUL
might be wise).  However, most passwords are typed as characters, so an
interoperable spec (at least to get multi-client interoperability) has to
say how typed characters in a password are encoded.  The difference is
visible on the wire even though the server should not care.

My suggestion:

  When a password is typed by a user, the characters are encoded in
  US-ASCII.  Encoding of non-US-ASCII characters is not specified at this
  time, but use of localized character sets such as ISO-8859-1 for this
  purpose is forbidden.  Clients are encouraged to provide a facility for
  entry of uninterpreted binary passwords.

I think this is the best interoperability we can get at this time without 
creating an implementation burden.

The user name is TEXT (in a quoted string) which is trickier to deal with
since it is in the base protocol which by context defaults to ISO-8859-1.
The problem is that no other protocol made the mistake of using ISO-8859-1
for user names, and they are likely to be amended to use UTF-8.  Since
HTTP digest includes the username in the one-way-function used to store
the password on the server, this would mean that non-ASCII usernames
wouldn't be able to share authentication backend databases between HTTP
digest and other protocols.  I think that would be a serious mistake. 

I'd say that any server which permits non-ASCII characters in the username
field SHOULD convert them to UTF-8 prior to including them in the 
one-way-function.  The conversion from ISO-8859-1 to UTF-8 is trivial, so
that's not a significant burden in exchange for multi-protocol
interoperability of digest authentication.

RFC 2047 encoding is forbidden in a quoted-string, so that can't be used
on the user name.  As it stands, user names in HTTP are Euro-centric
unless RFC 2231 encoding were to be permitted at some future date.

		- Chris

Received on Thursday, 24 September 1998 09:55:57 UTC