RE: non-ascii user name & password

> A careful re-reading of the digest spec shows that user-name is spec'd as
> quoted-string (not TEXT), and password is never interpreted by the message
> parser, just used to calculate the response from the challenge.

> So, we can use TEXT for the password; that leaves the question of encoding.

I think you're reading "parsed" too liberally. The problem is that
TEXT is not useful for anything whose interpretation matters beyond
being displayed to the user, because the same user name can be
represented in many different ways, each of which is a legal
encoding. The spec says as much:

# The TEXT rule is only used for descriptive field contents and values
# that are not intended to be interpreted by the message parser. Words of
# *TEXT MAY contain characters from character sets other than ISO 8859-1
# [22] only when encoded according to the rules of RFC 2047 [14].

So a Japanese user might expect to type a user name and have it be
recognized, but different browsers might encode that name in
Shift-JIS, EUC, or UTF-8, and each would produce a different string.
A server can't reasonably be expected to try to match them all.
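
To make the mismatch concrete, here is a rough sketch in Python. It
assumes the digest A1 value is MD5 over username:realm:password; the
user name, realm, and password are made-up values, and digest_a1 is a
hypothetical helper, not anything from the spec text.

```python
import hashlib

def digest_a1(username: str, realm: str, password: str, charset: str) -> str:
    """Hex MD5 of 'username:realm:password' after encoding with `charset`."""
    a1 = f"{username}:{realm}:{password}".encode(charset)
    return hashlib.md5(a1).hexdigest()

# Illustrative values only (assumptions, not from the spec).
username = "山田"
realm = "example"
password = "secret"

# Three encodings a browser might plausibly pick for the same typed name.
charsets = ["shift_jis", "euc_jp", "utf-8"]

# Same characters, three different byte strings on the wire.
encoded = {cs: username.encode(cs) for cs in charsets}

# And therefore three different hashes for the server to compare against.
hashes = {cs: digest_a1(username, realm, password, cs) for cs in charsets}

for cs in charsets:
    print(cs, encoded[cs].hex(), hashes[cs])
```

A server holding one stored hash would match at most one of the three,
unless it knew which encoding the client had used.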

In addition, the spec selected ISO-8859-1 as the 'default' string
representation. It's built into the spec and I don't think we can
retroactively change it to UTF-8. There are probably many European
users with ISO-8859-1 user names who already rely on the fact that
user name and password are assumed to be ISO-8859-1. So I don't think
we can take the 'restrict to ASCII and migrate to UTF-8 later' path.
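
As a small illustration of why a later switch would break existing
users, here is a sketch in Python. The name "José" is just an example
I picked; the point is only that the two encodings disagree on the
bytes for anything outside ASCII.

```python
# Example name containing one non-ASCII character (an assumption for
# illustration, not taken from any real deployment).
name = "José"

# What an existing ISO-8859-1 client sends today.
latin1_bytes = name.encode("iso-8859-1")   # b'Jos\xe9'

# What a UTF-8 client would send after a migration.
utf8_bytes = name.encode("utf-8")          # b'Jos\xc3\xa9'

# A server still assuming ISO-8859-1 would read the UTF-8 bytes as a
# different, five-character name.
misread = utf8_bytes.decode("iso-8859-1")  # 'JosÃ©'

print(latin1_bytes, utf8_bytes, misread)
```

Pure ASCII names encode identically under both, which is why only the
non-ASCII users would be affected.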

So I'm back to restricting 'user-id' to be US-ASCII, and noting, as
tactfully and apologetically as we can, that this does not actually
allow useful user *names* for users whose name cannot be typed in
ASCII.
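
For what it's worth, the check itself would be trivial. A minimal
sketch in Python, where is_ascii_user_id is a hypothetical helper and
rejecting the empty string is my own assumption:

```python
def is_ascii_user_id(user_id: str) -> bool:
    """True if user_id is non-empty and contains only US-ASCII characters."""
    return user_id != "" and user_id.isascii()

# Example inputs (illustrative only).
for candidate in ["larry", "José", "山田"]:
    print(candidate, is_ascii_user_id(candidate))
```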

Larry (do not reply to this email address)

Received on Thursday, 24 September 1998 09:54:07 UTC