Re: non-ascii user name & password from Chris Newman on 1998-09-28 (ietf-http-wg@w3.org from October to December 1998)

From: Chris Newman <Chris.Newman@innosoft.com>
Date: Mon, 28 Sep 1998 19:19:26 +0100 (BST)
To: "Roy T. Fielding" <fielding@kiwi.ics.uci.edu>
Cc: http-wg@hplb.hpl.hp.com
Message-Id: <Pine.SOL.3.95.980928104612.6858D-100000@elwood.innosoft.com>

On Fri, 25 Sep 1998, Roy T. Fielding wrote:
> Yes, they do. That doesn't change the definition of the protocol.  
> The username and password were defined as ISO-8859-1 when the 
> authentication fields were invented and deployed.  Except for the
> usual charset politics, that definition worked just fine.

In RFC 2068, username and password are defined as TEXT, which may be
either ISO-8859-1 or something encoded according to RFC 1522 (the old
version of RFC 2047).  I suspect that nothing other than ISO-8859-1 is
going to work at all reliably in this context.  Does anyone
actually implement RFC 2047 in this context?

RFC 2069 is a bit different, as the username appears in a quoted-string,
which forbids the use of RFC 2047.  So RFC 2069 requires ISO-8859-1 for
usernames.

Now this may work just fine for western Europe and America, but it is not
international, so it is broken.  When something is broken in a protocol,
it should be fixed.  Then one has to choose whether compatibility must be
retained.

I suspect that compatibility with RFC 2047 encoding in usernames and
passwords is not worth retaining, as it was probably never significantly
deployed and even if it was, it probably doesn't interoperate.  RFC 2047
is fine for most headers, but it was never designed for something which
requires a canonical form.  I have no opinion on whether compatibility
with ISO 8859-1 should be retained in this context, as I'm not aware of
the deployment patterns of that use -- that's a judgement call for this
working group.
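
To illustrate the canonical-form point, here is a minimal Python sketch.
It assumes the standard-library email.header module as the RFC 2047
decoder; the helper name decode_word and the sample name are made up for
illustration.  The same username can be written as two different
encoded-words, so anything that compares or hashes the raw header text
would treat them as different values.

```python
from email.header import decode_header


def decode_word(word: str) -> str:
    """Decode a header value consisting of one RFC 2047 encoded-word."""
    # decode_header returns a list of (bytes, charset) pairs; a lone
    # encoded-word yields exactly one pair.
    (raw, charset), = decode_header(word)
    return raw.decode(charset)


# Two valid encodings of the same four-character name, "Jos" + e-acute.
b_form = "=?iso-8859-1?B?Sm9z6Q==?="   # base64 ("B") encoding
q_form = "=?iso-8859-1?Q?Jos=E9?="     # quoted-printable-like ("Q") encoding

same_text = decode_word(b_form) == decode_word(q_form)
same_wire_form = b_form == q_form
```

Here same_text is true while same_wire_form is false, which is the
problem for a field that has to be matched exactly.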

If you want to make this international and retain ISO 8859-1
compatibility, then the right thing to do is use UTF-8 encoding, unless
the entire string is made up of 8859-1 characters in which case 8859-1
encoding is used instead.  Now, since a draft standard can't reference
UTF-8, you'd want to leave the encoding for non-8859-1 characters
undefined for now, and define it in an extension, but it'd probably be
worth forbidding all encodings other than 8859-1 unless specified in a
standards track document -- that would reduce the problem. 
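
A minimal sketch of that encoding rule in Python, assuming the sender
holds the credential as a Unicode string.  The function name
encode_credential is hypothetical and not taken from any specification.

```python
def encode_credential(text: str) -> bytes:
    """Encode as ISO 8859-1 when every character fits, else as UTF-8."""
    try:
        # Succeeds only if all characters are in the 8859-1 repertoire.
        return text.encode("iso-8859-1")
    except UnicodeEncodeError:
        # At least one character is outside 8859-1: fall back to UTF-8.
        return text.encode("utf-8")


latin1_only = encode_credential("Jos\u00e9")    # e-acute fits in 8859-1
needs_utf8 = encode_credential("\u0141ukasz")   # L-with-stroke does not
```

Strings that are pure US-ASCII come out identical under either encoding,
so existing ASCII-only credentials would be unaffected by such a rule.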

If you don't care about retaining compatibility with ISO 8859-1 use and
want to make it international, then declare it US-ASCII for now and write
an extension to make them UTF-8.

The username parameter in digest auth is stuck at ISO 8859-1 by reference,
but an encoding for UTF-8 could be added by an extension (e.g., RFC 2231
encoding).
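
As a rough sketch of what an RFC 2231-style parameter could look like,
the Python below percent-encodes a UTF-8 value and emits it in the
name*=charset''value form (with the optional language tag left empty).
Applying this to the digest username parameter is an assumption made for
illustration, and the helper name rfc2231_param is hypothetical.

```python
from urllib.parse import quote


def rfc2231_param(name: str, value: str, charset: str = "utf-8") -> str:
    """Format a parameter in RFC 2231 extended form: name*=charset''value."""
    # safe="" percent-encodes everything except letters, digits and "_.-~",
    # all of which are legal unescaped in an RFC 2231 extended value.
    encoded = quote(value, safe="", encoding=charset)
    return f"{name}*={charset}''{encoded}"


param = rfc2231_param("username", "\u5c71\u7530")
# param == "username*=utf-8''%E5%B1%B1%E7%94%B0"
```

The result is pure US-ASCII on the wire, so it can sit alongside the
existing quoted-string form without changing how that form is parsed.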

> In email protocols, specifications that contrast with reality have
> traditionally been ignored by almost all developers and resulted in
> interoperability failures when some poor sap actually attempted to comply
> with the RFC.

I disagree.  Certainly the use of private agreement charsets is popular in
email as it was the only localization solution available prior to MIME and
it was widely deployed.

> HTTP does not allow that.

HTTP has no more power to enforce the specification than email does.
I will admit that an interactive protocol is easier to extend and upgrade
than a store-and-forward protocol.

>  HTTP has a version number 
> whose minor number is supposed to change whenever compatible changes
> are introduced, and a major number that is supposed to change whenever
> incompatible changes are introduced.

A new port number provides equivalent functionality to a major version
number.  Feature announcement provides superior functionality to minor
version numbers.  SMTP has been extensively modified without the need for
version numbers.
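
For readers unfamiliar with feature announcement, here is a minimal
Python sketch of how a client might read an ESMTP EHLO reply to learn
which extensions the server offers.  The sample reply, the host name,
and the function name parse_ehlo are illustrative assumptions, not
output from a real server.

```python
def parse_ehlo(reply: str) -> dict[str, list[str]]:
    """Map each extension keyword in an EHLO reply to its parameters."""
    features: dict[str, list[str]] = {}
    # The first line carries the server greeting, not an extension.
    for line in reply.splitlines()[1:]:
        code, rest = line[:3], line[4:]
        if code != "250" or not rest.strip():
            continue
        keyword, *params = rest.split()
        features[keyword.upper()] = params
    return features


sample_reply = (
    "250-mail.example.com Hello\r\n"
    "250-8BITMIME\r\n"
    "250-SIZE 10240000\r\n"
    "250 PIPELINING"
)
features = parse_ehlo(sample_reply)
supports_8bit = "8BITMIME" in features
```

The client decides per feature, so a server can add or omit any one
extension independently of the others.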

> I have no problem with defining a new protocol in the HTTP family that
> cures the hundred-odd problems leftover from the installed base and
> eventually progresses on the standards track.  I have a huge problem
> with such a protocol masquerading as HTTP/1.x when we have carefully
> designed the protocol for forward compatibility.

If you want 100% compatibility with the interoperable portions of the
installed base, that's fine.  But where the spec doesn't interoperate, it
should be fixed.  I suspect it doesn't interoperate for non-8859-1
characters in usernames and passwords. 

> The problem is that
> the IETF standards-track process interferes with good protocol design
> by not allowing progress along delineated branches.

Quite the contrary.  Version numbers prevent the development of branches.
Feature announcement, as ESMTP and IMAP use it, has been repeatedly successful
in allowing multiple branches to develop simultaneously.

> It is high time that the IETF started thinking in terms of protocol
> families

I see no problems in this area.  Extensions to standard protocols are
flourishing in the IETF.

> and planning for evolution rather than making standards
> decrees and hoping the installed base gets sucked into the void.

Sometimes it is better to evolve.  Sometimes it is better to start over
and create an incompatible version.  Sometimes the installed base sort of
works but doesn't really interoperate and is best ignored when creating a
fully interoperable solution.  Which choice is better is an engineering
decision which needs to be evaluated carefully.

		- Chris
Received on Monday, 28 September 1998 11:19:25 UTC