Re: RFC 2617 Authentication and character sets revisited from Scott Lawrence on 2003-11-26 (ietf-http-wg@w3.org from October to December 2003)

From: Scott Lawrence <scott-http@skrb.org>
Date: Tue, 25 Nov 2003 22:31:34 -0500
To: "Yngve N. Pettersen (Developer Opera Software ASA)" <yngve@opera.com>
Cc: ietf-http-wg@w3.org
Message-ID: <ud6bfhkkp.fsf@skrb.org>

"Yngve N. Pettersen (Developer Opera Software ASA)" <yngve@opera.com> writes:

> Background:
>
> RFC 2617 does not mention which character set should be used when
> generating the authentication credentials. This has led to a situation
> where (at the very least) various clients have made different design
> choices, which have resulted in some interoperability problems.
>
> When I implemented Unicode support for authentication in Opera I chose
> UTF-8 as the encoding for the credential inputdata, based, among other
> reasons, on the recommendations of BCP 18/RFC 2277. We have received a
> couple of reports about problems caused by this.

There's an important distinction here - the character set is not the
same as the encoding.  This mail is in the character set 'iso-8859-1';
it specifies a particular set of glyphs.  The encoding is how you
represent that character set with bits.

>  From Paul Leach (my summary+extra info)
> --------
> Basic Authentication's username and password attributes are defined as
> "*TEXT", Digest Authentication's username parameter is an qouted string
> (essentially *TEXT) and passwd has no real definition, but probably *TEXT
> or *OCTET.
>
> RFC 2616 does say that if a *TEXT word contains non-iso-8859-1 characters
> they should be represented using the RFC 2047 rules (e.g =?charset?Q?text?=
> ).

So 2617 already has a specified (if clunky and unevenly supported)
solution for Basic.  I don't see sufficient reason to change it, since
the worst thing about 2047 rules is that they are not human-friendly,
but this is not for humans anyway.

But 2617 is mute on the encoding of the inputs to the A1 hash values,
and if the same characters are hashed using different encodings in the
client and the server, then the hash (very very probably) will not
match, so I believe a case can be made that Digest is underspecified.

> What can be done?
>
> Shortrange: At the very least I think that the updated RFC 2617 draft
> should address the issue.

Agreed.

> Personally, I would prefer a permanent solution (one character set, or a
> negotiation method), but given the current implementation of clients and
> servers I suspect we may have to make do with something that effectively
> says "The client and server must be configured to use the same character
> set. How the configured character set is agreed upon is not defined by this
> specification".

I think that the character set isn't important - only the encoding;
they both already (presumably) share the secret, they just have to
agree on a common representation of it.

> Problem: How to solve backwards compatibility?
>
> 2) Define new internationalized authentication methods, at least for
>                digest.
>
> Personally, of course, I'd prefer that UTF-8 is endorsed as the character
> set.

Actually, I think this choice would be backward-compatible for those
cases that have been working, and incompatible only with those that
previously would have had interoperability problems anyway (passwords
that had values outside the ASCII range).

-- 
Scott Lawrence        
  http://skrb.org/scott/

Received on Tuesday, 25 November 2003 22:29:39 UTC