Re: [Ietf-http-auth] Updating RFC 2617 (HTTP Digest) to use UTF-8

On Tue, 26 Sep 2006 00:08:11 +0200, Bjoern Hoehrmann <derhoermi@gmx.net>  
wrote:

> * Julian Reschke wrote:
>> Speaking of which, where does HTML come into play here? We're talking
>> about HTTP authentication à la RFC2617, not HTML forms based login.
>
> For HTML form submissions, unless the author indicated something else,
> web browsers tend to use the character encoding of the document that in-
> cludes the form to encode the characters; they could apply the same
> logic to the submission of credentials when there is such a document
> (e.g., the user clicked a link to a HTTP Auth protected page, the page
> with the link could then be used to determine some encoding). Based on
> my limited testing, I found this to be not the case.

In the HTML case the information is directly available, as part of the  
form's own environment, which it would not be when the authentication  
credentials are being processed.

Which character set/encoding should the application choose when that
information is not available?

For example, the information would not be available if the user goes
directly to the site by typing in the URL. A similar situation could arise
when the URL is loaded in a different tab from the originating document,
and obtaining the information would not be entirely straightforward even
if it is opened in the same tab. Or what about proxy authentication?

Also: What if the original page is using a different character
set/encoding than the one used by the server requesting authentication?
E.g. what if a Russian-language page directs you to an authenticated
Japanese site? And AFAIK several languages actually have multiple
encodings.
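
To illustrate the multiple-encodings point, here is a minimal Python sketch.
The sample word is made up for the example, and the choice of three legacy
Cyrillic encodings is only an assumption about what a client might
plausibly pick; it is not taken from any browser's actual behaviour.

```python
# Hypothetical sample text: the Russian word for "password".
text = "пароль"

# Three legacy Cyrillic encodings plus UTF-8 (Python codec names).
encodings = ["koi8_r", "cp1251", "iso8859_5", "utf_8"]

# Encode the same characters with each codec.
encoded = {name: text.encode(name) for name in encodings}

# Print the raw bytes; every encoding yields a different byte sequence.
for name, raw in encoded.items():
    print(f"{name:10} {raw.hex()}")
```

The same six characters come out as four different byte strings, so a
server that compares bytes cannot recognise the credentials unless it
knows which encoding the client used.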

Also, character set/encoding information may not be available in the HTTP
header either.

It is not possible to define a heuristic that will fit all scenarios. The
best approach is to define a common character set/encoding that will be
used by all compliant servers.

Since RFC 2617 offers no help here, the only guidance I had when I chose
the I18N policy for Opera's RFC 2617 support was RFC 2277/BCP 18, which I
interpret to say that protocols should use UTF-8 unless they specify
otherwise, either in the specification (i.e. the RFC) or in a specific
field of the protocol (in this case, that would mean an attribute in the
WWW-Authenticate header). The problem is that work on the RFC 2617
protocol probably started before RFC 2277 was finished.

Given that the current system is broken anyway, since client and server
have to agree out-of-band on which character set/encoding to use, it is in
my opinion best to define a proper solution, which IMO means UTF-8,
instead of trying to patch up the broken system. (And remember: even a
patch of the current system would have to be deployed in new clients and
servers.)
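
As a rough illustration of why the out-of-band agreement matters for
Digest, the sketch below computes H(A1) = MD5(username ":" realm ":"
password), which is how RFC 2617 defines A1 for the MD5 algorithm. The
helper name ha1 and the user name, realm and password are made up for the
example; this is not a full Digest implementation.

```python
import hashlib


def ha1(username: str, realm: str, password: str, encoding: str) -> str:
    """Return MD5(username:realm:password) as hex, using the given encoding."""
    a1 = f"{username}:{realm}:{password}".encode(encoding)
    return hashlib.md5(a1).hexdigest()


# Hypothetical credentials containing one non-ASCII character.
user, realm, password = "jürgen", "example", "secret"

# Client assumes ISO-8859-1, server assumes UTF-8.
client = ha1(user, realm, password, "iso8859_1")
server = ha1(user, realm, password, "utf_8")

print(client == server)
```

With a pure ASCII user name the two hashes are identical, which is why the
problem tends to go unnoticed until someone uses a non-ASCII character in
a user name or password.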


-- 
Sincerely,
Yngve N. Pettersen
 
********************************************************************
Senior Developer                     Email: yngve@opera.com
Opera Software ASA                   http://www.opera.com/
Phone:  +47 24 16 42 60              Fax:    +47 24 16 40 01
********************************************************************

Received on Monday, 25 September 2006 23:21:27 UTC