RE: [www-international] <none> from by way of Martin Duerst on 2001-10-26 (www-international@w3.org from October to December 2001)

From: by way of Martin Duerst <duerst@w3.org>
Date: Fri, 26 Oct 2001 10:56:57 +0900
To: www-international@w3.org
Message-Id: <4.2.0.58.J.20011026105652.05b50be0@localhost>

We have a multilingual application supported in 13 languages. For this
problem we came up with three different authentication schemes; the
deployer chooses which one to use:

1. HTTP Basic authentication: here we do not support Asian users.
2. Forms-based authentication (using JDBC and JCE): supports UTF-8, so
   usernames and passwords can be in different languages.
3. LDAP authentication.
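For contrast with Basic authentication, here is a minimal Python sketch of why a forms-based scheme can carry non-Latin usernames. It assumes the login page is served as UTF-8 (the post does not say how the pages are served), so the browser percent-encodes the form fields as UTF-8 and the server can decode them unambiguously. The field names "username" and "password" are hypothetical, and this is not the JDBC/JCE implementation mentioned above.

```python
from urllib.parse import parse_qs


def read_form_credentials(body: str) -> tuple[str, str]:
    """Return (username, password) from an urlencoded login form body.

    Assumes the login page was served as UTF-8, so the browser
    percent-encodes the submitted fields as UTF-8 bytes.
    The field names "username" and "password" are hypothetical.
    """
    # errors="strict" rejects bytes that are not valid UTF-8 instead of
    # silently replacing them with U+FFFD.
    fields = parse_qs(body, encoding="utf-8", errors="strict")
    username = fields.get("username", [""])[0]
    password = fields.get("password", [""])[0]
    return username, password


# "山田" sent as the UTF-8 bytes E5 B1 B1 E7 94 B0
print(read_form_credentials("username=%E5%B1%B1%E7%94%B0&password=secret"))
# -> ('山田', 'secret')
```

The key difference from Basic authentication is that the page itself fixes the encoding, so the server does not have to guess.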

-Aruna




Yoshito_Umaoka@lotus.co.jp
Sent by: www-international-request@w3.org

10/25/2001 09:17 AM

      To:       www-international@w3.org
      cc:
      Subject:       RE: [www-international] <none>


 >I hope this is an appropriate question for www-international. It concerns
 >the behavior of HTTP authentication in a multilingual environment.

I struggled with the same issue before.

HTTP Basic authentication is defined in RFC2617. The definitions of
"userid" and "password" are below.

 > credentials = "Basic" basic-credentials
 > basic-credentials = base64-user-pass
 > base64-user-pass  = <base64 [4] encoding of user-pass, except not limited to 76 char/line>
 > user-pass   = userid ":" password
 > userid      = *<TEXT excluding ":">
 > password    = *TEXT
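A minimal Python sketch of how these productions fit together on the client side: join userid and password with ":", turn the string into octets, and base64-encode the result on a single line. Mapping characters to octets as ISO-8859-1 is an assumption here, since the grammar itself does not name a charset, and the function name is hypothetical. The sample input is the "Aladdin" example from RFC2617.

```python
import base64


def basic_credentials(userid: str, password: str) -> str:
    """Build an RFC 2617 Basic credentials value from userid and password."""
    # userid = *<TEXT excluding ":">, since ":" separates it from the password
    if ":" in userid:
        raise ValueError('userid must not contain ":"')
    # Assumption: characters are mapped to octets as ISO-8859-1; this raises
    # UnicodeEncodeError for anything outside Latin-1.
    user_pass = f"{userid}:{password}".encode("iso-8859-1")
    # b64encode emits one line with no newlines ("not limited to 76 char/line");
    # base64.encodebytes would wrap at 76 characters instead.
    return "Basic " + base64.b64encode(user_pass).decode("ascii")


print(basic_credentials("Aladdin", "open sesame"))
# -> Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```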

Based on these definitions, "userid" can be any TEXT excluding ":", and
"password" can be any TEXT. RFC2617 inherits its basic rules from RFC2616,
where the TEXT rule is defined as below.

 > OCTET          = <any 8-bit sequence of data>
 > CHAR           = <any US-ASCII character (octets 0 - 127)>
 > UPALPHA        = <any US-ASCII uppercase letter "A".."Z">
 > LOALPHA        = <any US-ASCII lowercase letter "a".."z">
 > ALPHA          = UPALPHA | LOALPHA
 > DIGIT          = <any US-ASCII digit "0".."9">
 > CTL            = <any US-ASCII control character
 >                  (octets 0 - 31) and DEL (127)>
 > CR             = <US-ASCII CR, carriage return (13)>
 > LF             = <US-ASCII LF, linefeed (10)>
 > SP             = <US-ASCII SP, space (32)>
 > HT             = <US-ASCII HT, horizontal-tab (9)>
 > <">            = <US-ASCII double-quote mark (34)>

 > HTTP/1.1 defines the sequence CR LF as the end-of-line marker for all
 > protocol elements except the entity-body (see appendix 19.3 for
 > tolerant applications). The end-of-line marker within an entity-body
 > is defined by its associated media type, as described in section 3.7.
 >
 >     CRLF           = CR LF
 >
 > HTTP/1.1 header field values can be folded onto multiple lines if the
 > continuation line begins with a space or horizontal tab. All linear
 > white space, including folding, has the same semantics as SP. A
 > recipient MAY replace any linear white space with a single SP before
 > interpreting the field value or forwarding the message downstream.
 >
 >      LWS            = [CRLF] 1*( SP | HT )
 >
 >  The TEXT rule is only used for descriptive field contents and values
 >  that are not intended to be interpreted by the message parser. Words
 >  of *TEXT MAY contain characters from character sets other than ISO-
 >  8859-1 [22] only when encoded according to the rules of RFC 2047
 >  [14].
 >
 >     TEXT           = <any OCTET except CTLs,
 >                       but including LWS>
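A sketch of the *TEXT rule as a byte-level check in Python (the helper name is hypothetical). It shows the heart of the problem: TEXT admits any octet from 0x80 to 0xFF, so a Latin-1 "é" and the cp932 bytes of a Japanese character are both valid TEXT, and the grammar alone cannot tell the server which character set was meant.

```python
import re

# TEXT = <any OCTET except CTLs, but including LWS>
# CTL  = octets 0-31 and 127.  LWS = [CRLF] 1*( SP | HT ).
# Each repetition is either a non-CTL octet (HT is also allowed, since it is
# LWS), or CRLF immediately followed by SP/HT (a folded line). The two
# alternatives never overlap, so matching stays linear.
_TEXT = re.compile(rb"(?:[^\x00-\x08\x0a-\x1f\x7f]|\r\n[ \t])*")


def is_text(octets: bytes) -> bool:
    """Return True if octets match the RFC 2616 *TEXT rule."""
    return _TEXT.fullmatch(octets) is not None


print(is_text(b"caf\xe9"))   # True: 0xE9 ("é" in Latin-1) is an allowed OCTET
print(is_text(b"\x82\xa0"))  # True: the cp932 bytes for "あ" pass as well
print(is_text(b"a\r\nb"))    # False: CRLF not followed by SP or HT
```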

So the standard says:

1. You can use any ISO-8859-1 characters in "userid" and "password".
2. If you want to use a charset other than ISO-8859-1 for "userid" and
   "password", you must encode the string according to the rules defined
   in RFC2047 (MIME encoded-words).

However, as far as I know, no browser supports the standard.
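For illustration only, here is a Python sketch of what the RFC2047 route would look like if a client and server agreed to use it: a non-Latin-1 field is replaced by a "B" encoded-word before the usual Basic base64 step, and the server decodes it with the standard library's email.header.decode_header. The helper names and the choice of UTF-8 are assumptions, and no browser is known to send credentials this way.

```python
import base64
from email.header import decode_header


def encoded_word(text: str, charset: str = "UTF-8") -> str:
    """Wrap text as an RFC 2047 "B" encoded-word: =?charset?B?base64?=."""
    # Simplification: ignores RFC 2047's 75-character limit per encoded-word.
    data = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{data}?="


def to_text_field(value: str) -> str:
    """Send Latin-1 values as-is; wrap anything else as an encoded-word."""
    try:
        value.encode("iso-8859-1")
        return value
    except UnicodeEncodeError:
        return encoded_word(value)


def decode_field(value: str) -> str:
    """Server side: decode encoded-words; plain text passes through unchanged."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "iso-8859-1")
        parts.append(chunk)
    return "".join(parts)


word = to_text_field("山田")
print(word)                # =?UTF-8?B?5bGx55Sw?=
print(decode_field(word))  # 山田
```

Because the encoded-word alphabet never contains ":", the result is still a legal "userid"; the Basic value would then be the base64 of, for example, "=?UTF-8?B?5bGx55Sw?=:secret".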

 >It seems that the authentication data sent from the browser in response to a
 >server request is supplied in the platform codepage of the system that the
 >browser is running on.
 >
 >In other words, on Japanese Windows, the username comes back in cp932; on a
 >French Windows machine, it comes back in cp1252; on a Solaris machine, it
 >comes back in whatever the platform encoding is set to.
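A small Python sketch of why this behavior is a problem for the server. The simulate_browser function is a hypothetical model of what browsers were observed to do, not any browser's actual code: the same username produces different bytes depending on the client's codepage, and nothing in the header tells the server which codepage to decode with.

```python
import base64


def simulate_browser(userid: str, password: str, platform_codepage: str) -> str:
    """Hypothetical model of the reported behavior: user-pass is encoded in the
    client's platform codepage, with nothing telling the server which one."""
    raw = f"{userid}:{password}".encode(platform_codepage)
    return base64.b64encode(raw).decode("ascii")


def server_decode(token: str, guessed_codepage: str) -> str:
    """The server must guess a codepage to turn the bytes back into text."""
    return base64.b64decode(token).decode(guessed_codepage, errors="replace")


from_windows_fr = simulate_browser("José", "pw", "cp1252")  # bytes: Jos\xe9:pw
from_utf8_client = simulate_browser("José", "pw", "utf-8")  # bytes: Jos\xc3\xa9:pw

print(server_decode(from_windows_fr, "cp1252"))   # José:pw
print(server_decode(from_utf8_client, "cp1252"))  # JosÃ©:pw (wrong guess)
```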

Yes, I found the same thing. I was also considering a solution similar to
your idea: detecting the user agent's information, such as OS, browser
software, Accept-Language, and so on. But I finally decided not to support
"userid" and "password" values outside ISO-8859-1, for the following two
reasons:

1. I didn't want to introduce that kind of ambiguity into the
   authentication logic.
2. If the standard is revised someday, the hack may cause harder issues:
   backward compatibility vs. the standard.

So my conclusion was that the authentication code should treat any
non-ASCII byte (byte > 0x7f) as ISO-8859-1. For now, it can support only
Latin-1 "userid" and "password" values, but that was the best I could do.

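A minimal Python sketch of that policy on the server side, assuming the raw Authorization header value is available as a string (the function name is hypothetical). Decoding as ISO-8859-1 never raises, so Latin-1 users authenticate correctly, while bytes sent in another codepage simply arrive as Latin-1 text that will typically not match a stored non-Latin-1 username.

```python
import base64


def parse_basic_authorization(header_value: str) -> tuple[str, str]:
    """Split an Authorization: Basic value into (userid, password).

    Follows the policy described above: every octet is read as ISO-8859-1.
    """
    scheme, _, token = header_value.strip().partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not Basic credentials")
    # validate=True rejects characters outside the base64 alphabet
    raw = base64.b64decode(token.strip(), validate=True)
    # ISO-8859-1 maps each byte 0x00-0xFF to U+0000-U+00FF, so decoding never
    # fails; bytes > 0x7f from another codepage simply come out as Latin-1 text.
    user_pass = raw.decode("iso-8859-1")
    # Split at the first ":" only; the password itself may contain ":".
    userid, sep, password = user_pass.partition(":")
    if not sep:
        raise ValueError('missing ":" between userid and password')
    return userid, password


print(parse_basic_authorization("Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="))
# -> ('Aladdin', 'open sesame')
```
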
- Yoshito Umaoka
Received on Thursday, 25 October 2001 23:11:53 UTC