- From: by way of Martin Duerst <duerst@w3.org>
- Date: Fri, 26 Oct 2001 10:56:57 +0900
- To: www-international@w3.org
We have a multilingual application that supports 13 languages. To deal with
this problem we came up with three different authentication schemes; the
deployer chooses which one to use:
1. Basic authentication (HTTP): here we do not support Asian users.
2. Forms-based authentication (using JDBC and JCE): supports UTF-8, so
usernames and passwords can be in different languages.
3. LDAP authentication.
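Scheme 2 can support UTF-8 because the server controls the login page: when that page is served with charset=UTF-8, browsers submit the form fields as percent-encoded UTF-8, so the server decodes them with a known charset instead of guessing. A minimal sketch, assuming an `application/x-www-form-urlencoded` body and hypothetical field names `username` and `password` (the actual JDBC/JCE code is not shown in this thread):

```python
from urllib.parse import parse_qs


def read_login_form(body: str) -> tuple[str, str]:
    """Decode a urlencoded login form, assuming the page was served as UTF-8.

    The field names "username" and "password" are hypothetical.
    errors="strict" makes invalid UTF-8 raise UnicodeDecodeError instead
    of being silently replaced, so a mis-encoded submission fails loudly.
    """
    fields = parse_qs(body, keep_blank_values=True, strict_parsing=True,
                      encoding="utf-8", errors="strict")
    # parse_qs maps each field name to a list of values; take the first.
    username = fields.get("username", [""])[0]
    password = fields.get("password", [""])[0]
    return username, password


if __name__ == "__main__":
    print(read_login_form("username=%E5%B1%B1%E7%94%B0&password=s%C3%A9same"))
```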
-Aruna
Yoshito_Umaoka@lotus.co.jp
Sent by: www-international-request@w3.org
10/25/2001 09:17 AM
To: www-international@w3.org
cc:
Subject: RE: [www-international] <none>
>I hope this is an appropriate question for www-international. It concerns
>the behavior of HTTP authentication in a multilingual environment.
I struggled with the same issue before.
HTTP Basic authentication is defined in RFC 2617. The definitions of
"userid" and "password" are below.
> credentials = "Basic" basic-credentials
> basic-credentials = base64-user-pass
> base64-user-pass = <base64 [4] encoding of user-pass, except not limited to 76 char/line>
> user-pass = userid ":" password
> userid = *<TEXT excluding ":">
> password = *TEXT
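The grammar can be sketched from the client side with a hypothetical helper `make_basic_credentials`. The grammar does not say how characters become octets, so the charset is a parameter here; defaulting it to ISO-8859-1 follows the reading of RFC 2616 given in this message, not the behavior of any particular browser.

```python
import base64


def make_basic_credentials(userid: str, password: str,
                           charset: str = "iso-8859-1") -> str:
    """Build an HTTP Basic credentials value following the RFC 2617 grammar.

    user-pass = userid ":" password, then base64 over the resulting octets.
    The charset used to turn characters into octets is an assumption;
    RFC 2617 itself leaves it open.
    """
    if ":" in userid:
        # userid = *<TEXT excluding ":">
        raise ValueError('userid must not contain ":"')
    user_pass = f"{userid}:{password}".encode(charset)
    # b64encode inserts no line breaks ("not limited to 76 char/line").
    return "Basic " + base64.b64encode(user_pass).decode("ascii")
```

With RFC 2617's own example (userid "Aladdin", password "open sesame") this returns `Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==`. A Japanese userid raises `UnicodeEncodeError` under the ISO-8859-1 default, which is the problem this thread is about.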
Based on these definitions, "userid" can be any TEXT excluding ":", and
"password" can be any TEXT. RFC 2617 inherits its basic rules from RFC 2616,
where the TEXT rule is defined as follows.
> OCTET = <any 8-bit sequence of data>
> CHAR = <any US-ASCII character (octets 0 - 127)>
> UPALPHA = <any US-ASCII uppercase letter "A".."Z">
> LOALPHA = <any US-ASCII lowercase letter "a".."z">
> ALPHA = UPALPHA | LOALPHA
> DIGIT = <any US-ASCII digit "0".."9">
> CTL = <any US-ASCII control character (octets 0 - 31) and DEL (127)>
> CR = <US-ASCII CR, carriage return (13)>
> LF = <US-ASCII LF, linefeed (10)>
> SP = <US-ASCII SP, space (32)>
> HT = <US-ASCII HT, horizontal-tab (9)>
> <"> = <US-ASCII double-quote mark (34)>
> HTTP/1.1 defines the sequence CR LF as the end-of-line marker for all
> protocol elements except the entity-body (see appendix 19.3 for
> tolerant applications). The end-of-line marker within an entity-body
> is defined by its associated media type, as described in section 3.7.
>
> CRLF = CR LF
>
> HTTP/1.1 header field values can be folded onto multiple lines if the
> continuation line begins with a space or horizontal tab. All linear
> white space, including folding, has the same semantics as SP. A
> recipient MAY replace any linear white space with a single SP before
> interpreting the field value or forwarding the message downstream.
>
> LWS = [CRLF] 1*( SP | HT )
>
> The TEXT rule is only used for descriptive field contents and values
> that are not intended to be interpreted by the message parser. Words
> of *TEXT MAY contain characters from character sets other than
> ISO-8859-1 [22] only when encoded according to the rules of RFC 2047
> [14].
>
> TEXT = <any OCTET except CTLs,
> but including LWS>
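The TEXT rule can be checked mechanically. A minimal sketch with hypothetical helper names `is_text` and `is_userid`, operating on raw octets because that is what the grammar describes. Note that it accepts every octet above 127: the rule allows them but says nothing about which charset they belong to.

```python
# Octet values used by the RFC 2616 rules quoted above.
HT, LF, CR, SP, DEL = 9, 10, 13, 32, 127


def is_ctl(octet: int) -> bool:
    """CTL = octets 0 - 31 and DEL (127)."""
    return octet <= 31 or octet == DEL


def is_text(octets: bytes) -> bool:
    """True if the octets match *TEXT: any OCTET except CTLs, but including LWS.

    LWS = [CRLF] 1*( SP | HT ), so HT is allowed on its own, and CR LF
    is allowed only when followed by SP or HT (header folding).
    """
    i = 0
    while i < len(octets):
        b = octets[i]
        if b == HT:
            i += 1
        elif b == CR:
            # Accept CR only as the start of a folded line: CR LF (SP | HT).
            if i + 2 < len(octets) and octets[i + 1] == LF and octets[i + 2] in (SP, HT):
                i += 3
            else:
                return False
        elif is_ctl(b):
            return False
        else:
            i += 1
    return True


def is_userid(octets: bytes) -> bool:
    """userid = *<TEXT excluding ":">."""
    return b":" not in octets and is_text(octets)
```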
So the standard says:
1. You can use any ISO-8859-1 character (except the US-ASCII control
characters, and ":" in "userid") for "userid" and "password".
2. If you want to use a charset other than ISO-8859-1 for "userid" and
"password", you must encode the string according to the rules defined in
RFC 2047 (MIME encoded-word).
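Point 2 refers to the RFC 2047 encoded-word syntax, `=?charset?encoding?encoded-text?=`. A minimal sketch of producing and reading a "B" (base64) encoded-word, with hypothetical helpers `to_encoded_word` and `from_encoded_word`. It ignores RFC 2047's 75-character limit per encoded-word, and it only illustrates the syntax; whether any client sends credentials this way is a separate question.

```python
import base64
import re

# RFC 2047 encoded-word: "=?" charset "?" encoding "?" encoded-text "?="
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=")


def to_encoded_word(text: str, charset: str = "utf-8") -> str:
    """Encode text as a single RFC 2047 "B" encoded-word.

    Sketch only: long input is not split to respect the
    75-character limit on one encoded-word.
    """
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


def from_encoded_word(word: str) -> str:
    """Decode a single "B" encoded-word such as one from to_encoded_word."""
    m = ENCODED_WORD.fullmatch(word)
    if m is None or m.group(2).upper() != "B":
        raise ValueError("not a base64 encoded-word")
    charset, _, payload = m.groups()
    return base64.b64decode(payload).decode(charset)
```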
However, as far as I know, no browser supports this part of the standard.
>It seems that the authentication data sent from the browser in response to a
>server request is supplied in the platform codepage of the system that the
>browser is running on.
>
>In other words, on Japanese Windows the username comes back in cp932, on a
>French Windows machine it comes back in cp1252, and on a Solaris machine it
>comes back in whatever the platform encoding is set to.
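The observation above is easy to reproduce: the same octets decode to different strings depending on which codepage the server assumes, and Basic credentials carry no charset label to settle it. A minimal sketch; the helper name `candidate_decodings` and the sample userid are illustrative.

```python
from typing import Optional


def candidate_decodings(octets: bytes, codecs: list[str]) -> dict[str, Optional[str]]:
    """Decode the same octets under several codepages.

    A codec in which the octets are invalid maps to None. Nothing in
    the Basic credentials says which entry is the "right" one.
    """
    results: dict[str, Optional[str]] = {}
    for codec in codecs:
        try:
            results[codec] = octets.decode(codec)
        except UnicodeDecodeError:
            results[codec] = None
    return results


if __name__ == "__main__":
    # A userid as Japanese Windows would send it (cp932, Shift_JIS-based).
    sent = "山田".encode("cp932")
    for codec, text in candidate_decodings(sent, ["cp932", "cp1252", "iso-8859-1", "utf-8"]).items():
        print(f"{codec:>10}: {text!r}")
```

Only the cp932 guess recovers the original name; cp1252 and ISO-8859-1 both "succeed" with different text, which is why guessing from the user agent is fragile.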
Yes, I found the same thing. I also considered a solution similar to your
idea: detecting user agent information such as the OS, the browser software,
Accept-Language, and so on. But in the end I decided not to support "userid"
and "password" values outside ISO-8859-1, for two reasons:
1. I didn't want to introduce that kind of ambiguity into the
authentication logic.
2. If the standard is revised someday, the hack may cause harder
problems: backward compatibility vs. the standard.
So my conclusion was that the authentication code should interpret any
non-ASCII byte (byte > 0x7F) as ISO-8859-1. For now it can support only
Latin-1 "userid" and "password" values, but it was the best I could do.
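The policy in the last paragraph amounts to decoding the credential octets as ISO-8859-1, which never fails because every octet maps to a Latin-1 character. A minimal server-side sketch with a hypothetical helper `parse_basic_credentials`; it splits on the first ":" because the grammar forbids ":" in the userid but allows it in the password.

```python
import base64
import binascii


def parse_basic_credentials(header_value: str) -> tuple[str, str]:
    """Parse 'Basic <base64>' into (userid, password).

    Every octet, including those above 0x7F, is interpreted as
    ISO-8859-1, so the decode step cannot fail; non-Latin-1 input
    (e.g. cp932 from Japanese Windows) simply comes out as Latin-1 text.
    """
    scheme, _, token = header_value.strip().partition(" ")
    if scheme.lower() != "basic":  # auth-scheme is case-insensitive
        raise ValueError("not Basic credentials")
    try:
        raw = base64.b64decode(token.strip(), validate=True)
    except binascii.Error as exc:
        raise ValueError("malformed base64") from exc
    user_pass = raw.decode("iso-8859-1")
    userid, sep, password = user_pass.partition(":")
    if not sep:
        raise ValueError('missing ":" between userid and password')
    return userid, password
```

The design choice is the one argued above: a single, predictable interpretation of the octets rather than a per-client guess, at the cost of rejecting (in effect) non-Latin-1 names.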
- Yoshito Umaoka
Received on Thursday, 25 October 2001 23:11:53 UTC