- From: <Yoshito_Umaoka@lotus.co.jp>
- Date: Thu, 25 Oct 2001 12:17:41 -0400
- To: www-international@w3.org
>I hope this is an appropriate question for www-international. It concerns
>the behavior of HTTP authentication in a multilingual environment.

I struggled with the same issue before.

HTTP basic authentication is defined in RFC2617. The definitions of
"userid" and "password" are:

>      credentials = "Basic" basic-credentials
>      basic-credentials = base64-user-pass
>      base64-user-pass  = <base64 [4] encoding of user-pass,
>                           except not limited to 76 char/line>
>      user-pass   = userid ":" password
>      userid      = *<TEXT excluding ":">
>      password    = *TEXT

Based on these definitions, "userid" may be any TEXT excluding ":", and
"password" may be any TEXT. RFC2617 inherits its basic rules from RFC2616,
where TEXT is defined as follows.

>        OCTET          = <any 8-bit sequence of data>
>        CHAR           = <any US-ASCII character (octets 0 - 127)>
>        UPALPHA        = <any US-ASCII uppercase letter "A".."Z">
>        LOALPHA        = <any US-ASCII lowercase letter "a".."z">
>        ALPHA          = UPALPHA | LOALPHA
>        DIGIT          = <any US-ASCII digit "0".."9">
>        CTL            = <any US-ASCII control character
>                         (octets 0 - 31) and DEL (127)>
>        CR             = <US-ASCII CR, carriage return (13)>
>        LF             = <US-ASCII LF, linefeed (10)>
>        SP             = <US-ASCII SP, space (32)>
>        HT             = <US-ASCII HT, horizontal-tab (9)>
>        <">            = <US-ASCII double-quote mark (34)>
>
> HTTP/1.1 defines the sequence CR LF as the end-of-line marker for all
> protocol elements except the entity-body (see appendix 19.3 for
> tolerant applications). The end-of-line marker within an entity-body
> is defined by its associated media type, as described in section 3.7.
>
>        CRLF           = CR LF
>
> HTTP/1.1 header field values can be folded onto multiple lines if the
> continuation line begins with a space or horizontal tab. All linear
> white space, including folding, has the same semantics as SP. A
> recipient MAY replace any linear white space with a single SP before
> interpreting the field value or forwarding the message downstream.
>
>        LWS            = [CRLF] 1*( SP | HT )
>
> The TEXT rule is only used for descriptive field contents and values
> that are not intended to be interpreted by the message parser. Words
> of *TEXT MAY contain characters from character sets other than
> ISO-8859-1 [22] only when encoded according to the rules of RFC 2047
> [14].
>
>        TEXT           = <any OCTET except CTLs,
>                         but including LWS>

So the standard says:

1. You can use any ISO-8859-1 characters for "userid" and "password"
   (except the control characters excluded by TEXT, and ":" in "userid").
2. If you want to use characters from charsets other than ISO-8859-1 in
   "userid" and "password", you must encode the string data according to
   the rules defined in RFC2047 (MIME encoded words).

However, as far as I know, no browser supports the standard.

>It seems that the authentication data sent from the browser in response to a
>server request is supplied in the platform codepage of the system that the
>browser is running on.
>
>In other words, on Japanese Windows, the username comes back in cp932, on a
>French Windows machine, the username comes back in cp1252, on a Solaris
>machine, it comes back in whatever the platform encoding is set to.

Yes, I found the same thing. I was also considering a solution similar to
your idea: detecting the user agent's information, such as the OS, browser
software, Accept-Language, and so on. But I finally decided not to support
"userid" and "password" values outside ISO-8859-1, for the following two
reasons:

1. I didn't want to introduce that kind of ambiguity into the
   authentication logic.
2. If the standard is revised someday, the hack may cause more difficult
   issues (backward compatibility vs. the standard).

So my conclusion was that the authentication code should treat any
non-ASCII byte (byte > 0x7f) as ISO-8859-1. For now, it supports only
Latin-1 "userid" and "password" values, but it was the best I could do.

- Yoshito Umaoka
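A minimal Python sketch of the server-side approach concluded above: base64-decode the Basic credentials, map every byte through ISO-8859-1, and split userid from password at the first ":". The helper name `parse_basic_credentials` is hypothetical (not from any library), and the sketch assumes the header value has the simple form `Basic <base64>`.

```python
import base64
from typing import Tuple


def parse_basic_credentials(header_value: str) -> Tuple[str, str]:
    """Split an HTTP Basic Authorization value into (userid, password).

    Hypothetical helper: every decoded byte is interpreted as ISO-8859-1,
    so bytes > 0x7F become Latin-1 characters no matter which platform
    codepage the browser actually used.
    """
    scheme, _, encoded = header_value.strip().partition(" ")
    # Auth-scheme names are case-insensitive in RFC 2617
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credentials value")
    # Invalid base64 raises binascii.Error, which is a ValueError subclass
    raw = base64.b64decode(encoded.strip(), validate=True)
    # ISO-8859-1 maps each byte 0x00-0xFF to U+0000-U+00FF, so this never fails
    user_pass = raw.decode("iso-8859-1")
    # userid cannot contain ":", so only the first colon separates the fields
    userid, sep, password = user_pass.partition(":")
    if not sep:
        raise ValueError("missing ':' between userid and password")
    return userid, password


if __name__ == "__main__":
    latin1 = base64.b64encode("M\u00fcller:secret".encode("iso-8859-1")).decode("ascii")
    print(parse_basic_credentials("Basic " + latin1))  # ('Müller', 'secret')

    # Bytes sent in another codepage (here cp932) decode to Latin-1
    # look-alikes, but the original bytes remain recoverable.
    sjis = base64.b64encode("\u5c71\u7530:secret".encode("cp932")).decode("ascii")
    user, _ = parse_basic_credentials("Basic " + sjis)
    print(user.encode("iso-8859-1") == "\u5c71\u7530".encode("cp932"))  # True
```

Because the ISO-8859-1 mapping is one-to-one over all 256 byte values, the decoding is lossless: `userid.encode("iso-8859-1")` gives back exactly the bytes the browser sent, so nothing is destroyed even when the browser used a non-Latin-1 codepage. Any later reinterpretation of those bytes would still be a guess, which is the ambiguity the message above chose to avoid.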
Received on Thursday, 25 October 2001 12:18:19 UTC