RE: what should the charset be in the response to the server

From: Michael Jansson <mjan@em2-solutions.com>
Date: Mon, 21 Jul 2003 20:54:55 +0200
Message-ID: <CFDB95B7A60B714698C8E065A0759B0D1B64@gateway.em2-solutions.com>
To: "'aphillips@webmethods.com'" <aphillips@webmethods.com>, "Sinha, Raj (Raj)" <rajsinha@avaya.com>
Cc: www-international@w3.org

Modern browsers all support HTML 4.0, which provides the "accept-charset"
attribute for FORM elements:
 http://www.w3.org/TR/html4/interact/forms.html#adef-accept-charset 

A page can therefore be designed so that its POST data is sent in a specific
encoding.
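
The attribute matters because the submission encoding determines the actual
bytes the server receives. As a minimal illustration (Python standard
library; the field name and value are invented for the example), here is the
same form value serialized as URL-encoded POST data in UTF-8 and in
Shift_JIS:

```python
from urllib.parse import urlencode


def encode_form(fields: dict, charset: str) -> str:
    """Serialize form fields as application/x-www-form-urlencoded text,
    percent-encoding the bytes of each value in the given charset."""
    return urlencode(fields, encoding=charset)


fields = {"city": "東京"}  # made-up field name and value
print(encode_form(fields, "utf-8"))      # city=%E6%9D%B1%E4%BA%AC
print(encode_form(fields, "shift_jis"))  # city=%93%8C%8B%9E
```

A script that decodes these bytes with the wrong encoding gets garbage, which
is why knowing (or controlling) the submission encoding matters.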

IE5+ honors this attribute in some, but not all, cases: it uses it to
"upgrade" the encoding from an MBCS format (e.g. Shift_JIS, cp1252) to
Unicode (e.g. UTF-8), but not the reverse (which is annoying). Mozilla always
honors it. Opera does not. I would expect Konqueror/Safari to support it as
well, although I am not sure. Old browsers (Netscape 4.x, IE 4, etc.) do
not. At least, that was the situation the last time I checked.

If there is no "accept-charset" attribute, browsers use the encoding of the
page that contains the form (as determined from the HTTP header, META tags,
auto-detection, etc.).
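
The selection order described above can be sketched as a small Python
function. It is a simplification under stated assumptions: the helper name
form_submission_charset, the SUPPORTED set, and the fixed ISO-8859-1 fallback
(standing in for real auto-detection) are hypothetical. It models a browser
that always honors accept-charset, as Mozilla does, rather than IE's
upgrade-only behaviour or Opera's lack of support, and it gives the HTTP
header precedence over a META tag, as HTML 4.01 specifies.

```python
import re
from typing import Optional

# Assumed set of encodings the simulated browser can produce.
SUPPORTED = ("utf-8", "shift_jis", "euc-jp", "cp1252", "iso-8859-1")

# Finds charset=... in either <meta charset="..."> or
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)


def form_submission_charset(accept_charset: Optional[str],
                            content_type: Optional[str],
                            html: bytes,
                            fallback: str = "iso-8859-1") -> str:
    """Hypothetical model of how a browser picks the encoding for form data."""
    # 1. accept-charset: a space- and/or comma-separated list; use the
    #    first entry this browser supports.
    if accept_charset:
        for name in re.split(r"[\s,]+", accept_charset.strip()):
            if name.lower() in SUPPORTED:
                return name.lower()
    # 2. charset parameter of the HTTP Content-Type header.
    if content_type:
        m = re.search(r'charset\s*=\s*"?([^";\s]+)', content_type, re.IGNORECASE)
        if m:
            return m.group(1).lower()
    # 3. A META tag in the page itself.
    m = META_CHARSET.search(html)
    if m:
        return m.group(1).decode("ascii").lower()
    # 4. A real browser would now auto-detect; this sketch uses a fixed fallback.
    return fallback
```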

A common trick for dealing with differences between browsers (e.g. lack of
support for "accept-charset") is to include a hidden field with a known value
in the form. Since the value is known, it is easy for a CGI script to
determine the encoding of the POST data: it simply decodes the submitted
value with each candidate encoding and compares the result with the expected
value.
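
The hidden-field trick can be sketched in Python as follows. Everything
specific here is an assumption for illustration: the field name
_charset_check, the sentinel value 日本語, and the candidate list, which
should contain only encodings that can represent the sentinel so that each
produces distinct bytes. The script decodes the raw bytes of the hidden field
with each candidate and returns the first one that reproduces the known
value.

```python
from typing import Dict, Optional
from urllib.parse import unquote_to_bytes

# Hypothetical hidden field emitted by the form page, e.g.
#   <input type="hidden" name="_charset_check" value="日本語">
CHECK_FIELD = "_charset_check"
CHECK_VALUE = "日本語"
# Candidate encodings, tried in order; UTF-8 first.
CANDIDATES = ("utf-8", "shift_jis", "euc-jp")


def raw_form_fields(body: str) -> Dict[str, bytes]:
    """Split an application/x-www-form-urlencoded body into undecoded byte values."""
    fields = {}
    for pair in body.split("&"):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        # '+' stands for a space; a literal '+' arrives as %2B.
        key = unquote_to_bytes(name.replace("+", " ")).decode("ascii", "replace")
        fields[key] = unquote_to_bytes(value.replace("+", " "))
    return fields


def detect_form_encoding(body: str) -> Optional[str]:
    """Return the first candidate that decodes the hidden field to CHECK_VALUE."""
    raw = raw_form_fields(body).get(CHECK_FIELD)
    if raw is None:
        return None
    for enc in CANDIDATES:
        try:
            if raw.decode(enc) == CHECK_VALUE:
                return enc
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding
    return None
```

Once the encoding is known, the remaining fields can be decoded with it, e.g.
{k: v.decode(enc) for k, v in raw_form_fields(body).items()}.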


Regards,
em2 Solutions
Michael Jansson

> -----Original Message-----
> From: Addison Phillips [mailto:aphillips@webmethods.com]
> Sent: Monday, July 21, 2003 8:08 PM
> To: Sinha, Raj (Raj)
> Cc: www-international@w3.org
> Subject: Re: what should the charset be in the response to the server
> 
> 
> 
> Hi Raj,
> 
> The browser always sends data back to the server in the charset of the
> page. That is, if the browser thinks the page is UTF-8, it will encode
> its response using UTF-8.
> 
> I use the word "thinks" because, of course, the browser must interpret
> the encoding of the page from the HTTP header and any META tag in the
> file. In some cases it must detect the encoding algorithmically. So
> whatever charset the browser ends up using to interpret the page is the
> encoding it uses for a response (either GET or POST).
> 
> Hope that helps.
> 
> Best Regards,
> 
> Addison
> 
> -- 
> 
> Addison P. Phillips
> Director, Globalization Architecture
> webMethods, Inc.
> 
> Internationalization is an architecture. It is not a feature.
> 
> [Chair, W3C-I18N-WG, Web Services Task Force]
> http://www.w3.org/International/ws
Received on Monday, 21 July 2003 14:54:56 GMT
