Re: what should the charset be in the response to the server

From: Tex Texin <tex@i18nguy.com>
Date: Mon, 21 Jul 2003 16:41:15 -0400
Message-ID: <3F1C4FEB.66152337@I18nGuy.com>
To: Michael Jansson <mjan@em2-solutions.com>
CC: "'aphillips@webmethods.com'" <aphillips@webmethods.com>, "Sinha, Raj (Raj)" <rajsinha@avaya.com>, www-international@w3.org

Raj,
I think Addison's and Michael's mail covered it pretty well, but if you are
interested, an older version of the Web I18n tutorial that Yves Savourel and
I give at the Unicode conference is online at:

http://www.xencraft.com/resources/webi18ntutorial.pdf

An updated version will be given at the conference in Atlanta in Sept.
http://www.unicode.org/iuc/iuc24/index.html

The Unicode conference is a good place to get information on web
internationalization.
tex

Michael Jansson wrote:
> 
> Modern browsers all support HTML 4.0, which provides the "accept-charset"
> attribute for FORM elements:
>  http://www.w3.org/TR/html4/interact/forms.html#adef-accept-charset
> 
> A page may thus be designed so that POST data is of a specific encoding.
> 
> IE5+ will honor this attribute in some but not all cases. It will use it to
> "upgrade" the encoding from a legacy code page (e.g. Shift_JIS, cp1252, etc.)
> to Unicode (e.g. UTF-8) but not the reverse (which is annoying). Mozilla
> always honors it. Opera does not. I would expect Konqueror/Safari to support
> it as well, although I am not sure. Old browsers (Netscape 4.x, IE 4, etc.)
> do not. That was the situation the last time I checked, at least.
> 
> If there is no "accept-charset" attribute, then browsers will use the
> encoding of the page from which the data is posted (as determined from the
> HTTP header, META tags, auto-detection, etc.).
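To make that default concrete, here is a small Python sketch (not from the original thread) of how the same form value becomes different percent-encoded bytes depending on the charset the browser believes the page uses. The helper name encode_form_value and the sample string are assumptions made for illustration; the sketch only mimics how a browser builds application/x-www-form-urlencoded data.

```python
from urllib.parse import quote_plus, unquote_plus


def encode_form_value(value, page_charset):
    # Hypothetical helper: percent-encode a form value the way a browser
    # would if it believed the page was in `page_charset`.
    return quote_plus(value, encoding=page_charset)


utf8_wire = encode_form_value("café", "utf-8")
latin1_wire = encode_form_value("café", "iso-8859-1")
print(utf8_wire)    # caf%C3%A9
print(latin1_wire)  # caf%E9

# A server that guesses the wrong charset decodes the bytes into mojibake.
wrong_guess = unquote_plus(utf8_wire, encoding="iso-8859-1")
print(wrong_guess)  # cafÃ©
```

The point is simply that the bytes on the wire carry no label, so the server has to know, or work out, which charset the browser used.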
> 
> A common trick to deal with the idiosyncrasies between browsers (e.g. lack
> of support for 'accept-charset') is to include a hidden field in the form
> with a known value. Since the value is known, it's easy enough for a CGI
> script to determine the encoding of the POST data: it decodes the received
> value with each candidate encoding and compares the result with the known
> value.
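A minimal sketch of that hidden-field check in Python, assuming the form carries a hidden input whose value is "é". The field name _charset_probe, the sentinel value, and the candidate list are all hypothetical choices for illustration; a real sentinel should be picked so that its bytes differ in every encoding you expect to see.

```python
from urllib.parse import unquote_plus

# Hypothetical hidden field and its known value (assumptions, not from the thread).
SENTINEL_FIELD = "_charset_probe"
SENTINEL_VALUE = "é"


def detect_form_charset(body, candidates=("utf-8", "iso-8859-1")):
    """Return the first candidate charset under which the sentinel field
    decodes to its known value, or None if no candidate matches."""
    for pair in body.split("&"):
        name, _, raw_value = pair.partition("=")
        if unquote_plus(name) != SENTINEL_FIELD:
            continue
        for charset in candidates:
            try:
                decoded = unquote_plus(raw_value, encoding=charset, errors="strict")
            except UnicodeDecodeError:
                # These bytes are not valid in this charset; try the next one.
                continue
            if decoded == SENTINEL_VALUE:
                return charset
    return None


print(detect_form_charset("_charset_probe=%C3%A9&name=Raj"))
print(detect_form_charset("_charset_probe=%E9&name=Raj"))
```

Once the charset is known, the remaining fields of the same request can be decoded with it.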
> 
> Regards,
> em2 Solutions
> Michael Jansson
> 
> > -----Original Message-----
> > From: Addison Phillips [mailto:aphillips@webmethods.com]
> > Sent: Monday, July 21, 2003 8:08 PM
> > To: Sinha, Raj (Raj)
> > Cc: www-international@w3.org
> > Subject: Re: what should the charset be in the response to the server
> >
> >
> >
> > Hi Raj,
> >
> > The browser always sends data back to the server in the charset of the
> > page. That is, if the browser thinks the page is UTF-8, it will encode
> > its response using UTF-8.
> >
> > I use the word "thinks" because, of course, the browser must interpret
> > the encoding of the page from the HTTP header and any META tag in the
> > file. In some cases it must detect the encoding algorithmically. So
> > whatever charset the browser ends up using to interpret the page is the
> > encoding it uses for a response (either GET or POST).
> >
> > Hope that helps.
> >
> > Best Regards,
> >
> > Addison
> >
> > --
> >
> > Addison P. Phillips
> > Director, Globalization Architecture
> > webMethods, Inc.
> >
> > Internationalization is an architecture. It is not a feature.
> >
> > [Chair, W3C-I18N-WG, Web Services Task Force]
> > http://www.w3.org/International/ws
> >
> >
> >
> >

-- 
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com
                         
XenCraft		            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------
Received on Monday, 21 July 2003 16:43:43 GMT
