- From: Tex Texin <tex@i18nguy.com>
- Date: Mon, 21 Jul 2003 16:41:15 -0400
- To: Michael Jansson <mjan@em2-solutions.com>
- CC: "'aphillips@webmethods.com'" <aphillips@webmethods.com>, "Sinha, Raj (Raj)" <rajsinha@avaya.com>, www-international@w3.org
Raj,

I think Addison and Michael's mail covered it pretty well, but if you are interested, an older version of the Web I18n tutorial that Yves Savourel and I give at the Unicode conference is online at:
http://www.xencraft.com/resources/webi18ntutorial.pdf

An updated version will be given at the conference in Atlanta in Sept.
http://www.unicode.org/iuc/iuc24/index.html

The Unicode conference is a good place to get information on web internationalization.

tex

Michael Jansson wrote:
>
> Modern browsers all support HTML 4.0, which provides the "accept-charset" attribute for FORM elements:
> http://www.w3.org/TR/html4/interact/forms.html#adef-accept-charset
>
> A page may thus be designed so that POST data is sent in a specific encoding.
>
> IE5+ will honor this attribute in some but not all cases. It will use it to "upgrade" the encoding from an MBCS format (e.g. ShiftJIS, cp 1252, etc.) to Unicode (e.g. UTF-8) but not the reverse (which is annoying). Mozilla always honors it. Opera does not. I would expect Konqueror/Safari to support it as well, although I am not sure. Old browsers (Netscape 4.x, IE 4, etc.) do not. That was the situation the last time I checked, at least.
>
> If there is no "accept-charset" attribute, then browsers will use the encoding of the page from which the POST data is being posted (i.e. the encoding determined from the HTTP header, META tags, auto-detection, etc.).
>
> A common trick to deal with the idiosyncrasies between browsers (e.g. lack of support for 'accept-charset') is to include a hidden field in the form with a known value. Since the value is known, it's easy enough for a CGI script to determine the encoding of the POST data: the bytes received for the hidden field are simply compared with the result of encoding the known value in each of the possible encodings.
>
> Regards,
> em2 Solutions
> Michael Jansson
>
> > -----Original Message-----
> > From: Addison Phillips [mailto:aphillips@webmethods.com]
> > Sent: Monday, July 21, 2003 8:08 PM
> > To: Sinha, Raj (Raj)
> > Cc: www-international@w3.org
> > Subject: Re: what should the charset be in the response to the server
> >
> > Hi Raj,
> >
> > The browser always sends data back to the server in the charset of the page. That is, if the browser thinks the page is UTF-8, it will encode its response using UTF-8.
> >
> > I use the word "thinks" because, of course, the browser must interpret the encoding of the page from the HTTP header and any META tag in the file. In some cases it must detect the encoding algorithmically. So whatever charset the browser ends up using to interpret the page is the encoding it uses for a response (either GET or POST).
> >
> > Hope that helps.
> >
> > Best Regards,
> >
> > Addison
> >
> > --
> >
> > Addison P. Phillips
> > Director, Globalization Architecture
> > webMethods, Inc.
> >
> > Internationalization is an architecture. It is not a feature.
> >
> > [Chair, W3C-I18N-WG, Web Services Task Force]
> > http://www.w3.org/International/ws

--
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com

XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------
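
For anyone who wants to try the hidden-field trick Michael describes above, here is a minimal sketch in Python. Everything named in it is an assumption made for illustration and not something from the thread: the field name `_probe`, the probe value "±", and the list of candidate encodings are all hypothetical. It assumes the form body arrives as application/x-www-form-urlencoded text and that field names are plain ASCII.

```python
from urllib.parse import unquote_plus, unquote_to_bytes, urlencode

# Hypothetical names chosen for this sketch; none come from the thread.
PROBE_FIELD = "_probe"   # name of the hidden form field
PROBE_VALUE = "\u00b1"   # "±": its byte sequence differs between the candidates
CANDIDATES = ("utf-8", "shift_jis", "cp1252")  # encodings the site expects


def raw_fields(body: str) -> dict[str, bytes]:
    """Split a urlencoded body into {name: raw value bytes} without decoding."""
    fields = {}
    for pair in body.split("&"):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        # '+' means space in form data; percent escapes become raw bytes.
        fields[unquote_plus(name)] = unquote_to_bytes(value.replace("+", " "))
    return fields


def detect_encoding(body: str, candidates=CANDIDATES):
    """Return the first candidate whose encoding of the probe matches, else None."""
    received = raw_fields(body).get(PROBE_FIELD)
    if received is None:
        return None
    for enc in candidates:
        try:
            expected = PROBE_VALUE.encode(enc)
        except UnicodeEncodeError:
            continue  # this candidate cannot represent the probe at all
        if expected == received:
            return enc
    return None


def decode_form(body: str) -> dict[str, str]:
    """Decode every field using the encoding revealed by the probe field."""
    enc = detect_encoding(body)
    if enc is None:
        raise ValueError("could not determine the encoding of the form data")
    return {name: raw.decode(enc) for name, raw in raw_fields(body).items()}


if __name__ == "__main__":
    # Simulate a browser posting the same form in each candidate encoding.
    for enc in CANDIDATES:
        body = urlencode({PROBE_FIELD: PROBE_VALUE, "note": "5 \u00b1 2"}, encoding=enc)
        print(enc, "->", detect_encoding(body), decode_form(body)["note"])
```

The probe only distinguishes encodings that produce different bytes for it. Two candidates that encode "±" identically (cp1252 and ISO-8859-1, for example) cannot be told apart this way, so the first match in the list wins; a longer probe value with more characters would narrow that down.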
Received on Monday, 21 July 2003 16:43:43 UTC