RE: what should the charset be in the response to the server

From: Michael Jansson <mjan@em2-solutions.com>
Date: Wed, 23 Jul 2003 08:49:41 +0200
Message-ID: <CFDB95B7A60B714698C8E065A0759B0D1B68@gateway.em2-solutions.com>
To: "'Sinha, Raj (Raj)'" <rajsinha@avaya.com>, www-international@w3.org
I think Chris meant to say HTTP *POST* (HTTP PUT is something else). The
remark is important, though: HTTP POST is to be preferred over HTTP GET. HTTP
GET produces URL-encoded data that can be ambiguous for certain browsers and
certain charsets.
 
Regards,
em2 Solutions
Michael Jansson

-----Original Message-----
From: Sinha, Raj (Raj) [mailto:rajsinha@avaya.com]
Sent: Tuesday, July 22, 2003 5:23 PM
To: www-international@w3.org
Subject: FW: what should the charset be in the response to the server


Posting a response to this question that was sent only to me (I don't know
why), so I am forwarding it on behalf of Chris.
 
 
Raj Sinha

 
 


 
Raj,
 
 
The only response encoding you can rely on is from browser HTTP PUT
commands, where one of the headers tells the server what encoding has been
used.
 
The encoding used in HTTP GET is undefined in the standards for characters
outside of 7-bit ASCII. 
 
Anyone who says it is the encoding of the page is correct but misleading, as
the browser's user can manually decide what that encoding is (changing
whatever was declared in the transmitted page), so a web server can have no
certainty about the encoding used in the %hh escapes in a GET, which is how
non-ASCII is sent.
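
To illustrate the ambiguity described above, here is a minimal Python sketch. The sample value "café" and the two charsets are assumptions chosen for the example; the point is only that the same text yields different %hh escapes depending on the encoding the browser used, and that a server guessing the wrong encoding gets wrong text without any error.

```python
from urllib.parse import quote, unquote

text = "café"  # sample form value (an assumption for this example)

# The same value, percent-encoded under two different page encodings.
as_utf8 = quote(text, encoding="utf-8")         # two escaped bytes for "é"
as_latin1 = quote(text, encoding="iso-8859-1")  # one escaped byte for "é"

# The escapes carry no label saying which encoding produced them,
# so the server has to guess. A wrong guess decodes without error.
right_guess = unquote(as_utf8, encoding="utf-8")
wrong_guess = unquote(as_utf8, encoding="iso-8859-1")

print(as_utf8, as_latin1)        # caf%C3%A9 caf%E9
print(right_guess, wrong_guess)  # café cafÃ©
```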
 
You might find the following helpful:
 
http://jetty.mortbay.com/jetty/doc/international.html
 
My advice: never use GET for sending a form containing international
characters, unless it's absolutely unavoidable.
 
When using PUT, use the header to find out what encoding was used.
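
A rough sketch of the header-based approach in Python, assuming the client actually sends a charset parameter in Content-Type (it may be absent, hence the fallback). The helper names `charset_from_content_type` and `decode_form_body`, and the ISO-8859-1 default, are hypothetical choices made for this example.

```python
from email.message import Message
from urllib.parse import parse_qs


def charset_from_content_type(header_value, default=None):
    """Return the charset parameter of a Content-Type value, or default."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value and returns the
    # fallback when no charset parameter is present.
    return msg.get_content_charset(default)


def decode_form_body(body, content_type, fallback="iso-8859-1"):
    """Decode a URL-encoded form body using the declared charset."""
    charset = charset_from_content_type(content_type, fallback)
    # The raw body is ASCII; the %hh escapes are decoded with `charset`.
    return parse_qs(body.decode("ascii"), encoding=charset)


fields = decode_form_body(
    b"name=caf%C3%A9",
    "application/x-www-form-urlencoded; charset=UTF-8",
)
```

When the header carries no charset parameter, the server falls back to a guess, which is the same uncertainty described for GET.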
 
If you are using Sun Servlets, the servlet container does all the decoding
for you and delivers the 16-bit Unicode characters to your program.
 
Chris Haynes
 
 
----- Original Message ----- 
From: Sinha, Raj (Raj) <rajsinha@avaya.com>
To: www-international@w3.org
Sent: Monday, July 21, 2003 6:31 PM
Subject: what should the charset be in the response to the server

OK, this is a very basic question, but I cannot seem to find a clear answer
anywhere.
 
I wrote a simple web browser which is on its way to becoming
internationalized (UTF-8 support etc.).
 
What I fail to understand is this:
 
I.  The web browser receives a page in, let's say, UTF-8. It then converts
everything to UTF-16 (which is its internal choice of data representation).
What charset should the response to the server be in? I would guess the
response should be in the original charset, i.e. UTF-8.
 
Consider this scenario:
The browser requests a page, indicating its preferences through the
Accept-Charset header.
The server sends back a page with a Content-Type charset of UTF-8.
The browser parses the page and converts everything into UTF-16.
If there is a form, the user can enter data into it, which is again
converted to the internal choice of UTF-16.
The browser is ready to send the form results back to the server. What
should the encoding be here?
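
For reference, the convention discussed in this thread (encode the form data in the charset of the page that contained the form) could be sketched in Python as follows. The function name `encode_form` and the sample field are hypothetical, and Python's `str` stands in here for the browser's internal UTF-16 text.

```python
from urllib.parse import urlencode


def encode_form(fields, page_charset):
    """Percent-encode form fields using the charset of the received page."""
    # Internal text is converted to bytes in page_charset, and bytes
    # outside the safe ASCII range become %hh escapes.
    return urlencode(fields, encoding=page_charset).encode("ascii")


body_utf8 = encode_form({"city": "Malmö"}, "utf-8")
body_latin1 = encode_form({"city": "Malmö"}, "iso-8859-1")
```

The two results differ, so the server needs to know (or be told) which charset the browser picked.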
 
 
Thanks for any help
 
raj
 
Received on Wednesday, 23 July 2003 02:49:42 GMT
