
FW: what should the charset be in the response to the server

From: Sinha, Raj (Raj) <rajsinha@avaya.com>
Date: Tue, 22 Jul 2003 11:22:46 -0400
Message-ID: <8CA1128D59AD27429985B397118CEDDF0120895D@nj7460avexu1.global.avaya.com>
To: <www-international@w3.org>
Posting a response to this question that was sent only to me (I don't
know why), so I am forwarding it on behalf of Chris.
 
 
Raj Sinha

 
 

	
	 
	Raj,
	 
	 
	The only request encoding you can rely on is from browser HTTP
POST submissions, where one of the headers (Content-Type) can tell the
server what encoding has been used.
	 
	The encoding used in HTTP GET is undefined in the standards for
characters outside of 7-bit ASCII. 
	 
	Anyone who says it is the encoding of the page is correct but
misleading, as the browser's user can manually decide what that encoding
is (changing whatever was declared in the transmitted page), so a web
server can have no certainty about the encoding used in the %hh escapes
in a GET, which is how non-ASCII is sent.
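
To make the ambiguity concrete, here is a minimal sketch in Python (standard library only). The sample word and the two candidate encodings are assumptions chosen for illustration; they do not come from the thread. It shows that the same character yields different %hh escapes depending on the encoding the browser used, and that a wrong guess on the server side decodes without any error.

```python
from urllib.parse import quote, unquote

word = "café"  # illustrative form value (assumption, not from the thread)

# The browser percent-encodes the bytes of whatever encoding it is using.
as_utf8 = quote(word, encoding="utf-8")         # 'caf%C3%A9'
as_latin1 = quote(word, encoding="iso-8859-1")  # 'caf%E9'

# The server sees only the escapes. Decoding UTF-8 escapes as ISO-8859-1
# raises no error; it just produces the wrong text.
right_guess = unquote(as_utf8, encoding="utf-8")
wrong_guess = unquote(as_utf8, encoding="iso-8859-1")

print(as_utf8, as_latin1, right_guess, wrong_guess)
```

Nothing in the escaped string itself says which of the two encodings was used, which is the point made above.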
	 
	You might find the following helpful:
	 
	http://jetty.mortbay.com/jetty/doc/international.html
	 
	My advice: never use GET for sending a form containing
international characters, unless it's absolutely unavoidable.
	 
	When using POST, use the Content-Type header to find out what
encoding was used.
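
As a rough sketch of that advice in Python (standard library only): read the charset parameter from the request's Content-Type header and use it to decode the form body. The function names `charset_from_content_type` and `decode_form`, and the UTF-8 fallback, are assumptions made for this example. Browsers often omit the charset parameter on form submissions, so a real server still needs some fallback policy.

```python
from email.message import Message
from urllib.parse import parse_qs


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter, or None if it is absent."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def decode_form(body, content_type, fallback="utf-8"):
    """Decode an application/x-www-form-urlencoded body.

    The fallback is an assumption for this sketch, used only when the
    header carries no charset parameter.
    """
    charset = charset_from_content_type(content_type) or fallback
    # The raw body is pure ASCII; non-ASCII data is inside the %hh escapes.
    return parse_qs(body.decode("ascii"), encoding=charset)


fields = decode_form(
    b"name=caf%E9",
    "application/x-www-form-urlencoded; charset=ISO-8859-1",
)
print(fields)
```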
	 
	If you are using Sun Servlets, the servlet container does all
the decoding for you and delivers the 16-bit Unicode characters to your
program.
	 
	Chris Haynes
	 
	 
	----- Original Message ----- 
	From: Sinha, Raj (Raj) <mailto:rajsinha@avaya.com>  
	To: www-international@w3.org 
	Sent: Monday, July 21, 2003 6:31 PM
	Subject: what should the charset be in the response to the
server

	OK, this is a very basic question, but I cannot seem to find a
clear answer anywhere.
	 
	I wrote a simple web browser which is on its way to becoming
internationalized (UTF-8 support, etc.).
	 
	What I fail to understand is this:
	 
	I.  The web browser receives a page in, let's say, UTF-8. It then
converts everything to UTF-16 (which is its internal choice of data
representation). What charset should the response to the server be in? I
would guess the response should be in the original charset, i.e. UTF-8.
	 
	Consider this scenario:
	The browser requests a page, indicating its preferences through
the Accept-Charset header.
	The server sends back a page with Content-Type charset = UTF-8.
	The browser parses the page and converts everything into UTF-16.
	If there is a form, the user can enter data into it, which is
again converted to the internal choice of UTF-16.
	The browser is ready to send the form results back to the
server. What should the encoding be here?
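
The last step of this scenario can be sketched in Python (standard library only), under the assumption that the browser sends form data back in the charset of the page it received. The names `page_charset`, `form_fields` and `encode_form`, and the sample value, are hypothetical. Python's `str` stands in for the browser's internal UTF-16 representation.

```python
from urllib.parse import urlencode

page_charset = "utf-8"          # charset the server declared for the page (assumed)
form_fields = {"name": "café"}  # user input held in the internal representation


def encode_form(fields, charset):
    """Convert internal strings to the page charset, then percent-encode."""
    return urlencode(fields, encoding=charset)


query = encode_form(form_fields, page_charset)
print(query)
```

The internal representation never reaches the wire; only the bytes of the chosen charset do, written as %hh escapes.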
	 
	 
	Thanks for any help
	 
	raj
	 
Received on Tuesday, 22 July 2003 11:22:47 GMT
