Character encoding information in HTTP requests

An enquiry from Nelson Ng, eBay:


Problem statement: Currently, there is no standard mechanism for web browsers to indicate the character encoding of textual data sent to web applications in an HTTP request. As a result, web applications use various proprietary methods to determine the character encoding of that data. These include inserting an explicit character encoding tag in the URL, inserting a hidden input field in the FORM, or pre-processing the received text on the server side with character encoding detection logic.

Besides the possible development and performance overhead, these methods do not always produce reliable results. For example, the tagging approach can yield an incorrect result if the user overrides the character encoding of the page via the browser settings. Character encoding detection is not 100% accurate either; its accuracy depends largely on the type of character encoding and the length of the data.

For web applications to process the textual data of each HTTP request with the correct character encoding, there needs to be a standard mechanism by which the web browser communicates the character encoding used to generate the textual data sent in the request.
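To make the hidden-field workaround concrete, here is a minimal server-side sketch in Python. It assumes the page embeds a hidden input with the hypothetical name `charset_hint` whose value is the encoding the page was served in, and that the form is posted as `application/x-www-form-urlencoded`. The server first reads the hint, then decodes the remaining fields with it, falling back to UTF-8 (an assumed default) when the hint is missing or names an unknown codec.

```python
import codecs
from urllib.parse import parse_qs

HINT_FIELD = "charset_hint"   # hypothetical hidden-field name set by the application
DEFAULT_CHARSET = "utf-8"     # assumed fallback when no usable hint is present


def decode_form(body: bytes) -> dict:
    """Decode a urlencoded form body using a hidden-field charset hint."""
    # A percent-encoded form body consists only of ASCII characters.
    text = body.decode("ascii")

    # First pass: latin-1 maps every byte to some character, so it never fails;
    # we only need the (ASCII) value of the hint field from this pass.
    first_pass = parse_qs(text, encoding="latin-1")
    charset = first_pass.get(HINT_FIELD, [DEFAULT_CHARSET])[0]

    # Ignore hints that do not name a codec Python knows about.
    try:
        codecs.lookup(charset)
    except LookupError:
        charset = DEFAULT_CHARSET

    # Second pass: decode every field with the hinted charset.
    fields = parse_qs(text, encoding=charset, errors="replace")
    fields.pop(HINT_FIELD, None)
    return {name: values[0] for name, values in fields.items()}
```

The sketch shares the weakness described above: if the page says Shift_JIS but the user switched the browser to EUC-JP before submitting, the hint still reads `shift_jis` while the bytes are EUC-JP, and the decoded text comes out garbled. Nothing in the request itself tells the server which of the two is true.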

If you are aware of any current or planned standard for retrieving the character encoding from the browser, please let me know.

Thanks

Nelson Ng

Chief Globalization Architect

eBay Inc.

(408)376-5522


============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

http://www.w3.org/People/Ishida/
http://www.w3.org/International/
http://people.w3.org/rishida/blog/
http://www.flickr.com/photos/ishida/

Received on Wednesday, 13 September 2006 11:12:25 UTC