- From: Chris Haynes <chris@harvington.org.uk>
- Date: Sat, 3 Apr 2004 09:48:04 +0100
- To: "Michael Monaghan" <Michael.Monaghan@Sun.COM>, <www-international@w3.org>
"Michael Monaghan" asked:

> I've been testing what happens when you input say, Japanese text into a form on an ISO-1
> encoded page.
..
>
> Is encoding the query string using character entities covered in some IETF/W3 spec?

My understanding is that, if you just declare the character encoding of the page in which the form is written, there are 'hints' in the W3C HTML 4.01 spec that the same encoding should be used when submitting the form data but, to my reading, it is not mandated.

However, I have concluded that you should never rely on the page encoding to infer the encoding used in submitting form data, because users can use their browser menu to change the encoding in use (on MSIE: View - Encoding - ...).

When using POST there should be a record of the encoding that the browser used in one of the HTTP or MIME headers, but with GET there is no mechanism by which the browser can communicate the encoding it used.

However, if you specify a single encoding in an ACCEPT-CHARSET attribute in the declaration of the form itself, that encoding was used on all the browsers I had access to - with numeric character references (NCRs) where necessary. Usefully, this applied to forms submitted using POST _and_ using GET (i.e. the URL query part was encoded using the specified encoding).

I now always use UTF-8 here, even if the page containing the form is encoded in some other encoding, so that I _know_ how to decode the NCRs (and all the other octets).

The <form accept-charset='UTF-8' ... > approach appears to override the influence of the user's "View - Encoding - ..." control and seems to provide the _only_ mechanism by which you can be sure how the characters in the form data have been encoded, for both POST and GET.

This worked on all the 6th/7th generation Win32 and Linux browsers I had access to. I was not able to test this on the Mac browsers; it would be interesting to know if this works on them as well.

Chris Haynes
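For illustration, here is a minimal sketch of what the receiving side of this approach might look like, assuming the form was declared with accept-charset='UTF-8' and the browser honoured it. The function name decode_form_query and the resolve_ncrs flag are hypothetical, chosen only for this example. Decoding is strict, so octets that are not valid UTF-8 (for instance because a browser ignored the attribute) raise an error instead of silently producing garbage. Resolving NCRs is optional because it cannot tell a browser-generated NCR from text such as "&#26085;" that the user literally typed.

```python
import html
from urllib.parse import parse_qs


def decode_form_query(query: str, resolve_ncrs: bool = False) -> dict[str, list[str]]:
    """Decode a URL query string that is assumed to be UTF-8 encoded.

    Hypothetical helper for illustration; not taken from any framework.
    """
    # errors="strict" makes invalid UTF-8 octets raise UnicodeDecodeError.
    fields = parse_qs(
        query,
        keep_blank_values=True,
        encoding="utf-8",
        errors="strict",
    )
    if resolve_ncrs:
        # Turn references such as "&#26085;" back into the characters
        # they stand for. This also rewrites references the user typed.
        fields = {
            name: [html.unescape(value) for value in values]
            for name, values in fields.items()
        }
    return fields


if __name__ == "__main__":
    # Three Japanese characters, percent-encoded as UTF-8 octets.
    print(decode_form_query("q=%E6%97%A5%E6%9C%AC%E8%AA%9E"))
```

The same decoding applies to a POST body of type application/x-www-form-urlencoded, since it uses the same percent-encoding as the query part of a GET URL.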
Received on Saturday, 3 April 2004 03:49:33 UTC