- From: Paul Deuter <PaulD@plumtree.com>
- Date: Fri, 12 Sep 2003 07:28:41 -0400
- To: www-international@w3.org
I agree with Kuro. If you want to be compatible with legacy servers, which you seem to, then you had better encode the text in the character set of the page or the form; that is, in your example, 8859-1. It might actually be nice for the browser to warn the user when they attempt to type a character outside the 8859-1 set. I am personally not a fan of software that lets you type in any characters and then turns those characters into ??? when it sends them to the server.

Unfortunately we have a lot of legacy web pages that use 8859-1 (which at one point was considered a very "inclusive" character set). Over time, these web pages will be improved to use UTF-8 and these issues will largely go away. I don't think anyone expects Opera or Mozilla to be able to compensate for the limitations of legacy servers and legacy server-side code.

The clear direction from the W3C for solving these character set issues is for new web software to implement good support for UTF-8 and to encourage web page authors to upgrade to UTF-8.

-Paul

-----Original Message-----
From: KUROSAKA Teruhiko [mailto:kuro@bhlab.com]
Sent: Thursday, September 11, 2003 9:38 AM
To: Ian Hickson
Cc: kuro@sonic.net; www-international@w3.org
Subject: Re: Form submission when successful controls contain characters outside the submission character set

Ian,

>> The browser can choose to send the input data in UTF-8, as Martin
>> suggested already.
>
> Unfortunately this is not a workable solution, for three reasons:
>
>  * If there's an accept-charset attribute, it's wrong to violate it.
>  * There's no standard way to include character set selection information
>    in a GET request (for forms with method="get").
>  * Most servers cannot handle UTF-8 when they expect ISO-8859-1.

I see. In that case (accept-charset does not include Unicode charsets), the best "solution" may simply be to replace the out-of-charset characters with a replacement character, probably '?', on transmission.

If the form itself is written in ISO-8859-1 or any other traditional charset (rather than UTF-* or another Unicode-based charset), and if accept-charset is absent or does not include UTF-8, the web app is probably not prepared to handle those characters. That is, even if we come up with a creative way of transmitting these out-of-charset characters, it would not solve the real problem: the web app doesn't handle out-of-charset characters.

In other words, I would expect a fully internationalized web app to use UTF-8 for the form (or to declare that it can accept UTF-8 using accept-charset and to use POST instead of GET), and to interpret the charset attribute in the Content-Type header.

Do you have a particular use case where sending the out-of-charset characters would be beneficial?

Regards,
--
T. "Kuro" Kurosaka, San Francisco, California
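For concreteness, here is a minimal Python sketch of the substitution discussed above: encode a form value in the page's charset and let any character the charset cannot represent fall back to '?'. The function name and sample text are illustrative only; this is not taken from any browser's actual code.

    from urllib.parse import quote_plus

    def encode_form_value(value, charset="iso-8859-1"):
        # Encode in the page/form charset; the 'replace' error handler turns
        # every character the charset cannot represent into b'?'.
        raw = value.encode(charset, errors="replace")
        # Percent-encode the bytes as application/x-www-form-urlencoded.
        return quote_plus(raw)

    # 'é' exists in ISO-8859-1 and survives; the Greek 'π' does not, so it
    # is transmitted as '?' (percent-encoded as %3F).
    print(encode_form_value("café π"))   # -> caf%E9+%3F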
Received on Friday, 12 September 2003 07:36:13 UTC