RE: Form submission when successful controls contain characters outside the submission character set

I agree with Kuro.  If you want to be compatible with legacy servers,
which you seem to, then you should encode the text in the character
set of the page or the form.  That is, in your example, ISO-8859-1.
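
(As a rough illustration of that, not from the original mail: a Python
sketch of a browser percent-encoding a field value in the page's
charset.  The field name and value are made up.)

    from urllib.parse import urlencode

    # Encode the submitted value using the charset of the page/form
    # (ISO-8859-1 here), as a legacy server would expect.
    fields = {"name": "Ren\u00e9"}
    body = urlencode(fields, encoding="iso-8859-1")
    print(body)   # name=Ren%E9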

It might actually be nice for the browser to warn the
user when they attempt to type a character outside the 8859-1 set.
I am personally not a fan of software that lets you type in any characters
and then turns those characters into ??? when it sends them to the server.
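
(A minimal sketch of such a check, assuming the target charset is
known to be ISO-8859-1; the function name is illustrative, not a real
API.)

    def chars_outside_charset(text, charset="iso-8859-1"):
        """Return the characters that cannot be encoded in charset."""
        bad = []
        for ch in text:
            try:
                ch.encode(charset)
            except UnicodeEncodeError:
                bad.append(ch)
        return bad

    print(chars_outside_charset("Ren\u00e9 \u65e5\u672c"))   # ['日', '本']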

Unfortunately, we have a lot of legacy web pages that use 8859-1 (which
at one point was considered a very "inclusive" character set).  Over
time, these web pages will be improved to use UTF-8 and these issues
will largely go away.  I don't think anyone expects Opera or Mozilla to
be able to compensate for the limitations of legacy servers and legacy
server-side code.

The clear direction of the W3C to solve these character set issues is for
new web software to implement good support for UTF-8 and to encourage
web page authors to upgrade to UTF-8.

-Paul


-----Original Message-----
From: KUROSAKA Teruhiko [mailto:kuro@bhlab.com]
Sent: Thursday, September 11, 2003 9:38 AM
To: Ian Hickson
Cc: kuro@sonic.net; www-international@w3.org
Subject: Re: Form submission when successful controls contain characters
outside the submission character set



Ian,


 >>The browser can choose to send the input data in UTF-8, as Martin
 >>suggested already.
 >
 >
 > Unfortunately this is not a workable solution, for three reasons:
 >
 >  * If there's an accept-charset attribute, it's wrong to violate it.
 >  * There's no standard way to include character set selection information
 >    in a GET request (for forms with method="get").
 >  * Most servers cannot handle UTF-8 when they expect ISO-8859-1.

I see.
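
(To make the third point concrete -- a minimal sketch, assuming a
legacy server that blindly decodes the request bytes as ISO-8859-1:)

    sent = "Ren\u00e9".encode("utf-8")   # browser sends UTF-8: b'Ren\xc3\xa9'
    seen = sent.decode("iso-8859-1")     # legacy server assumes Latin-1
    print(seen)                          # 'RenÃ©' -- mojibake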

In that case (when accept-charset does not include any Unicode
charset), the best "solution" may simply be to replace the
out-of-charset characters with a replacement character, probably '?',
on transmission.
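
(In Python terms, roughly -- the built-in "replace" error handler
substitutes '?' when encoding:)

    print("Ren\u00e9 \u65e5\u672c".encode("iso-8859-1", errors="replace"))
    # b'Ren\xe9 ??'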

If the form itself is written in ISO-8859-1 or any other traditional
charset (rather than UTF-8 or another Unicode-based charset), and if
accept-charset is absent or does not include UTF-8, the web app is
probably not prepared to handle those characters.  That is, even if we
come up with a creative way of transmitting these out-of-charset
characters, it would not solve the real problem: the web app doesn't
handle out-of-charset characters.  In other words, I would expect a
fully internationalized web app to use UTF-8 for the form (or to
declare that it accepts UTF-8 via accept-charset and use POST instead
of GET), and to interpret the charset parameter of the Content-Type
header.
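
(A rough sketch of that last part, assuming a Python server-side
handler and a WSGI-style "environ" dict; none of these names come
from the mail.)

    from urllib.parse import parse_qs

    def decode_post_body(environ, body):
        # Honour the charset parameter of the Content-Type header
        # instead of assuming ISO-8859-1.
        ctype = environ.get("CONTENT_TYPE", "")
        charset = "iso-8859-1"              # historical default
        for param in ctype.split(";")[1:]:
            name, _, value = param.strip().partition("=")
            if name.lower() == "charset" and value:
                charset = value.strip('"')
        # The urlencoded body itself is ASCII; the charset applies to
        # the percent-escaped bytes.
        return parse_qs(body.decode("ascii"), encoding=charset)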

Do you have a particular use case where sending the
out-of-charset characters would be beneficial?

Regards,
--
T. "Kuro" Kurosaka, San Francisco, California

Received on Friday, 12 September 2003 07:36:13 UTC