Re: Form submission when successful controls contain characters outside the submission character set from KUROSAKA Teruhiko on 2003-09-11 (www-international@w3.org from July to September 2003)

From: KUROSAKA Teruhiko <kuro@bhlab.com>
Date: Thu, 11 Sep 2003 09:37:57 -0700
To: Ian Hickson <ian@hixie.ch>
Cc: "kuro@sonic.net" <kuro@sonic.net>, "www-international@w3.org" <www-international@w3.org>
Message-ID: <3F60A4E5.6030807@bhlab.com>

Ian,


>>The browser can choose to send the input data in UTF-8, as Martin
>>suggested already.
> 
> 
> Unfortunately this is not a workable solution, for three reasons:
> 
>  * If there's an accept-charset attribute, it's wrong to violate it.
>  * There's no standard way to include character set selection information
>    in a GET request (for forms with method="get").
>  * Most servers cannot handle UTF-8 when they expect ISO-8859-1.

I see.

In that case (accept-charset does not include a Unicode charset),
the best "solution" may simply be to replace those
out-of-charset characters with a replacement character,
probably '?', on transmission.
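
For illustration only, here is a minimal sketch of that substitution in
Python, with urllib.parse.urlencode standing in for a browser's form
encoder.  The function name encode_form_with_replacement and the
ISO-8859-1 default are hypothetical choices for this example, not
anything a browser or specification defines.

```python
from urllib.parse import urlencode


def encode_form_with_replacement(fields, charset="iso-8859-1"):
    """Hypothetical helper: encode form fields as
    application/x-www-form-urlencoded in `charset`, substituting '?'
    for every character that `charset` cannot represent."""
    # urlencode() forwards encoding/errors to quote_plus() for each key
    # and value; errors="replace" makes str.encode() emit b"?" for any
    # unencodable character, which is then percent-escaped as %3F.
    return urlencode(fields, encoding=charset, errors="replace")


print(encode_form_with_replacement({"name": "café ☃"}))
# name=caf%E9+%3F  ('é' fits in ISO-8859-1, the snowman does not)
```

Note that the substitution is lossy: the server receives %3F, which it
cannot distinguish from a '?' the user actually typed.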

If the form itself is written in ISO-8859-1 or any other
traditional charset (that is, not UTF-* or another Unicode-based
charset), and if accept-charset is absent or does not
include UTF-8, the web app is probably not prepared to handle
those characters.  That is, even if we come up with a creative
way of transmitting these out-of-charset characters, that would not
solve the real problem: the web app doesn't handle out-of-charset
characters.  In other words, I would expect fully internationalized
web apps to use UTF-8 for the form (or to declare that they accept UTF-8
using accept-charset and use POST instead of GET), and to
interpret the charset parameter of the Content-Type header.
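
On the server side, honouring that parameter could look roughly like
the sketch below.  It assumes the client actually sends a charset
parameter on an application/x-www-form-urlencoded POST; the function
name decode_form_body and the ISO-8859-1 fallback used when the
parameter is missing are hypothetical choices for this example.

```python
from email.message import Message
from urllib.parse import parse_qs


def decode_form_body(body, content_type, default_charset="iso-8859-1"):
    """Hypothetical helper: decode a urlencoded POST body using the
    charset parameter of its Content-Type header."""
    # Let the email package parse the header parameters; it returns the
    # charset lowercased, or default_charset if no charset= is present.
    header = Message()
    header["Content-Type"] = content_type
    charset = header.get_content_charset(default_charset)
    # A properly encoded body is plain ASCII; parse_qs() decodes the
    # percent-escapes in `charset` and turns '+' back into spaces.
    return parse_qs(body.decode("ascii"), encoding=charset)


print(decode_form_body(b"name=caf%C3%A9",
                       "application/x-www-form-urlencoded; charset=UTF-8"))
# {'name': ['café']}
print(decode_form_body(b"name=caf%E9", "application/x-www-form-urlencoded"))
# {'name': ['café']}  (no charset parameter, so the fallback applies)
```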

Do you have a particular use case where sending the
out-of-charset characters may be beneficial?

Regards,
-- 
T. "Kuro" Kurosaka, San Francisco, California

Received on Thursday, 11 September 2003 12:37:14 UTC