Re: Form submission when successful controls contain characters outside the submission character set

On Wed, 10 Sep 2003, KUROSAKA Teruhiko wrote:
>>
>> If you have a form on a page that is ISO-8859-1, and the data that is
>> submitted (either as GET or as POST) from that form contains characters
>> outside the ISO-8859-1 repertoire, what should the UA do?
>
> Is this a question about the real behavior of the
> popular browsers, or are you developing a browser?

This is a question asked on behalf of Opera and Mozilla, both of which
recently ran into this issue.


> Assuming the latter, the browser is not obligated to send the input data
> in the same charset as the form itself.

It is, however, obligated to send the form submission in one of the
character sets specified in the accept-charset attribute.
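The accept-charset constraint can be sketched as follows. This is a minimal illustration, not how Opera or Mozilla implement it. It assumes the UA tries the listed charsets in order and takes the first one that can encode every control value, and that it falls back to the page's own charset when the attribute is absent. `parse_accept_charset` and `choose_submission_charset` are hypothetical names. A `None` result is exactly the case this thread is asking about.

```python
import codecs
import re
from typing import List, Optional


def parse_accept_charset(value: str) -> List[str]:
    # HTML 4.01 describes accept-charset as a space- and/or
    # comma-delimited list of charset names.
    return [token for token in re.split(r"[\s,]+", value.strip()) if token]


def choose_submission_charset(
    accept_charset: str, page_charset: str, values: List[str]
) -> Optional[str]:
    """Hypothetical helper: return the first acceptable charset that can
    encode every submitted value, or None if no listed charset can."""
    # Assumption: with no accept-charset attribute, use the page's charset.
    candidates = parse_accept_charset(accept_charset) or [page_charset]
    for charset in candidates:
        try:
            codecs.lookup(charset)  # skip names Python does not recognise
            for value in values:
                value.encode(charset)  # raises if a character is unrepresentable
        except (LookupError, UnicodeEncodeError):
            continue
        return charset
    return None  # no acceptable charset covers the data
```

For example, a form with `accept-charset="ISO-8859-1 UTF-8"` submitting "café" and "日本" would get `"UTF-8"`, while a plain ISO-8859-1 form submitting "日" would get `None`.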


> The browser can choose to send the input data in UTF-8, as Martin
> suggested already.

Unfortunately this is not a workable solution, for three reasons:

 * If there's an accept-charset attribute, it's wrong to violate it.
 * There's no standard way to include character set selection information
   in a GET request (for forms with method="get").
 * Most servers cannot handle UTF-8 when they expect ISO-8859-1.

The first two are theoretical problems; the last is a practical problem
that prevents us from doing this.
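The GET and server-side objections can be illustrated with Python's `urllib.parse`, used here only as a stand-in for a UA building a form query string and a server parsing it; it is not how any particular browser or server works. The same value yields different bytes under each charset, nothing in the query string says which was used, and a server that assumes ISO-8859-1 turns the UTF-8 bytes into two wrong characters.

```python
from urllib.parse import parse_qs, urlencode

value = "caf\u00e9"  # "café"

# The same field encoded as a GET query string under two charsets.
# Neither string carries any label saying which charset produced it.
as_latin1 = urlencode({"q": value}, encoding="iso-8859-1")
as_utf8 = urlencode({"q": value}, encoding="utf-8")

# A server expecting ISO-8859-1 decodes the two UTF-8 bytes C3 A9
# as two separate characters instead of one "é".
misread = parse_qs(as_utf8, encoding="iso-8859-1")

print(as_latin1)  # q=caf%E9
print(as_utf8)    # q=caf%C3%A9
print(misread)    # {'q': ['cafÃ©']}
```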


> I don't think using character entities is the right solution, because
> character entities are a syntax used in HTML/XML, and the data returned
> from the form is not itself in HTML or XML.

Agreed.
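One concrete consequence of that mismatch: if unrepresentable characters are replaced with numeric character references (what the quoted message calls character entities), the server receives exactly the same bytes as it would if the user had typed the reference literally. The sketch below uses Python's `xmlcharrefreplace` error handler as a stand-in for such a UA; it shows the ambiguity and is not a proposal.

```python
# The user typed the single character U+65E5 (日) into an ISO-8859-1 form.
typed_kanji = "\u65e5"
# A different user typed these eight ASCII characters literally.
typed_literal = "&#26085;"

# Replace unrepresentable characters with numeric character references.
submitted_kanji = typed_kanji.encode("iso-8859-1", errors="xmlcharrefreplace")
submitted_literal = typed_literal.encode("iso-8859-1")

# Both submissions are b'&#26085;', so the server cannot tell them apart.
print(submitted_kanji == submitted_literal)  # True
```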


Anyone have any other possible solutions? :-)

-- 
Ian Hickson                                      )\._.,--....,'``.    fL
U+1047E                                         /,   _.. \   _\  ;`._ ,.
http://index.hixie.ch/                         `._.-(,_..'--(,_..'`-.;.'

Received on Thursday, 11 September 2003 04:39:16 UTC