Fwd: Re: Form submission when successful controls contain characters outside the submission character set

This is a forwarded message
From: Ian Hickson <ian@hixie.ch>
To: "kuro@sonic.net" <kuro@sonic.net>
Date: Thursday, September 11, 2003, 10:39:15 AM
Subject: Form submission when successful controls contain characters outside the submission character set

===8<==============Original message text===============

On Wed, 10 Sep 2003, KUROSAKA Teruhiko wrote:
>>
>> If you have a form on a page that is ISO-8859-1, and the data that is
>> submitted (either as GET or as POST) from that form contains characters
>> outside the ISO-8859-1 repertoire, what should the UA do?
>
> Is this a question about the real behavior of the
> popular browsers, or are you developing a browser?

This is a question asked on behalf of Opera and Mozilla, both of which
recently ran into this issue.


> Assuming the latter, the browser is not obligated to send the input data
> in the same charset as the form itself.

It is, however, obligated to send the form submission in one of the
character sets specified in the accept-charset attribute.


> The browser can choose to send the input data in UTF-8, as Martin
> suggested already.

Unfortunately this is not a workable solution, for three reasons:

 * If there's an accept-charset attribute, it's wrong to violate it.
 * There's no standard way to include character set selection information
   in a GET request (for forms with method="get").
 * Most servers cannot handle UTF-8 when they expect ISO-8859-1.

The first two are problems from a theoretical point of view; the last one
is a practical problem that prevents us from doing this.
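
To make the third point concrete, here is a minimal Python sketch. It is
not from the original thread: the sample string and the helper name
can_encode are made up for illustration. It shows that a character such
as the euro sign cannot be encoded in ISO-8859-1 at all, and what a server
that assumes ISO-8859-1 would read if the UA sent UTF-8 bytes anyway.

```python
# Illustration only; the sample text and helper name are assumptions.
text = "caf\u00e9 \u20ac5"  # "café €5"; the euro sign is not in ISO-8859-1


def can_encode(value: str, charset: str) -> bool:
    """Return True if every character of value exists in charset."""
    try:
        value.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# The form data fits in UTF-8 but not in the page's ISO-8859-1.
fits_latin1 = can_encode(text, "iso-8859-1")
fits_utf8 = can_encode(text, "utf-8")

# What a server assuming ISO-8859-1 sees if the UA sends UTF-8 bytes:
# every multi-byte sequence is read as several unrelated characters.
misread = text.encode("utf-8").decode("iso-8859-1")
```

Here misread is longer than text and no longer contains the euro sign,
which is the data corruption the third bullet describes.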


> I don't think use of character entity is a right solution because the
> character entity is a syntax used in HTML/XML and the data returned from
> the form is not itself in HTML or XML.

Agreed.


Anyone have any other possible solutions? :-)

-- 
Ian Hickson                                      )\._.,--....,'``.    fL
U+1047E                                         /,   _.. \   _\  ;`._ ,.
http://index.hixie.ch/                         `._.-(,_..'--(,_..'`-.;.'

===8<===========End of original message text===========

Forwarding this message as evidence that the internationalization
issues with GET for form submission are still acute and still not
solved. PUT, with an XML body, solves them.

GET might solve them in the future, if for example the Accept charset
specification is amended to say that servers should or must accept
UTF-8 (and perhaps UTF-16, though only one is needed) in the same way
that XML parsers must accept UTF-8 (and UTF-16).

Until then, there will continue to be forms that cannot correctly
transfer the text entered by a user, if they submit the results using
GET.
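
A minimal sketch of the GET problem, in Python, under the assumption that
the UA percent-encodes the query string from whatever byte encoding it
picked (the field name "q" and the value are made up for illustration).
The resulting URL carries no label saying which encoding was used, so the
server can only guess.

```python
from urllib.parse import unquote, urlencode

value = "caf\u00e9"  # "café"; hypothetical form field value

# The same field value yields two different query strings depending on
# the encoding the UA chose; neither one names that encoding.
query_latin1 = urlencode({"q": value}, encoding="iso-8859-1")
query_utf8 = urlencode({"q": value}, encoding="utf-8")

# A server that assumes ISO-8859-1 decodes the UTF-8 form incorrectly.
encoded_value = query_utf8.split("=", 1)[1]
server_guess = unquote(encoded_value, encoding="iso-8859-1")
```

With these inputs query_latin1 is "q=caf%E9" and query_utf8 is
"q=caf%C3%A9"; decoding the latter as ISO-8859-1 does not give back the
text the user typed.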

-- 
Best regards,
 Chris                            mailto:chris@w3.org

Received on Thursday, 11 September 2003 05:23:40 UTC