Re: Form submission when successful controls contain characters outside the submission character set

On Wed, 10 Sep 2003, KUROSAKA Teruhiko wrote:

>  > If you have a form on a page that is ISO-8859-1, and the data that is
>  > submitted (either as GET or as POST) from that form contains characters
>  > outside the ISO-8859-1 repertoire, what should the UA do?

> The browser can choose to send the input data in UTF-8, as Martin
> suggested already.

  As noted by Ian, if we do that, we just have to keep our fingers
crossed that it will work on the other side (server-side applications).
The odds are not very high, though.
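
A minimal Python sketch of why the odds are poor, assuming a server-side
parser that is hard-wired to ISO-8859-1 (the function names are made up
for illustration and do not come from any real UA or server):

```python
from urllib.parse import quote, unquote


def submit_as_utf8(value: str) -> str:
    """Percent-encode a form value as UTF-8, as a UA falling back to UTF-8 might."""
    return quote(value, safe="", encoding="utf-8")


def server_decode(encoded: str, assumed_charset: str) -> str:
    """Decode the value the way a parser hard-wired to one charset would."""
    return unquote(encoded, encoding=assumed_charset, errors="replace")


wire = submit_as_utf8("\ud55c")  # HANGUL SYLLABLE HAN, outside ISO-8859-1
as_utf8 = server_decode(wire, "utf-8")         # round-trips correctly
as_latin1 = server_decode(wire, "iso-8859-1")  # three unrelated characters
print(wire, as_utf8 == "\ud55c", len(as_latin1))
```

Nothing in the request itself tells the second decoder that it guessed
wrong, which is the whole problem.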

> It should put charset=UTF-8 in Content-Type header.

  Obviously, we can't do that for GET (there's no C-T header for
GET). Currently MS IE and Mozilla use a proprietary '_charset_' for this
purpose, but only when a hidden field named '_charset_' is present in the form.
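
On the receiving end, a server-side parser could use that field roughly
as follows. This is only a sketch: it assumes the UA fills in the value
of '_charset_' with the name of the encoding it actually used, and
parse_form() and its page_charset fallback are hypothetical names, not
part of any existing framework.

```python
import codecs
from urllib.parse import parse_qsl


def parse_form(query: str, page_charset: str = "iso-8859-1") -> dict:
    """Decode a urlencoded query string, honouring a '_charset_' field if present."""
    # First pass: Latin-1 maps every byte to a character, so it cannot fail,
    # and it leaves the ASCII name '_charset_' and its ASCII value intact.
    raw = dict(parse_qsl(query, keep_blank_values=True, encoding="latin-1"))
    charset = raw.get("_charset_") or page_charset
    try:
        codecs.lookup(charset)
    except LookupError:
        charset = page_charset  # unknown label: fall back to the page charset
    # Second pass: decode the percent-encoded bytes with the chosen charset.
    return dict(parse_qsl(query, keep_blank_values=True,
                          encoding=charset, errors="replace"))


print(parse_form("_charset_=UTF-8&q=%ED%95%9C")["q"] == "\ud55c")
```

A form without the hidden field still decodes with the page charset, so
existing forms would behave as before.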

Even for POST, as we discovered in July on this very list, adding
'C-T: ..... charset=UTF-8' to 'application/x-www-form-urlencoded'
doesn't work very well with most server-side programs. At
one time, Mozilla did just that but had to give it up (see
http://bugzilla.mozilla.org/show_bug.cgi?id=18643c#10) because it broke
so many server-side parsers. That was in 1999, but I'm afraid the
situation hasn't gotten much better since. An alternative, using
'multipart/form-data' and specifying 'charset' in the C-T of individual
parts, may have a better chance of working (for one thing, it's the author
of a form that specifies 'enctype', and she/he should know her/his
server-side parser).
However, the 'infrastructure' for this route may not be there yet. For
instance, widely used Java servlet APIs don't yet support
multipart/form-data (see the thread beginning with
http://lists.w3.org/Archives/Public/www-international/2003JulSep/0026.html).
BTW, there's a Mozilla bug on adding the C-T charset param to
'multipart/form-data' (http://bugzilla.mozilla.org/show_bug.cgi?id=116346).
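
For concreteness, this is roughly what a 'multipart/form-data' body with
a charset on each part looks like when built by hand in Python. It is a
sketch under stated assumptions, not what Mozilla or any other UA emits:
build_multipart() and the boundary string are hypothetical, and field
names are assumed to be ASCII.

```python
def build_multipart(fields: dict, charset: str, boundary: str) -> bytes:
    """Serialize text fields as multipart/form-data, labelling each part's charset."""
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines += [
            delimiter,
            # Field names are assumed to be ASCII in this sketch.
            f'Content-Disposition: form-data; name="{name}"'.encode("ascii"),
            f"Content-Type: text/plain; charset={charset}".encode("ascii"),
            b"",                    # blank line ends the part headers
            value.encode(charset),  # part body in the declared charset
        ]
    lines += [delimiter + b"--", b""]  # closing delimiter
    return b"\r\n".join(lines)


body = build_multipart({"q": "\ud55c"}, "UTF-8", "hypothetical-boundary")
print(body.decode("utf-8"))
```

Unlike the urlencoded case, each part carries its own label, so a parser
that understands multipart does not have to guess.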

 Jungshik

Received on Thursday, 11 September 2003 08:58:39 UTC