- From: Jungshik Shin <jshin@i18nl10n.com>
- Date: Thu, 11 Sep 2003 21:58:30 +0900 (KST)
- To: kuro@sonic.net
- Cc: www-international@w3.org
On Wed, 10 Sep 2003, KUROSAKA Teruhiko wrote:

> > If you have a form on a page that is ISO-8859-1, and the data that is
> > submitted (either as GET or as POST) from that form contains characters
> > outside the ISO-8859-1 repertoire, what should the UA do?
>
> The browser can choose to send the input data in UTF-8, as Martin
> suggested already.

As noted by Ian, if we do that, we just have to keep our fingers crossed that it will work on the other side (in server-side applications). The odds are not very high, though.

> It should put charset=UTF-8 in the Content-Type header.

Obviously, we can't do that for GET, because a GET request has no Content-Type (C-T) header. Currently, MS IE and Mozilla use a proprietary '_charset_' field for this purpose, and only when a hidden field named '_charset_' is present in the form. Even for POST, as we discovered in July on this very list, adding 'C-T: ..... charset=UTF-8' to 'application/x-www-form-urlencoded' doesn't work very well with most server-side programs. At one time, Mozilla did just that, but it had to give it up (see http://bugzilla.mozilla.org/show_bug.cgi?id=18643c#10) because it broke so many server-side parsers. That was in 1999, but I'm afraid the situation hasn't gotten much better since.

An alternative, using 'multipart/form-data' and specifying 'charset' in the C-T header of each individual part, may have a better chance (for one thing, the form author, who chooses the 'enctype', is the one who should know her/his server-side parser). However, the 'infrastructure' for this route may not be there yet. For instance, the widely-used Java servlet APIs don't yet support multipart/form-data (see the thread beginning with http://lists.w3.org/Archives/Public/www-international/2003JulSep/0026.html).

BTW, there's a Mozilla bug on adding a C-T charset parameter to 'multipart/form-data' (http://bugzilla.mozilla.org/show_bug.cgi?id=116346).

Jungshik
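To make the GET/POST encoding problem in this message concrete, here is a minimal sketch in Python (the language and the use of `urllib.parse` are illustrative choices, not anything the thread prescribes). The field values are hypothetical. It shows that ISO-8859-1 cannot represent characters such as '€' or Hangul, then falls back to UTF-8 and echoes the encoding in a hidden `_charset_` field, the way IE and Mozilla do when the form contains one. The server-side `decode_form` helper is also hypothetical: it assumes the server peeks at `_charset_` and otherwise falls back to the page's ISO-8859-1.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical values a user typed into a form on an ISO-8859-1 page.
# "é" exists in ISO-8859-1; "€" and the Hangul syllables do not.
FIELDS = [("name", "José"), ("note", "€5 한글")]


def encode_form(pairs, charset):
    """Serialize pairs as application/x-www-form-urlencoded using `charset`.

    Raises UnicodeEncodeError if a value is outside the charset's repertoire.
    """
    return urlencode(pairs, encoding=charset)


def decode_form(body, page_charset="iso-8859-1"):
    """Hypothetical server-side decoding driven by a `_charset_` field."""
    # The field name and its value are ASCII, so a lossy ASCII pass is
    # enough to learn which charset the client says it used.
    probe = parse_qs(body, encoding="ascii", errors="replace")
    charset = probe.get("_charset_", [page_charset])[0]
    # Decode the whole body again with the declared (or assumed) charset.
    return parse_qs(body, encoding=charset)


try:
    encode_form(FIELDS, "iso-8859-1")
except UnicodeEncodeError as exc:
    print("ISO-8859-1 cannot encode the input:", exc.reason)

# Fall back to UTF-8 and report it in the hidden _charset_ field.
body = encode_form(FIELDS + [("_charset_", "UTF-8")], "utf-8")
print(body)
print(decode_form(body))
```

Without the `_charset_` field, the server has only the page encoding to go on, which is exactly the guesswork the message warns about.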
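The message does not say exactly how the 1999 server-side parsers broke when a charset parameter was appended to 'application/x-www-form-urlencoded'. One plausible failure mode, assumed here purely for illustration, is a server that compares the whole Content-Type value against the bare media type, so any parameter makes it stop recognizing the body as form data. The sketch contrasts such a hypothetical `naive_is_form` check with a parameter-aware one built on Python's standard `email.message.Message`.

```python
from email.message import Message

FORM_TYPE = "application/x-www-form-urlencoded"


def naive_is_form(content_type):
    # Hypothetical fragile check: exact match on the full header value,
    # so "...; charset=UTF-8" is no longer recognized as a form body.
    return content_type == FORM_TYPE


def parse_content_type(content_type):
    """Return (media type, charset or None), lowercased, ignoring other params."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_type(), msg.get_content_charset()


header = FORM_TYPE + "; charset=UTF-8"
print(naive_is_form(header))
print(parse_content_type(header))
```

A parser written like `parse_content_type` tolerates the parameter; one written like `naive_is_form` silently drops the submitted data.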
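The 'multipart/form-data' alternative can be sketched the same way. The hypothetical `build_multipart` below gives each part its own `Content-Type: text/plain; charset=...` header, so different fields can even use different encodings, and the hypothetical `parse_multipart` reads each part's charset back with Python's standard `email` parser. This only illustrates the wire format under the assumption that field names are ASCII; as the message points out, much real server-side infrastructure (e.g. the Java servlet API) did not parse multipart/form-data at all.

```python
import uuid
from email.parser import BytesParser


def build_multipart(fields):
    """Build a multipart/form-data body from (name, value, charset) triples.

    Each part declares its own charset; field names are assumed to be ASCII.
    Returns (Content-Type header value, body bytes).
    """
    boundary = "form-boundary-" + uuid.uuid4().hex
    lines = []
    for name, value, charset in fields:
        lines.append(f"--{boundary}".encode("ascii"))
        lines.append(f'Content-Disposition: form-data; name="{name}"'.encode("ascii"))
        lines.append(f"Content-Type: text/plain; charset={charset}".encode("ascii"))
        lines.append(b"")  # blank line ends the part headers
        lines.append(value.encode(charset))
    lines.append(f"--{boundary}--".encode("ascii"))
    lines.append(b"")
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(lines)


def parse_multipart(content_type, body):
    """Decode each part with the charset declared in its own Content-Type."""
    # Prepend the outer header so the email parser sees a complete message.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    msg = BytesParser().parsebytes(raw)
    result = {}
    for part in msg.get_payload():
        name = part.get_param("name", header="content-disposition")
        charset = part.get_content_charset() or "ascii"
        result[name] = part.get_payload(decode=True).decode(charset)
    return result


content_type, body = build_multipart([
    ("name", "José", "iso-8859-1"),
    ("note", "€5 한글", "utf-8"),
])
print(content_type)
print(parse_multipart(content_type, body))
```

Because the charset travels with each part rather than in a single top-level parameter, a receiver that honors part headers never has to guess.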
Received on Thursday, 11 September 2003 08:58:39 UTC