
Re: what should the charset be in the response to the server

From: Chris Haynes <chris@harvington.org.uk>
Date: Mon, 28 Jul 2003 19:19:16 +0100
Message-ID: <000e01c35534$c36ee170$0200000a@ringo>
To: "Shigemichi Yazawa" <yazawa@globalsight.com>, "Jungshik Shin" <jshin@i18nl10n.com>
Cc: <www-international@w3.org>

"Shigemichi Yazawa" wrote at Monday, July 28, 2003 5:41 PM

>
> At Fri, 25 Jul 2003 23:39:13 -0400 (EDT),
> Jungshik Shin wrote:
> >
> >   Have you tried Mozilla 1.4? Mozilla 1.0 is pretty outdated and a lot
> > of features have been added since.
> ...
> > Have you tried entering some non-ASCII characters? The default (in MIME)
> > content-type is 'Content-Type: text/plain; charset=US-ASCII', and it can
> > be omitted. If it still does not add a 'C-T' header with a charset
> > parameter for non-ASCII chars, I'll file a bug
> > and hopefully fix it.
>
> I entered Japanese text and got the same result: no Content-Type
> header. Mozilla 1.4 (for Windows) doesn't add it either.
>
> I set up a JSP here. Feel free to try it out yourself.
>
> http://www.runout.org/html-form-test/multi-part-form.jsp
>
> The input data in this page is written out in ISO-8859-1. So any
> non-ASCII string will be shown as garbage.
>
> I also set up another JSP that adds accept-charset="UTF-8" to the FORM
> element, as Chris suggested.
>
> http://www.runout.org/html-form-test/accept-charset.jsp
>
> It seems to work fine even if you change the character encoding in
> your browser. This seems to be an effective solution for immediate
> needs.
>
> Even using this technique, you still have to do this old trick.
>
>   new String(request.getParameter("param").getBytes("ISO8859_1"), "UTF-8");
>
> That's because the character encoding is specified only in the page
> source and not in the HTTP request.
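To make the "old trick" concrete: a container that decodes UTF-8 request bytes as ISO-8859-1 produces one char per byte, and because ISO-8859-1 maps all 256 byte values one-to-one, re-encoding with ISO-8859-1 recovers the original bytes, which can then be decoded correctly as UTF-8. The following is a minimal standalone sketch that simulates the container's mis-decoding step rather than using the Servlet API; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class RedecodeDemo {
    // Re-interpret a String that was decoded as ISO-8859-1 but whose
    // underlying bytes were really UTF-8. ISO-8859-1 maps every byte
    // 0x00-0xFF to exactly one char, so getBytes(ISO_8859_1) recovers
    // the original bytes losslessly.
    static String redecodeAsUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u65e5\u672c\u8a9e"; // three Japanese characters
        // Simulate the browser sending UTF-8 and the container
        // decoding those bytes as ISO-8859-1.
        byte[] sent = original.getBytes(StandardCharsets.UTF_8);
        String garbled = new String(sent, StandardCharsets.ISO_8859_1);
        String fixed = redecodeAsUtf8(garbled);
        System.out.println(garbled.length() + " chars garbled -> " + fixed.length() + " chars fixed");
        System.out.println(fixed.equals(original));
    }
}
```

Run as-is it prints `9 chars garbled -> 3 chars fixed` and `true`; the garbled string has one char per UTF-8 byte.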

This should not be necessary.

Your Servlet container should implement the Sun Servlet Specification
(version 2.3, section 4.9), which says that if you call
     request.setCharacterEncoding(String)
before accessing any of the parameters, the specified encoding is
used to decode the request's parameters.

So you can just call

    request.setCharacterEncoding("UTF-8");
    request.getParameter("param");

You know the request was encoded in UTF-8 because you mandated it in
the form's 'accept-charset' attribute - and my conjecture is that this
can be trusted.
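A rough model of what setCharacterEncoding("UTF-8") changes: for an application/x-www-form-urlencoded body, the container has to turn percent-escaped bytes into characters, and the chosen charset decides how. The sketch below uses java.net.URLDecoder as a stand-in for the container's own parser (an assumption, not a description of how Jetty or any other container is implemented), and the class name is hypothetical. It does not cover multipart/form-data bodies, which the Servlet 2.3 getParameter methods do not parse.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class FormDecodeDemo {
    // Decode one urlencoded form value using the charset the server
    // has decided the client used (here: the form's accept-charset).
    static String decodeValue(String encoded, String charset)
            throws UnsupportedEncodingException {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // UTF-8 percent-encoding of U+65E5 U+672C, as a browser would
        // send it from a form with accept-charset="UTF-8".
        String wire = "%E6%97%A5%E6%9C%AC";
        String asUtf8 = decodeValue(wire, "UTF-8");
        String asLatin1 = decodeValue(wire, "ISO-8859-1");
        System.out.println(asUtf8.equals("\u65e5\u672c")); // correct text
        System.out.println(asLatin1.length());             // one char per byte
    }
}
```

Decoding with the right charset yields the two intended characters; decoding the same bytes as ISO-8859-1 yields six unrelated Latin-1 characters, which is the garbage the workaround above has to undo.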

This all works fine with the Jetty container I use
http://jetty.mortbay.org .

>
> It seems that enctype="text/plain" was proposed at some point. If this
> scheme were adopted and browsers added a charset parameter to the HTTP
> request, the server-side API could reliably convert the character
> encoding. Just a thought...
>
> -------------------
> Shigemichi Yazawa
> yazawa@globalsight.com
>
>
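Shigemichi's closing suggestion amounts to this: if the browser labelled the request body with a charset parameter, the server would not need to guess. A minimal sketch of the server side, assuming the charset arrives in a Content-Type header value like the 'text/plain; charset=...' form Jungshik quoted. The helper name charsetOf is hypothetical, the parser is deliberately simplified (it does not handle semicolons inside quoted parameter values), and the fallback for an unlabelled body is left to the caller (MIME's default for text/plain is US-ASCII, as Jungshik noted).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {
    // Return the charset named in a Content-Type header value, or the
    // supplied fallback when the header is absent or has no charset
    // parameter. Charset.forName throws if the named charset is unknown.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the media type, separated by ';'.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // Strip optional quotes, e.g. charset="UTF-8".
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            return Charset.forName(value);
        }
        return fallback;
    }

    public static void main(String[] args) {
        byte[] body = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        Charset cs = charsetOf("text/plain; charset=UTF-8", StandardCharsets.US_ASCII);
        System.out.println(cs.name());
        System.out.println(new String(body, cs).equals("caf\u00e9"));
        System.out.println(charsetOf("text/plain", StandardCharsets.US_ASCII).name());
    }
}
```

With a labelled body, the server decodes with the declared charset; without a label, it can only apply a default, which is exactly the ambiguity the thread is about.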


Chris Haynes
Received on Monday, 28 July 2003 14:28:29 GMT
