
Re: what should the charset be in the response to the server

From: Jungshik Shin <jshin@i18nl10n.com>
Date: Sat, 26 Jul 2003 06:31:38 -0400 (EDT)
To: Chris Haynes <chris@harvington.org.uk>
cc: <www-international@w3.org>
Message-ID: <Pine.BSO.4.33.0307260616090.9266-100000@callisto.jtan.com>

On Sat, 26 Jul 2003, Chris Haynes wrote:

> find that the advice I gave was wrong - I had taken it on trust from
> someone else that the Content-Encoding field is set by user agents
> when sending a POST which includes encoded content (why isn't it?).

Content-Encoding doesn't seem to be a good place for specifying charset.
It's not mentioned for _that purpose_ in any standard (or what purports to
be a standard) document. It's for gzip, compress and things like that.

It's the Content-Type header that should carry the charset parameter (not
just in HTTP but also in MIME). However, as you wrote, most server-side
tools/applications have to be updated to handle that.
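
To make the distinction concrete, here is a minimal sketch in Python (not the
Servlet API under discussion). The function names and the ISO-8859-1 fallback
are assumptions chosen for illustration; it only shows that Content-Encoding
selects a transformation such as gzip, while the charset comes from the
Content-Type parameter.

```python
import gzip
from email.message import Message


def charset_from_content_type(header_value, default="iso-8859-1"):
    """Return the charset parameter of a Content-Type value, or `default`.

    The ISO-8859-1 fallback is an assumption made for this sketch.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lower-cases the value; failobj is returned if absent.
    return msg.get_content_charset(failobj=default)


def decode_body(headers, body):
    """Undo the content coding first, then decode bytes with the declared charset."""
    # Content-Encoding names a coding such as gzip, never a character set.
    if headers.get("Content-Encoding", "").lower() == "gzip":
        body = gzip.decompress(body)
    charset = charset_from_content_type(headers.get("Content-Type", "text/plain"))
    return body.decode(charset)
```

A request carrying "Content-Type: application/x-www-form-urlencoded;
charset=UTF-8" would yield "utf-8" from the first function; a request without
the parameter falls back to the assumed default.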

> MSIE 6.0 does support this feature (i.e. the query string includes the
> name-value pair "_charset_=UTF-8")
> Netscape Navigator 6.2.2 does not
> Opera 7.11 does not.

  Gee. Netscape 6.2.2 (based on pre-1.0 Mozilla) sounds like an ancient relic.
To be fair, you may wish to try Mozilla 1.4 or the latest Netscape based on
Mozilla 1.4.

> I conclude that the '_charset_' mechanism, although ingenious, is a
> non-standard, proprietary distraction.

  I agree.

> The only standards-based way of being _sure_ what character encoding
> has been applied to form data appears to be to use
>
>     <form action=... method='post' accept-charset='UTF-8'>
>
> (or whatever the page author's chosen character encoding is).

 I don't know what Lynx and w3m-m17n do in this case. They're important
for some people with accessibility problems.  I'll check them out.


> Interestingly, changing from 'post' to 'get' in MSIE6 re-enables the
> user's control over the encoding used (i.e. over the values now
> transmitted in the URI query string). I have not tested this with the
> other two browsers.

  It also depends on whether or not you set 'send URLs always in UTF-8' in
Tools|Options(?) in MS IE.
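
  A small Python sketch of why this matters for 'get': the same character
produces different bytes in the query string depending on the encoding the
browser applied, and the server cannot tell which one was used from the URI
alone. The sample value and the two encodings are picked only for
illustration.

```python
from urllib.parse import quote, unquote

value = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# The same character percent-encoded under two different charsets.
as_utf8 = quote(value, encoding="utf-8")         # '%C3%A9'
as_latin1 = quote(value, encoding="iso-8859-1")  # '%E9'

# Decoding with the wrong charset silently yields two wrong characters
# (U+00C3 followed by U+00A9) instead of the original one.
misread = unquote(as_utf8, encoding="iso-8859-1")

print(as_utf8, as_latin1, misread == value)
```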


> The only other reliable transfer mechanism available would appear to
> be  the ENCTYPE="multipart/form-data" method being discussed in this
> same thread, but this format is not decoded by standard Servlet
> containers, so the convenient HttpRequest.getParameter() Servlet API
> could not be used with this mechanism.

   Can this API be updated to decipher the charset parameter present in
the Content-Type header fields of the subparts of multipart/form-data?
HTML 4.x was released in 1999(?) and .....
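
   As a rough idea of what such decoding involves, here is a sketch in Python
using the standard email parser rather than any Servlet container code. The
function name parse_multipart_form and the UTF-8 fallback for parts that omit
charset are assumptions made for this example; it handles simple text fields
only.

```python
from email.parser import BytesParser


def parse_multipart_form(content_type, body, default_charset="utf-8"):
    """Decode text fields of a multipart/form-data body, part by part.

    `content_type` is the request's Content-Type value (with its boundary).
    `default_charset` is an assumed fallback for parts without a charset.
    """
    # Prepend the header so the MIME parser can locate the boundary.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    msg = BytesParser().parsebytes(raw)
    if not msg.is_multipart():
        raise ValueError("not a multipart body")

    fields = {}
    for part in msg.get_payload():
        # The field name is a parameter of the part's Content-Disposition.
        name = part.get_param("name", header="content-disposition")
        # Each subpart may declare its own charset in its Content-Type.
        charset = part.get_content_charset(failobj=default_charset)
        fields[name] = part.get_payload(decode=True).decode(charset)
    return fields
```

With a body whose two parts declare charset=UTF-8 and charset=ISO-8859-1
respectively, each value is decoded with its own declared encoding.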


   Jungshik
Received on Saturday, 26 July 2003 06:31:54 GMT
