
Re: what should the charset be in the response to the server

From: Chris Haynes <chris@harvington.org.uk>
Date: Sat, 26 Jul 2003 12:15:58 +0100
Message-ID: <001101c35367$4a6e10d0$0200000a@ringo>
To: "Jungshik Shin" <jshin@i18nl10n.com>
Cc: <www-international@w3.org>

 "Jungshik Shin" replied at: Saturday, July 26, 2003 11:31 AM

> On Sat, 26 Jul 2003, Chris Haynes wrote:
> > find that the advice I gave was wrong - I had taken it on trust
> > someone else that the Content-Encoding field is set by user agents
> > when sending a POST which includes encoded content (why isn't
> Content-Encoding doesn't seem to be a good place for specifying
> It's not mentioned for _that purpose_ in any standard (or what
> purported to be standard) document. It's for gzip, compress and
> things like that.
> It's the Content-Type header that should have the charset parameter (not
> only in HTTP but also in MIME). However,
> as you wrote, most server side tools/applications have to be updated to
> handle that.

Whoops, my typo. My head thought one thing, my fingers typed another.
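
To make the Content-Type point concrete, here is a minimal sketch of how a server-side tool might pull the charset parameter out of a Content-Type header value. The class and method names (ContentTypeCharset, charsetOf) are made up for illustration, and the parsing is deliberately naive: it assumes no ';' appears inside a quoted parameter value.

```java
import java.util.Locale;

final class ContentTypeCharset {
    /**
     * Returns the value of the charset parameter in a Content-Type
     * header value, or null if there is none.
     * Naive sketch: splits on ';' and so does not cope with a ';'
     * inside a quoted parameter value.
     */
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            // Parameter names are case-insensitive.
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (name.equals("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("application/x-www-form-urlencoded; charset=UTF-8"));
        System.out.println(charsetOf("text/plain"));
    }
}
```

As discussed in this thread, the catch is that browsers of this era generally do not send the parameter on form submissions, so a tool that looks for it will usually find nothing and must fall back on some other convention.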

> > MSIE 6.0 does support this feature (i.e. the query string includes
> > name-value pair "_charset_=UTF-8")
> > Netscape Navigator 6.2.2 does not
> > Opera 7.11 does not.
>   Gee. Netscape 6.2.2 (based on pre-1.0 Mozilla) sounds like an
> ancient relic.
> To be fair, you may wish to try Mozilla 1.4 or the latest Netscape
> based on Mozilla 1.4.
> > I conclude that the '_charset_' mechanism, although ingenious, is a
> > non-standard, proprietary distraction.
>   I agree.
> > The only standards-based way of being _sure_ what character encoding
> > has been applied to form data appears to be to use
> >
> >     <form action=... method='post' accept-charset='UTF-8'>
> >
> > (or whatever the page author's chosen character encoding is).
>  I don't know what Lynx and w3m-m17n do in this case. They're
> for some people with accessibility problems.  I'll check them out.

That would be useful, thanks

> > Interestingly, changing from 'post' to 'get' in MSIE6 re-enables the
> > user's control over the encoding used (i.e. over the values now
> > transmitted in the URI query string). I have not tested this with the
> > other two browsers.
>   It also depends on whether or not you set 'send URLs always in
> UTF-8' in Tools|Options(?) in MS IE.

True, but I'm trying to find a 'reliable' mechanism which is not
dependent on user-accessible controls.
IMHO, this is also a 'dangerous' option, in that it goes against the de
facto conventions and anticipates (perhaps incorrectly) the
recommendations of the proposed IRI RFC. It can only safely be used
with a 'consenting' server site.
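
To illustrate why this matters, here is a small sketch showing that the same percent-encoded query value decodes to different text depending on which character encoding the server assumes. The class name (QueryDecodingDemo) and the sample value are made up for illustration; java.net.URLDecoder.decode(String, String) is the standard JDK call.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class QueryDecodingDemo {
    /** Percent-decodes a query value, interpreting the bytes in the given charset. */
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // "caf" followed by e-acute (U+00E9), percent-encoded from its UTF-8 bytes.
        String value = "caf%C3%A9";

        String asUtf8 = decode(value, "UTF-8");        // four characters, ends in U+00E9
        String asLatin1 = decode(value, "ISO-8859-1"); // five characters, ends in U+00C3 U+00A9

        System.out.println(asUtf8.length());
        System.out.println(asLatin1.length());
    }
}
```

Nothing in the query string itself says which interpretation is right, which is why a setting that silently changes the encoding on the browser side is only safe when the server knows to expect it.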

> > The only other reliable transfer mechanism available would appear to
> > be the ENCTYPE="multipart/form-data" method being discussed in the
> > same thread, but this format is not decoded by standard Servlet
> > containers, so the convenient HttpRequest.getParameter() Servlet
> > could not be used with this mechanism.
>    Can this API be updated to decipher charset parameter present in
> C-T header fields of subparts of multipart/form-data? HTML 4.x was
> released in 1999(?) and .....

I don't see why not. It would seem to be architecturally possible.

Sun's Servlet spec. 2.3 (and final draft 3 of version 2.4) says that
only application/x-www-form-urlencoded is to be used as a source of
parameters when POST is used (sect. 4.1.1).

It seems illogical that, from a page designer's point of view,
there are three different ways of returning the same data, yet Servlet
containers only support the two which use query-string encoding, and
the one they don't support is the only one that carries with it an
unambiguous indication of the character encoding used!
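
As a rough sketch of what a container (or a helper library) could do with that indication, the code below decodes the body of one multipart/form-data part using the charset named in that part's own Content-Type header, falling back to a caller-supplied default when none is given. The names (PartDecoder, decodePart) are hypothetical and not part of the Servlet API, and splitting the request body into parts is assumed to have been done already.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PartDecoder {
    // Matches charset=value or charset="value", case-insensitively.
    private static final Pattern CHARSET = Pattern.compile(
            "\\bcharset\\s*=\\s*\"?([^\";\\s]+)\"?", Pattern.CASE_INSENSITIVE);

    /**
     * Decodes the raw bytes of one form-data part.
     * Uses the charset from the part's Content-Type header if present,
     * otherwise the given fallback. Charset.forName throws an unchecked
     * exception if the named charset is unknown to the JVM.
     */
    static String decodePart(String partContentType, byte[] body, Charset fallback) {
        Charset charset = fallback;
        if (partContentType != null) {
            Matcher m = CHARSET.matcher(partContentType);
            if (m.find()) {
                charset = Charset.forName(m.group(1));
            }
        }
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // UTF-8 bytes for "caf" followed by e-acute (U+00E9).
        byte[] body = {0x63, 0x61, 0x66, (byte) 0xC3, (byte) 0xA9};
        Charset latin1 = Charset.forName("ISO-8859-1");

        // With the part-level charset the text is recovered as 4 characters.
        System.out.println(decodePart("text/plain; charset=UTF-8", body, latin1).length());
        // Without it, the fallback is used and the result has 5 characters.
        System.out.println(decodePart("text/plain", body, latin1).length());
    }
}
```

This only helps, of course, when the user agent actually includes the charset parameter on each part.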

It's probably too late in the 2.4 cycle to propose this change. Maybe
someone should do so for 2.5.

>    Jungshik

Received on Saturday, 26 July 2003 07:25:00 UTC
