- From: Chris Haynes <chris@harvington.org.uk>
- Date: Sat, 26 Jul 2003 12:15:58 +0100
- To: "Jungshik Shin" <jshin@i18nl10n.com>
- Cc: <www-international@w3.org>
"Jungshik Shin" replied at: Saturday, July 26, 2003 11:31 AM > > On Sat, 26 Jul 2003, Chris Haynes wrote: > > > find that the advice I gave was wrong - I had taken it on trust from > > someone else that the Content-Encoding field is set by user agents > > when sending a POST which includes encoded content (why isn't it?). > > Content-Encoding doesn't seem to be a good place for specifying charset. > It's not mentioned for _that purpose_ in any standard(or what purpoted to be > standard) document. It's for gzip, compress and things like that. > > It's Content-Type header that should have charset parameter (not just > in HTTP but also in MIME). However, > as you wrote, most server side tools/aplications have to be updated to > handle that. Whoops, my typo. My head thought one thing, my fingers typed another. > > > MSIE 6.0 does support this feature (i.e. the query string includes the > > name-value pair "_charset_=UTF-8") > > Netscape Navigator 6.2.2 does not > > Opera 7.11 does not. > > Gee. Netscape 6.2.2 (based on pre-1.0 Mozilla) sounds like an ancient relic. > To be fair, you may wish to try Mozilla 1.4 or the latest netscape based on > Mozilla 1.4. > > > I conclude that the '_charset_' mechanism, although ingenious, is a > > non-standard, proprietary distraction. > > I agree. > > > The only standards-based way of being _sure_ what character encoding > > has been applied to form data appears to be to use > > > > <form action=... method='post' accept-charset='UTF-8'> > > > > (or whatever the page author's chosen character encoding is). > > I don't know what Lynx and w3m-m17n do in this case. They're important > for some people with accessibility problems. I'll check them out. > That would be useful, thanks > > > Interestingly, changing from 'post' to 'get' in MSIE6 re-enables the > > user's control over the encoding used (i.e. over the values now > > transmitted in the URI query string). I have not tested this with the > > other two browsers. 
> > It also depends on whether or not you set 'send URLs always in UTF-8' in > Tools|Options(?) in MS IE. > True, but I'm trying to find a 'reliable' mechanism which is not dependent on user-accessible controls. IMHO, this is also a 'dangerous' option, in that it goes agains the de facto conventions and anticipates (parhaps incorrectly) the recommendations of the proposed IRI RFC. It can only safely be used with a 'consenting' server site. > > > The only other reliable transfer mechanism available would appear to > > be the ENCTYPE="multipart/form-data" method being discussed in this > > same thread, but this format is not decoded by standard Servlet > > containers, so the convenient HttpRequest.getParameter() Servlet API > > could not be used with this mechanism. > > Can this API be updated to decipher charset parameter present in > C-T header fields of subparts of multipart/form-data? HTML 4.x was > released in 1999(?) and ..... > I don't see why not. It would seem to be architecturally possible. Sun's Servlet spec. 2.3 (and final draft 3 of version 2.4) say that only application/x-www-form-urlencoded is to be used as a source of parameters when POST is used (sect 4.1.1). It seems to be illogical that, from a page designer's point of view, there are three different ways of returning the same data, yet Servlet containers only support the two which use query-string encoding - and the one they don't support is the only one that carries with it an unambiguouis indication of the character encoding used!. It's probably too late in the 2.4 cycle to propose this change. Maybe someone should do so for 2.5. > > Jungshik > > > Chris
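
P.S. For anyone following the thread, the server-side choice discussed above could be sketched roughly as below. This is only a minimal illustration, not part of the Servlet API: the class FormCharset and its methods charsetFromContentType, pickCharset and decodeForm are hypothetical names invented here. It assumes the raw application/x-www-form-urlencoded body is available as a string. It prefers a charset parameter on Content-Type when the user agent supplies one, then falls back to the non-standard '_charset_' field, and finally to the encoding the page author chose (e.g. via accept-charset).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; not a Servlet container class.
final class FormCharset {

    // Returns the charset parameter of a Content-Type value, or null if absent.
    static String charsetFromContentType(String contentType) {
        if (contentType == null) return null;
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {   // parts[0] is the media type
            String p = parts[i].trim();
            int eq = p.indexOf('=');
            if (eq > 0 && p.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String v = p.substring(eq + 1).trim();
                if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
                    v = v.substring(1, v.length() - 1);   // strip quotes
                }
                return v.isEmpty() ? null : v;
            }
        }
        return null;
    }

    // Decodes name=value pairs of a urlencoded body using the given charset.
    static Map<String, String> decodeForm(String body, String charset) {
        Map<String, String> out = new LinkedHashMap<>();
        if (body == null || body.isEmpty()) return out;
        try {
            for (String pair : body.split("&")) {
                if (pair.isEmpty()) continue;
                int eq = pair.indexOf('=');
                String name = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                out.put(URLDecoder.decode(name, charset),
                        URLDecoder.decode(value, charset));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
        return out;
    }

    // Assumed priority: Content-Type charset, then '_charset_', then page default.
    static String pickCharset(String contentType, String body, String pageCharset) {
        String cs = charsetFromContentType(contentType);
        if (cs != null) return cs;
        // The '_charset_' name and value are plain ASCII, so a first pass in
        // ISO-8859-1 is enough to read the hint without knowing the real encoding.
        String hint = decodeForm(body, "ISO-8859-1").get("_charset_");
        if (hint != null && !hint.isEmpty()) return hint;
        return pageCharset;
    }

    public static void main(String[] args) {
        String body = "name=caf%C3%A9&_charset_=UTF-8";
        String cs = pickCharset("application/x-www-form-urlencoded", body, "ISO-8859-1");
        System.out.println(cs + " -> " + decodeForm(body, cs).get("name"));
    }
}
```

The fallback order is a judgment call rather than anything a standard mandates. Since none of the browsers tested in this thread send a charset on Content-Type for urlencoded posts, in practice the page author's declared encoding would be doing most of the work.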
Received on Saturday, 26 July 2003 07:25:00 UTC