- From: Addison Phillips [wM] <aphillips@webmethods.com>
- Date: Fri, 16 Jul 2004 14:55:10 -0700
- To: "Jungshik Shin" <jshin@i18nl10n.com>, <www-international@w3.org>
Did you set the locale via a taglib directive? That'll do it every time. See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4880792 and others.

In fact, if you call response.setLocale you are also setting the encoding, and that overrides contentType, etc. You don't have to call it directly in the page either: if a taglib calls it for you, you'll hit the same problem.

Sun has fixed this in the latest-and-greatest version: now if you set contentType, that takes precedence over setLocale.

Addison

Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
http://www.webMethods.com
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International

Internationalization is an architecture.
It is not a feature.

> -----Original Message-----
> From: www-international-request@w3.org
> [mailto:www-international-request@w3.org]On Behalf Of Jungshik Shin
> Sent: July 16, 2004 14:25
> To: www-international@w3.org
> Subject: JSP page directive contentType overriden by Apache tomcat?
>
> Hi,
>
> I've been wrestling with a mysterious problem for the last few hours. I
> made a patch to the web search front-end of 'Nutch' (http://www.nutch.org,
> an open source search engine that strives to be an open source Google [1])
> so that query strings made of characters outside the ISO-8859-1 character
> repertoire can work.
>
> Following the standard step of adding contentType and pageEncoding
> directives at the beginning of the JSP files (I also added
> request.setCharacterEncoding("UTF-8"); and made sure that it is honored,
> because recent versions of Apache Tomcat ignore it for GET by default),
> I expected everything to work. To my great surprise, all the JSP files
> with the 'contentType="text/html; charset=UTF-8"' directive still emit
> 'Content-Type:text/html; charset=ISO-8859-1' in the HTTP header.
> Even more surprising is that the cached versions of the translated Java
> source files for those JSP files have the following line:
>
>   response.setContentType("text/html; charset=UTF-8");
>
> It's completely beyond me how I've been getting 'text/html;
> charset=ISO-8859-1' despite that.
>
> You can try it at http://pippin.kaist.ac.kr:8080. I ran the Nutch crawler
> to fetch a small number (about 4000) of pages in several different scripts
> (if you give '1234' as a query, you'll get 4 hits). The search result page
> (handled by search.jsp) is supposed to be in UTF-8 with the correct
> C-T header emitted in the HTTP header.
>
> Is there anyone who's been bitten by this bizarre problem? It'd be great
> to know how that was solved.
>
> Thank you tons in advance,
>
> Jungshik
>
> [1] Needless to say, there are a number of things to improve in I18N as
> well as in other aspects before Nutch can compete with Google.
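
For what it's worth, here is a minimal sketch of the precedence rule discussed in this thread, written as a toy model in plain Java. ToyResponse, its explicitCharsetWins flag, and the locale-to-charset mapping are all made up for illustration; none of this is the real servlet API or Tomcat's code. It only shows how a later setLocale call can clobber a charset that was set through contentType, and what changes once the explicit charset is allowed to win.

```java
import java.util.Locale;

/** Hypothetical stand-in for a servlet response; NOT the real servlet API. */
final class ToyResponse {
    // true models the fixed behaviour: an explicit charset beats setLocale.
    private final boolean explicitCharsetWins;
    private String charset = "ISO-8859-1";   // assumed container default
    private boolean charsetSetExplicitly = false;

    ToyResponse(boolean explicitCharsetWins) {
        this.explicitCharsetWins = explicitCharsetWins;
    }

    /** Picks up a "charset=" parameter if the content type carries one. */
    void setContentType(String type) {
        String key = "charset=";
        int i = type.toLowerCase(Locale.ROOT).indexOf(key);
        if (i >= 0) {
            charset = type.substring(i + key.length()).trim();
            charsetSetExplicitly = true;
        }
    }

    /** Derives a charset from the locale unless an explicit one is protected. */
    void setLocale(Locale locale) {
        if (explicitCharsetWins && charsetSetExplicitly) {
            return; // fixed behaviour: keep what contentType asked for
        }
        charset = charsetFor(locale);
    }

    /** Toy mapping invented for this sketch; real containers use their own. */
    private static String charsetFor(Locale locale) {
        return "ko".equals(locale.getLanguage()) ? "EUC-KR" : "ISO-8859-1";
    }

    String getCharset() {
        return charset;
    }
}

final class CharsetPrecedenceDemo {
    public static void main(String[] args) {
        for (boolean fixed : new boolean[] {false, true}) {
            ToyResponse response = new ToyResponse(fixed);
            // What the translated JSP does first...
            response.setContentType("text/html; charset=UTF-8");
            // ...and what a taglib might do afterwards.
            response.setLocale(Locale.US);
            System.out.println((fixed ? "fixed: " : "old:   ") + response.getCharset());
        }
    }
}
```

In the "old" run the charset ends up as ISO-8859-1 even though the page asked for UTF-8, which matches the symptom in the original message; in the "fixed" run the UTF-8 setting survives. On a real container, the thing to look for is whichever tag or filter calls setLocale after the page directive has been applied.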
Received on Friday, 16 July 2004 17:58:11 UTC