Re: Servlet question from Thierry Sourbier on 2001-10-19 (www-international@w3.org from October to December 2001)

From: Thierry Sourbier <webmaster@i18ngurus.com>
Date: Fri, 19 Oct 2001 15:29:24 +0200
To: <www-international@w3.org>
Message-ID: <007101c158a2$126c05a0$4fadfea9@dell400>

Sourav,

> The problem is setContentType works fine when the string you are
> printing out is an Unicode string.

Well, all strings are assumed to be Unicode in Java; that's a feature, and
it's the big difference from a byte stream. If you have "non-Unicode"
strings, that means you didn't read them with the proper encoding, and the
problem would be better solved there.
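A minimal sketch of "reading with the proper encoding", assuming the incoming
bytes are UTF-8 (the original message doesn't say which encoding the data is
in). The class and method names are hypothetical; the point is that the
charset is passed explicitly to InputStreamReader instead of falling back to
the platform default.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeWithCharset {

    // Hypothetical helper: read all text from a byte stream using an
    // explicitly named charset rather than the platform default encoding.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "café" encoded as UTF-8: the 'é' takes two bytes (0xC3 0xA9).
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Decoded with the right charset: 4 characters.
        String right = readAll(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        // Decoded with a wrong single-byte charset: 5 "characters" ("cafÃ©").
        String wrong = readAll(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1);

        System.out.println(right.length()); // 4
        System.out.println(wrong.length()); // 5
    }
}
```

Once the string is decoded correctly, everything downstream in Java (length,
substring, case mapping) works on real characters.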

> Whereas if content type is specified through meta tag what I found even
> the non unicode string is displayed properly. I don't know how it works.

Well, it is a case where two mistakes compensate for one another :). You are
relying on the default encoding for both input and output when your data is
obviously in a different encoding. This works only because your default
encoding is likely a single-byte encoding with no invalid values (e.g.
CP1252), so every byte you read in is written back out unchanged. Still, be
aware that you can't manipulate the string in your Java code, as you may
corrupt or lose data because Java no longer knows what a character is (e.g.
just try a character count...). As you already discovered when you tried to
use the setContentType API, your code will also quickly become a maintenance
nightmare. In a multilingual environment, it is therefore best to specify the
encoding used for all input and output, even if in some cases (like yours)
things seem to work fine without it.
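To illustrate the "two mistakes compensate" effect, here is a small sketch.
Assumptions: the real data is UTF-8, and ISO-8859-1 stands in for a
single-byte default such as CP1252 (ISO-8859-1 maps every byte value
0x00-0xFF to a character, so its round trip is lossless). The class and
method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CompensatingMistakes {

    // Hypothetical helper simulating both mistakes: decode with the wrong
    // single-byte encoding, then encode back out with the same wrong encoding.
    // ISO-8859-1 is an assumed stand-in for a default such as CP1252.
    static byte[] roundTrip(byte[] input) {
        String misread = new String(input, StandardCharsets.ISO_8859_1);
        return misread.getBytes(StandardCharsets.ISO_8859_1);
    }

    // The fix: decode with the encoding the data is actually in (assumed UTF-8).
    static String decodeProperly(byte[] input) {
        return new String(input, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // The bytes survive unchanged, so a page labelled UTF-8 (e.g. via a
        // meta tag) still displays correctly: the two mistakes cancel out.
        System.out.println(Arrays.equals(utf8, roundTrip(utf8))); // true

        // Inside Java, though, the string is wrong: 5 "characters", not 4.
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(misread.length()); // 5

        // Manipulating it can split a multi-byte sequence: keeping the first
        // 4 "characters" leaves a dangling UTF-8 lead byte (0xC3), which a
        // UTF-8 decoder replaces with U+FFFD.
        byte[] truncated = misread.substring(0, 4).getBytes(StandardCharsets.ISO_8859_1);
        String damaged = new String(truncated, StandardCharsets.UTF_8);
        System.out.println(damaged.equals("caf\uFFFD")); // true

        // Decoded properly, the count is right.
        System.out.println(decodeProperly(utf8).length()); // 4
    }
}
```

On the output side of a servlet, the corresponding explicit step is to call
response.setContentType("text/html; charset=UTF-8") before
response.getWriter(), so the container encodes the characters with the
charset you declared rather than with a default.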

Regards,
Thierry

Received on Friday, 19 October 2001 09:55:16 UTC