
Re: UTF-7 and java

From: Thierry Sourbier <webmaster@i18ngurus.com>
Date: Fri, 24 Aug 2001 16:10:11 +0200
Message-ID: <003f01c12ca6$7dd6b1a0$49b2fea9@dell400>
To: "Khurram Ilyas" <kilyas@hotmail.com>, <www-international@w3.org>
I don't know why UTF-7 is not included as a supported encoding in Java. I
don't think that UTF-7 has been updated to efficiently support characters
outside of Plane 0 (RFC 2152 assumes 16-bit characters), but I have never
had the opportunity to use UTF-7, so that may be just a wrong guess...
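
If you want to confirm what your own JVM offers, here is a minimal sketch. It assumes a JDK that ships the java.nio.charset package; the class and method names (Utf7SupportCheck, supports) are made up for illustration. The answer for "UTF-7" depends on the runtime and on any charset providers installed, so no particular result is promised here.

```java
import java.nio.charset.Charset;

public class Utf7SupportCheck {
    // Returns true if the running JVM can look up a charset by this name.
    public static boolean supports(String charsetName) {
        return Charset.isSupported(charsetName);
    }

    public static void main(String[] args) {
        // US-ASCII is one of the charsets every Java platform must provide.
        System.out.println("US-ASCII supported: " + supports("US-ASCII"));
        // The UTF-7 result varies with the JVM and installed providers.
        System.out.println("UTF-7 supported: " + supports("UTF-7"));
    }
}
```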

> Are UTF-7 and ASCII character sets almost the same?
Don't confuse encoding and character set. To simplify things, UTF-7 is indeed
designed to use only the mail-safe ASCII values, but the trade-off is that
several values are often needed to form one character, whereas in ASCII each
value represents one character. UTF-7 allows you to represent any Unicode
character, while ASCII stops at the first 128. If you ever attempt in Java to
convert a string to ASCII, all characters above 127 (that's pretty much all
non-English characters) will be replaced with a '?'. I guess you don't
want to go there :).
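
To see that '?' replacement for yourself, here is a small sketch. It assumes a JDK recent enough to have java.nio.charset.StandardCharsets; the class and method names (AsciiLossDemo, toAsciiAndBack) are just for illustration.

```java
import java.nio.charset.StandardCharsets;

public class AsciiLossDemo {
    // Encodes the string as US-ASCII bytes and decodes it again.
    // Characters that ASCII cannot represent are replaced with '?' on encoding.
    public static String toAsciiAndBack(String s) {
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // The last character is e with an acute accent (U+00E9), outside ASCII.
        System.out.println(toAsciiAndBack("caf\u00E9")); // prints "caf?"
    }
}
```

The original character is lost for good: nothing in the resulting bytes tells you what the '?' used to be.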

Unfortunately, I think you'll need to bite the bullet and write the code to
transform a string into a UTF-7 byte stream yourself. There are some C code
samples available online:

http://www.unicode.org/Public/PROGRAMS/CVTUTF/CVTUTF7.C (note that Unicode
does consider it obsolete :( ).
http://czyborra.com/utf/
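
For a rough idea of what the Java side could look like, here is a minimal encoder sketch based on my reading of RFC 2152, not a port of the C samples above. It assumes a JDK with java.util.Base64; the class name Utf7Sketch and the DIRECT constant are made up for illustration. It passes through only the RFC's "Set D" characters plus space, tab, CR and LF, writes '+' as "+-", and puts everything else into modified base64 (no padding) over big-endian UTF-16, always closing a run with '-'. Decoding is not covered.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Utf7Sketch {
    // Characters written as themselves: RFC 2152 Set D plus space, tab, CR, LF.
    // Everything else (including the optional Set O) goes through base64 here.
    private static final String DIRECT =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'(),-./:? \t\r\n";

    private static boolean isDirect(char c) {
        return DIRECT.indexOf(c) >= 0;
    }

    public static String encode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '+') {
                out.append("+-");          // literal plus sign
                i++;
            } else if (isDirect(c)) {
                out.append(c);             // mail-safe character, copied as is
                i++;
            } else {
                // Collect the whole run of characters that need encoding.
                int start = i;
                while (i < s.length() && s.charAt(i) != '+' && !isDirect(s.charAt(i))) {
                    i++;
                }
                byte[] utf16 = s.substring(start, i).getBytes(StandardCharsets.UTF_16BE);
                out.append('+')
                   .append(Base64.getEncoder().withoutPadding().encodeToString(utf16))
                   .append('-');           // explicit terminator, always emitted
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // 'A', U+2262, U+0391, '.'
        System.out.println(encode("A\u2262\u0391.")); // prints "A+ImIDkQ-."
    }
}
```

The result contains only ASCII characters, so getBytes(StandardCharsets.US_ASCII) on it gives the byte stream without any loss. Since Java strings are already UTF-16, characters outside Plane 0 simply go through as surrogate pairs in this sketch.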

Cheers,
Thierry.

----------------------------------------------------------------------------
www.i18ngurus.com - Open Internationalization Resources Directory


----- Original Message -----
From: Khurram Ilyas
To: www-international@w3.org
Sent: Friday, August 24, 2001 2:08 PM
Subject: UTF-7 and java


Hi,
I am facing a problem while dealing with i18n issues in regard to
UTF-7. I am working on conversion to and from the UTF-7 character set using
Java. However, it seems that UTF-7 is not one of the supported encodings for
Java. The ByteToChar class for UTF-7 also seems to be missing in i18n.jar.
Are UTF-7 and ASCII character sets almost the same? Or is there any reason
for not including it in Java? Also, are there any workarounds?
In case you have any advice as to how to deal with the issue, please let me
know.
Thanks in advance.

Best Regards,
Khurram Ilyas Chaudhry



Received on Friday, 24 August 2001 11:45:18 GMT
