- From: Martin Duerst <duerst@w3.org>
- Date: Tue, 15 Oct 2002 17:03:00 +0900
- To: Elliotte Rusty Harold <elharo@metalab.unc.edu>, <www-tag@w3.org>
- Cc: xml-names-editor@w3.org, www-international@w3.org
At 09:03 02/10/15 +0900, Martin Duerst wrote:
>On the other hand, I would also be very glad to have actual code, in
>different programming languages, on the W3C web site, or in CVS.
>If you or anybody else has something that they can contribute as
>a starting point, that would be great.
I have just tweaked an older program of mine to produce a very
simple program that checks whether the Java implementation one uses
correctly converts Strings to UTF-8.
It is below. Put it in a file called checkUTF8.java, then
run the following commands:
> javac checkUTF8.java
> java checkUTF8
The program will tell you the result. Caveat: I haven't been
able to test this on an installation that actually
says "Sorry,...". If somebody has a 1.2 (or 1.3?)
around, please try it and mail the result and any
problems directly to me.
If you think that putting up such code at
http://www.w3.org/International/iri-edit/ or somewhere
close would be useful, please also tell me privately.
Regards, Martin.
/*
* The checkUTF8 class checks for correct conversion from
* UTF-16 to UTF-8. The example is a single Old Italic character.
*/
import java.io.*;
class checkUTF8 {
    public static void main(String[] args) {
        // U+10300 (an Old Italic letter) as a UTF-16 surrogate pair.
        String example = "\ud800\udf00";
        int length = 0;
        try {
            length = example.getBytes("utf-8").length;
        } catch (UnsupportedEncodingException e) {
            System.out.println("Sorry, encoding 'utf-8' not supported, " +
                "test failed.");
            // Stop here so the length checks below do not also report.
            return;
        }
        if (length == 4) {
            System.out.println("Congratulations, your Java " +
                "implementation converts Strings correctly to UTF-8, " +
                "using 4 bytes for each character outside the Basic " +
                "Multilingual Plane (BMP).");
        }
        else if (length == 6) {
            System.out.println("Sorry, but your Java implementation " +
                "does not convert Strings correctly to UTF-8; in " +
                "particular, it uses 6 bytes instead of 4 bytes for " +
                "each character outside the Basic Multilingual Plane " +
                "(BMP).");
        }
        else {
            System.out.println("Internal error, undefined result.");
        }
    }
}
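
For anybody who wants to see where the expected 4 bytes come from,
here is a minimal sketch that computes them by hand for the same
surrogate pair. The class and method names (Utf8Sketch, toCodePoint,
encodeSupplementary) are made up for illustration and are not part of
the program above, and the use of StandardCharsets assumes a much newer
Java (7 or later) than the versions discussed here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class Utf8Sketch {
    // Combine a UTF-16 surrogate pair into a single Unicode code point.
    static int toCodePoint(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    // Encode a code point outside the BMP (U+10000..U+10FFFF)
    // as the 4-byte UTF-8 form: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
    static byte[] encodeSupplementary(int cp) {
        if (cp < 0x10000 || cp > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        return new byte[] {
            (byte) (0xF0 | (cp >> 18)),
            (byte) (0x80 | ((cp >> 12) & 0x3F)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F))
        };
    }

    public static void main(String[] args) {
        int cp = toCodePoint('\ud800', '\udf00');
        byte[] expected = encodeSupplementary(cp);
        // Compare the hand-computed bytes with what the library produces.
        byte[] actual = "\ud800\udf00".getBytes(StandardCharsets.UTF_8);
        System.out.println("Code point: U+" + Integer.toHexString(cp).toUpperCase());
        System.out.println("Matches library: " + Arrays.equals(expected, actual));
    }
}
```

For the Old Italic example the hand-computed bytes are F0 90 8C 80.
An implementation that instead encodes each surrogate separately
produces 3 bytes per surrogate, which is the 6-byte case the check
above reports.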
Received on Tuesday, 15 October 2002 04:03:51 UTC