- From: Martin Duerst <duerst@w3.org>
- Date: Tue, 15 Oct 2002 17:03:00 +0900
- To: Elliotte Rusty Harold <elharo@metalab.unc.edu>, <www-tag@w3.org>
- Cc: xml-names-editor@w3.org, www-international@w3.org
At 09:03 02/10/15 +0900, Martin Duerst wrote:

>On the other hand, I would also be very glad to have actual code, in
>different programming languages, on the W3C web site, or in CVS.
>If you or anybody else has something that they can contribute as
>a starting point, that would be great.

I have just tweaked an older program of mine to produce a very simple
program that checks whether the Java implementation one uses correctly
converts Strings to UTF-8. It is below. Put it in a file called
checkUTF8.java, then run the following commands:

    > javac checkUTF8.java
    > java checkUTF8

The program will tell you the result.

Caveat: I haven't been able to test this on an installation that
actually says "Sorry,...". If somebody has a 1.2 (or 1.3?) around,
please try it and mail the result and any problems directly to me.

If you think that putting up such code at
http://www.w3.org/International/iri-edit/ or somewhere close would be
useful, please also tell me privately.

Regards, Martin.


/*
 * The checkUTF8 class checks for correct conversion from
 * UTF-16 to UTF-8. The example is a single Old Italic character.
 */

import java.io.*;

class checkUTF8 {
    public static void main(String[] args) {
        // U+10300 (Old Italic), written as a UTF-16 surrogate pair.
        String example = "\ud800\udf00";
        int length = 0;

        try {
            length = example.getBytes("utf-8").length;
        }
        catch (UnsupportedEncodingException e) {
            System.out.println("Sorry, encoding 'utf-8' not supported, "
                + "test failed.");
            // Stop here so the "Internal error" branch is not also printed.
            return;
        }

        if (length == 4) {
            System.out.println("Congratulations, your Java "
                + "implementation converts Strings correctly to UTF-8, "
                + "using 4 bytes for each character outside the Basic "
                + "Multilingual Plane (BMP).");
        }
        else if (length == 6) {
            System.out.println("Sorry, but your Java implementation "
                + "does not convert Strings correctly to UTF-8; in "
                + "particular, it uses 6 bytes instead of 4 bytes for "
                + "each character outside the Basic Multilingual Plane "
                + "(BMP).");
        }
        else {
            System.out.println("Internal error, undefined result.");
        }
    }
}
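The length check in the program can be complemented by looking at the bytes themselves. The following is only a minimal sketch, not part of the original program. The class name Utf8ByteDump and the helper hexOfUtf8 are hypothetical, and it assumes a Java version that provides java.nio.charset.StandardCharsets (Java 7 or later). A correct converter should produce the four bytes F0 90 8C 80 for U+10300, whereas encoding each surrogate separately would give the six bytes ED A0 80 ED BC 80.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8ByteDump {

    // Returns the UTF-8 bytes of s as space-separated uppercase hex.
    static String hexOfUtf8(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values format as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same example as in checkUTF8: U+10300 as a surrogate pair.
        String example = "\ud800\udf00";
        System.out.println("UTF-16 code units: " + example.length());
        System.out.println("Code points:       "
            + example.codePointCount(0, example.length()));
        System.out.println("UTF-8 bytes:       " + hexOfUtf8(example));
    }
}
```

Seeing the actual bytes makes it easier to report what an implementation does when the result is neither 4 nor 6 bytes.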
Received on Tuesday, 15 October 2002 04:03:51 UTC