Re: IRIs everywhere (including XML namespaces)

From: Martin Duerst <duerst@w3.org>
Date: Tue, 15 Oct 2002 17:03:00 +0900
Message-Id: <>
To: Elliotte Rusty Harold <elharo@metalab.unc.edu>, <www-tag@w3.org>
Cc: xml-names-editor@w3.org, www-international@w3.org

At 09:03 02/10/15 +0900, Martin Duerst wrote:

>On the other hand, I would also be very glad to have actual code, in
>different programming languages, on the W3C web site, or in CVS.
>If you or anybody else has something that they can contribute as
>a starting point, that would be great.

I have just tweaked an older program of mine to produce a very
simple program that checks whether the Java implementation one uses
correctly converts Strings to UTF-8.

It is below. Put it in a file called checkUTF8.java, then
run the following commands:

 > javac checkUTF8.java
 > java checkUTF8

The program will tell you whether the conversion is correct.
Caveat: I haven't been able to test this on an installation
that actually says "Sorry,...". If somebody has a Java 1.2
(or 1.3?) installation around, please try it and mail the
result and any problems directly to me.

If you think that putting up such code at
http://www.w3.org/International/iri-edit/ or somewhere
close would be useful, please also tell me privately.

Regards,     Martin.

/**
  * The checkUTF8 class checks for correct conversion from
  * UTF-16 to UTF-8. The example is a single Old Italic character.
  */

import java.io.*;

class checkUTF8 {
     public static void main(String[] args) {
         // U+10300 (Old Italic), written as a UTF-16 surrogate pair.
         String example = "\ud800\udf00";
         int length = 0;
         try {
             length = example.getBytes("utf-8").length;
         } catch (UnsupportedEncodingException e) {
             System.out.println("Sorry, encoding 'utf-8' not supported, " +
                                "test failed.");
             // Without the encoding there is nothing left to check.
             return;
         }

         if (length == 4) {
             System.out.println("Congratulations, your Java " +
                 "implementation converts Strings correctly to UTF-8, " +
                 "using 4 bytes for each character outside the Basic " +
                 "Multilingual Plane (BMP).");
         } else if (length == 6) {
             System.out.println("Sorry, but your Java implementation " +
                 "does not convert Strings correctly to UTF-8; in " +
                 "particular, it uses 6 bytes instead of 4 bytes for " +
                 "each character outside the Basic Multilingual Plane " +
                 "(BMP).");
         } else {
             System.out.println("Internal error, undefined result.");
         }
     }
}

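
The check above only looks at the byte count. As a complementary sketch, the bytes themselves can be printed for inspection. The class name Utf8Bytes and the helper hex are hypothetical, and the code assumes a Java version that provides java.nio.charset.StandardCharsets (Java 7 or later), which is newer than the installations discussed in this message.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of checkUTF8.
class Utf8Bytes {
    // Returns the UTF-8 bytes of s as space-separated uppercase hex.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same example as checkUTF8: U+10300 as a UTF-16 surrogate pair.
        String example = "\ud800\udf00";
        System.out.println(hex(example));
    }
}
```

A correct converter should print F0 90 8C 80, the four-byte UTF-8 form of U+10300. A converter that encodes each surrogate separately would instead produce the six bytes ED A0 80 ED BC 80, which is the failure case the length check reports.
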
Received on Tuesday, 15 October 2002 04:03:51 UTC
