RE: Free C implementation of form C from Carl W. Brown on 2001-08-11 (www-international@w3.org from July to September 2001)

From: Carl W. Brown <cbrown@xnetinc.com>
Date: Sat, 11 Aug 2001 08:07:02 -0700
To: <www-international@w3.org>
Message-ID: <FNEHIHOMIIDPDCIFEJEGEEDHCIAA.cbrown@xnetinc.com>

Bjoern,

You could use ICU: http://oss.software.ibm.com/icu/  It is free but not
tiny.  The normalization code is written in C++, not C, if that matters.
However, if you are working with HTML, then users will probably write the
HTML in different encodings.  In that case you will have to translate from
the document's code page to UTF-16/UTF-32 before you normalize, because the
normalization routines rely on Unicode character property tables.
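
To give a rough idea of that conversion step, here is a minimal sketch in
plain C.  It is not ICU code: the names latin1_to_utf32, utf8_to_utf32 and
CONV_ERROR are made up for illustration, and it only covers ISO-8859-1 and
UTF-8 input.  Any other code page would need its own mapping table, and the
normalization itself would still have to run on the resulting code points.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error value: returned on malformed input or a full buffer. */
#define CONV_ERROR ((size_t)-1)

/* ISO-8859-1 maps byte for byte onto U+0000..U+00FF, so no table is needed. */
size_t latin1_to_utf32(const unsigned char *in, size_t in_len,
                       uint32_t *out, size_t out_cap)
{
    size_t i;
    if (in_len > out_cap)
        return CONV_ERROR;
    for (i = 0; i < in_len; i++)
        out[i] = in[i];
    return in_len;
}

/* Decode UTF-8 into code points; returns the number of code points written. */
size_t utf8_to_utf32(const unsigned char *in, size_t in_len,
                     uint32_t *out, size_t out_cap)
{
    size_t i = 0, n = 0;

    while (i < in_len) {
        unsigned char b = in[i];
        uint32_t cp, min;
        size_t extra, k;

        /* The lead byte gives the sequence length and the payload bits. */
        if (b < 0x80)                { cp = b;        extra = 0; min = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; min = 0x10000; }
        else return CONV_ERROR;      /* stray continuation or invalid byte */

        if (extra > in_len - i - 1)
            return CONV_ERROR;       /* sequence truncated at end of input */

        for (k = 1; k <= extra; k++) {
            unsigned char c = in[i + k];
            if ((c & 0xC0) != 0x80)
                return CONV_ERROR;   /* not a continuation byte */
            cp = (cp << 6) | (uint32_t)(c & 0x3F);
        }

        /* Reject overlong forms, surrogates and values above U+10FFFF. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return CONV_ERROR;
        if (n >= out_cap)
            return CONV_ERROR;

        out[n++] = cp;
        i += extra + 1;
    }
    return n;
}
```

Either function fills an array of 32-bit code points, which is the kind of
int[] input a normalizer could then work on.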

Carl

> -----Original Message-----
> From: www-international-request@w3.org
> [mailto:www-international-request@w3.org]On Behalf Of Bjoern Hoehrmann
> Sent: Friday, August 10, 2001 11:09 PM
> To: www-international@w3.org
> Subject: Free C implementation of form C
>
>
> Hi,
>
>    Is there any free and tiny ANSI C implementation of Unicode
> Normalization Form C out there? I want to implement the Early
> Uniform Normalization as in [1] in HTML Tidy [2] and such an
> implementation would be very helpful. It should be based on
> Unicode 3.0. It should come free-standing with optimised
> Unicode data and hopefully act on either int[] or char*s UTF-8
> encoded.
>
> [1] http://www.w3.org/TR/charmod/#sec-Normalization
> [2] http://sourceforge.net/projects/tidy
>
> TIA,
> --
> Björn Höhrmann { mailto:bjoern@hoehrmann.de } http://www.bjoernsworld
> am Badedeich 7 } Telefon: +49(0)4667/981028 { http://bjoern.hoehrmann.de
> 25899 Dagebüll { PGP Pub. KeyID: 0xA4357E78 } http://www.learn.to/quote/
>

Received on Saturday, 11 August 2001 11:07:04 UTC