- From: Carl W. Brown <cbrown@xnetinc.com>
- Date: Sun, 12 Aug 2001 16:01:43 -0700
- To: "Martin Duerst" <duerst@w3.org>, "Bjoern Hoehrmann" <derhoermi@gmx.net>, <www-international@w3.org>
Martin,

They now call it International Components for Unicode.
http://oss.software.ibm.com/icu/

Good point about supporting Unicode 3.1. All new Unicode implementations should.

The problem with ICU is that it is not small. I have been working with it for a year and a half now, and it is a great product. But if you are going to use the unorm and normalizer code, then you also need the uchar code for character properties; to load the tables you need udata and resbund; and of course everyone needs putil, etc. Even if you pull out the code you don't need, you still have large Unicode character property tables. By the time you are through you won't have a small piece of code.

I think the best approach, considering that he only needs the ICU common library (DSO/DLL) and the data DLL, is for him to ship them with the precompiled code. People wishing to compile from source will have to install ICU themselves, at least on most Unix platforms.

I like ICU not only because it is open source but also because it is the best product available on the market. In fact I am so impressed that I have dedicated more than 5 man-months of work to contributing internal code to ICU and to creating open source code that helps people migrate to Unicode using ICU. I hope that this will help people move to Unicode.

My code (xIUA) is not for all software, and this product would probably not benefit from xIUA.
http://www.xnetinc.com/xiua/

But it would benefit from ICU. In addition to normalization it could probably use the conversion routines. Working with HTML, I expect that Tidy will have to deal with UTF-8 and various code pages. You can open a converter for the specific code page that the user specifies, and you can look at the xIUA code as an example of how easy it is to get ICU to return the MIME code page name for the code page you are using with standard ICU calls. This way, if the user specifies a valid but non-standard code page name, you could convert it to the MIME standard name.
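The alias-to-MIME-name lookup described above can be sketched in plain C. This is only an illustration of the idea, not ICU or xIUA code: the table `ALIASES` and the functions `label_equal` and `mime_name_for` are hypothetical names invented for this sketch, and the handful of entries stands in for the much larger converter alias data that ICU ships. In ICU itself the equivalent lookup is, as far as I know, done with `ucnv_getStandardName` and the standard name "MIME"; check the ICU documentation before relying on that.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical alias table: a tiny stand-in for ICU's converter alias data. */
struct charset_alias {
    const char *alias;      /* label a user might type */
    const char *mime_name;  /* preferred MIME charset name */
};

static const struct charset_alias ALIASES[] = {
    { "cp1252",       "windows-1252" },
    { "windows-1252", "windows-1252" },
    { "latin1",       "ISO-8859-1"   },
    { "iso-8859-1",   "ISO-8859-1"   },
    { "utf8",         "UTF-8"        },
    { "utf-8",        "UTF-8"        },
};

/* Case-insensitive comparison of two ASCII charset labels. */
static int label_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Returns the MIME name for a user-supplied label, or NULL if it is unknown. */
const char *mime_name_for(const char *label)
{
    size_t i;

    if (label == NULL) {
        return NULL;
    }
    for (i = 0; i < sizeof ALIASES / sizeof ALIASES[0]; i++) {
        if (label_equal(label, ALIASES[i].alias)) {
            return ALIASES[i].mime_name;
        }
    }
    return NULL;
}
```

A caller would pass whatever charset label the user supplied and fall back to an error message when the result is NULL; labels are matched without regard to case, so "CP1252" and "cp1252" resolve to the same MIME name.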
For example "cp1252" would become "windows-1252".

Carl

> -----Original Message-----
> From: www-international-request@w3.org
> [mailto:www-international-request@w3.org]On Behalf Of Martin Duerst
> Sent: Saturday, August 11, 2001 1:23 AM
> To: Bjoern Hoehrmann; www-international@w3.org
> Subject: Re: Free C implementation of form C
>
>
> Bjoern - Please check out ICU (IBM Classes for Unicode, I guess).
>
> And please make that Unicode Version 3.1, there is a small but
> important bug fix in 3.1.
>
> Regards, Martin.
>
> At 08:09 01/08/11 +0200, Bjoern Hoehrmann wrote:
> >Hi,
> >
> > Is there any free and tiny ANSI C implementation of Unicode
> >Normalization Form C out there? I want to implement the Early
> >Uniform Normalization as in [1] in HTML Tidy [2] and such an
> >implementation would be very helpful. It should be based on
> >Unicode 3.0. It should come free-standing with optimised
> >Unicode data and hopefully act on either int[] or char*s UTF-8
> >encoded.
> >
> >[1] http://www.w3.org/TR/charmod/#sec-Normalization
> >[2] http://sourceforge.net/projects/tidy
> >
> >TIA,
> >--
> >Björn Höhrmann { mailto:bjoern@hoehrmann.de } http://www.bjoernsworld.de
> >am Badedeich 7 } Telefon: +49(0)4667/981028 { http://bjoern.hoehrmann.de
> >25899 Dagebüll { PGP Pub. KeyID: 0xA4357E78 } http://www.learn.to/quote/
>
Received on Sunday, 12 August 2001 19:01:47 UTC