- From: A. Vine <avine@eng.sun.com>
- Date: Thu, 16 Aug 2001 18:14:45 -0700
- To: "Carl W. Brown" <cbrown@xnetinc.com>
- Cc: www-international@w3.org
Carl,

Interesting response to a suggestion which is neither unreasonable nor far-fetched. Comments embedded:

"Carl W. Brown" wrote:
>
> Andrea,
>
> > -----Original Message-----
> > From: www-international-request@w3.org
> > [mailto:www-international-request@w3.org]On Behalf Of A. Vine
> > Sent: Wednesday, August 15, 2001 10:40 AM
> > To: www-international@w3.org
> > Subject: Re: Unicode <-> CJKV national encoding; supporting
> > multi-lingual webcontent
> >
> > "Carl W. Brown" wrote:
> > >
> > > The only way to sanely implement a multi-lingual site is using Unicode. The
> > > best support for Unicode is ICU. http://oss.software.ibm.com/icu/ If you
> > >
> > or Java :-)
>
> Java has its own set of problems (challenges). First, many people already
> have code written in C that they do not want to rewrite.

No mention was made of whether the code was in C or anything else. I was simply suggesting Java.

> It is not as easy to actually get to the underlying Unicode in Java.

Please give an example.

> C code usually runs faster. Then there is the problem of JVM versions and
> conflicting support.

Conflicting support?

>
> ICU started with the Java Unicode support and adapted it for C/C++
> applications. http://oss.software.ibm.com/icu/ You will notice that many of
> the functions are very Javaesque.

Yup, we here use some of the older code written by Netscape and Taligent before ICU was created as a C/C++ parallel to the Java functionality. Of course, there have been updates.

>
> It is a great component library for Unicode. What I have added is extra
> functions that don't really belong in ICU proper. xIUA, unlike ICU, is
> designed as a sample starting point for code that you develop as part of
> your application. http://www.xnetinc.com/xiua/ While it is designed for
> typical applications, it is especially useful for web server applications.
> For example it adds support for a per-thread set of locales.
> You can have one locale for the browser, one for your HTML pages and one
> for your Unicode database.
>
> You can make calls to transform your data from your HTML charset, which may
> be EUC-JP, to your browser charset, which is Shift_JIS. The same code may
> convert the same page to UTF-8 for the next browser. If you are parsing
> data your code can call xiua_strtok and the same call will work for UTF-32,
> UTF-16, UTF-8 and code page data. Unlike the normal strtok it is also
> thread safe.
>
> It manages your locale information including time zones using Java style
> time zones. It also has special web functions. For example it will analyze
> a browser accept language string including the q= quality selections and
> return the first choice language based on the installed ICU locales. It
> will also analyze a path and return any RFC 3066 language subdirectory name
> that is found to match your ICU installed locales.
>
> It also has special migration aids. It has a routine, for example, that will
> convert a strftime date time format to an ICU format using the ICU values
> from its resource bundles.
>
> It also makes conversion easier because, like Java, you don't have to pass the
> locale to every function that may invoke ICU, so you don't have to
> change any existing APIs to convert to Unicode.

Great. Keep up the good work. For those writing in Java (apparently a majority of coders worldwide, according to a recent study), you may have to write this stuff on your own, or find a compatible Java library. I wouldn't say that you had to scrap all your Java code, though.

>
> This code is really a starting point for users. It is designed to be
> customized by users. It also has alternative functions.
> For example I have xiua_strcoll that most C programmers can relate to, but
> since the first implementation was in a special version of PHP, I also have:
>
> int32_t                /* 1 = TRUE, 0 = FALSE, -1 = LOGIC ERROR */
> xiua_Collate(char *str1,    /* string 1 */
>              char * option, /* option string contains both comparison test */
>                             /* and optional collation strength parameters */
>                             /* "==" "<=" ">=" "!=" "<" ">" are the */
>                             /* comparison test values and "?" ":" "#" are */
>                             /* the valid strength codes. "==?" is a test */
>                             /* for equal primary strength. */
>                             /* ? = Primary letters match no case or case */
>                             /* e.g. "Black-bird" ==? "blackbird" */
>                             /* but what constitute separate letters may differ */
>                             /* by locale e.g. Spanish ch ll */
>                             /* Secondary case insensitive normalized with accents */
>                             /* : = Tertiary above plus case sensitive */
>                             /* # = Strict match */
>                             /* spaces are ignored, non-standard conditions are */
>                             /* supported "!<>" or "=" are the same as "==" */
>                             /* "" or "!" however are illogical and are errors */
>              char * str2);  /* string 2 */
>
> Because the result is a TRUE/FALSE it is easy to embed the result into a
> more complex test or a regular expression.
>
> It also has useful i18n functions such as xiua_strncpyEx. It works somewhat
> like strncpy except that it always adds a null to the end of the target
> string and only copies full characters. So if it is copying UTF-8 or
> Shift_JIS data you will always get full characters copied even if it means
> that the target buffer is not quite full. It also always adds a null to the
> end, which is 4 bytes if it is UTF-32 data. To make it easier to use, it
> returns the data length copied.
>
> >
> > Also, if you're generating HTML forms, I recommend you take a look at:
> >
> > http://www.unicode.org/iuc/iuc17/papers.html
>
> Good presentation.

Born of 2 painful projects, one in Java, one in C. Somewhat out of date but the same considerations still hold.
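
The behavior described above for xiua_strncpyEx (copy only whole characters, always terminate, return the length copied) can be illustrated for the UTF-8 case with a minimal sketch. This is not the xIUA code: the name utf8_copy_whole and its signature are hypothetical, and it assumes the source is valid, NUL-terminated UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT the xIUA implementation.
 * Copies at most dstsize - 1 bytes of UTF-8 from src into dst without
 * splitting a multi-byte character, and always NUL-terminates dst.
 * Returns the number of bytes copied, excluding the terminator.
 * Assumes src is valid, NUL-terminated UTF-8. */
size_t utf8_copy_whole(char *dst, size_t dstsize, const char *src)
{
    size_t len, n;

    if (dstsize == 0)
        return 0;               /* no room even for the terminator */

    len = strlen(src);
    n = (len < dstsize - 1) ? len : dstsize - 1;

    /* If the string is being cut short, src[n] is the first byte left
     * out. UTF-8 continuation bytes have the form 10xxxxxx, so back up
     * until the first excluded byte begins a character. */
    if (n < len) {
        while (n > 0 && ((unsigned char)src[n] & 0xC0) == 0x80)
            n--;
    }

    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

With a 3-byte buffer and the 4-byte string "a", U+00E9, "b", only "a" fits as a whole character, so the buffer is left not quite full rather than holding half of a two-byte sequence.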
> One thing that my software does is help overcome some of
> the problems with internationalization that you have with the Apache web
> server that you don't have with servers like iPlanet but yet stay iPlanet
> compatible.
>
> I think that adding language as a type is an administrative nightmare.
> www.mysite.com/dir/subdir/mypage.html.en is a bad idea. If nothing else it
> creates problems keeping track of links. It is also a maintenance
> nightmare if I also have www.mysite.com/dir/subdir/mypage.html.jp, which is
> encoded in sjis, mixed in with pages in other encodings. It is easy to get
> your hands crossed.

Hmm, I never remember saying that. I assume you're just adding this as advice to, er, was it Misha?

Andrea

>
> >
> > Look at the presentation under Session A2 (mine ;-) and it looks like the
> > presentation under A3 might have some useful information (David Taieb's).
> >
> > Regards,
> > Andrea
> >
>
> Carl
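
The quoted message mentions analyzing a browser accept-language string, including its q= values, and returning the first-choice language among those installed. A rough sketch of that idea in plain C follows. It is not xIUA or ICU code: pick_language and tag_equals are hypothetical names, the supported list stands in for the installed ICU locales, and matching is simplified to exact, case-insensitive tag comparison with no prefix fallback or "*" handling.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: case-insensitive comparison of the first n
 * bytes of tag against a NUL-terminated supported tag. */
static int tag_equals(const char *tag, size_t n, const char *supported)
{
    size_t i;

    if (strlen(supported) != n)
        return 0;
    for (i = 0; i < n; i++) {
        if (tolower((unsigned char)tag[i]) !=
            tolower((unsigned char)supported[i]))
            return 0;
    }
    return 1;
}

/* Hypothetical sketch, NOT the xIUA routine. Returns the entry of
 * supported[] with the highest q value in an Accept-Language style
 * header, or NULL if none matches. A missing q defaults to 1.0, q=0
 * entries are never chosen, and ties go to the earliest entry.
 * Assumes the "C" locale so that strtod reads '.' as the decimal point. */
const char *pick_language(const char *header,
                          const char *const *supported, size_t count)
{
    const char *best = NULL;
    double best_q = 0.0;
    const char *p = header;

    while (*p != '\0') {
        const char *end = strchr(p, ',');
        const char *semi, *tag_end, *qpos;
        double q = 1.0;
        size_t i, n;

        if (end == NULL)
            end = p + strlen(p);
        while (p < end && isspace((unsigned char)*p))
            p++;

        /* The tag runs up to ';' or to the end of this list item. */
        semi = memchr(p, ';', (size_t)(end - p));
        tag_end = (semi != NULL) ? semi : end;
        while (tag_end > p && isspace((unsigned char)tag_end[-1]))
            tag_end--;
        n = (size_t)(tag_end - p);

        /* Read an optional "q=<number>" parameter after the ';'. */
        if (semi != NULL) {
            qpos = semi + 1;
            while (qpos < end && isspace((unsigned char)*qpos))
                qpos++;
            if (end - qpos >= 2 &&
                (qpos[0] == 'q' || qpos[0] == 'Q') && qpos[1] == '=')
                q = strtod(qpos + 2, NULL);
        }

        /* Only search the supported list if this item could win. */
        for (i = 0; n > 0 && q > best_q && i < count; i++) {
            if (tag_equals(p, n, supported[i])) {
                best = supported[i];
                best_q = q;
                break;
            }
        }

        p = (*end == ',') ? end + 1 : end;
    }
    return best;
}
```

Given a supported list of "en", "ja" and "fr", the header "da, ja;q=0.8, en;q=0.9" selects "en": "da" is not supported, and "en" carries a higher q than "ja".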
Received on Thursday, 16 August 2001 21:16:02 UTC