- From: Carl W. Brown <cbrown@xnetinc.com>
- Date: Thu, 23 Aug 2001 11:31:24 -0700
- To: <www-international@w3.org>
Eric,

> > Every page will still be translated into different language but
> > you only have one encoding.
>
> this is the Unicode dream and maybe one day we'll see something like
> it...the actual process at present goes like this...you upload your
> lovely Unicode Japanese site only to find that most Japanese users
> can't access it, they can only access shift-xjis encoded sites...you
> then move on to Russian to discover that most Russian users are
> expecting the language to be encoded with Windows 1251...and don't
> get me on to Chinese
>
> Unicode is utterly wonderful...I love the idea to death...the ethos
> is truly inspiring...the practicality is that Russia, Japan and Hong
> Kong got online before Unicode began...the people of those nations
> will take some shifting from their current methods of representing
> their languages

I have put together a solution. Yes, there are a lot of browsers out there that do not support Unicode.

Take the case where your Japanese web pages are encoded in EUC-JP, your database uses UTF-8 for Unicode, and your browser is using Shift_JIS. You set up a locale for your pages ("ja_JP.EUC-JP"), another for your database, and another for the browser ("ja_JP.Shift_JIS"). They are thread-independent locales.

This program adapts so that you can do an xiua_strcoll and it will compare two strings using the Japanese collating order with UTF-32, UTF-16, UTF-8 or code page data. It can dynamically switch between different data formats and produce the same result. If you do an xiua_strcmp it will also produce the same results for different Unicode encodings, so it will adapt to different platforms that use different Unicode encodings.

It will also transform the data from one encoding to another. So if you have data in a UTF-8 locale and convert it to a Shift_JIS locale, it will take care of converting the data.
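
To make the conversion and comparison ideas above concrete, here is a minimal sketch in Python. It is not xIUA or ICU code: the function names transcode and compare are hypothetical, and Python's built-in codecs stand in for the ICU converters. It only illustrates the general approach of pivoting through Unicode, copying the data untouched when both sides already use the same encoding, and comparing by code point so that the result does not depend on how the strings are stored.

```python
import codecs


def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Convert encoded text from src to dst, pivoting through Unicode.

    Hypothetical helper for illustration; not part of xIUA or ICU.
    """
    # codecs.lookup() normalises aliases, so "EUC-JP" and "euc_jp" match.
    if codecs.lookup(src).name == codecs.lookup(dst).name:
        return data  # same encoding on both sides: just copy the data
    return data.decode(src).encode(dst)


def compare(a: bytes, enc_a: str, b: bytes, enc_b: str) -> int:
    """Return -1, 0 or 1 in Unicode code point order.

    Both inputs are decoded first, so the answer is the same whether
    the data arrives as UTF-8, UTF-16, UTF-32 or a legacy code page.
    """
    ua, ub = a.decode(enc_a), b.decode(enc_b)
    return (ua > ub) - (ua < ub)


# A page stored as EUC-JP, delivered to a browser that wants Shift_JIS.
page = "日本語".encode("euc_jp")
for_browser = transcode(page, "EUC-JP", "Shift_JIS")
```

Decoding before comparing matters because raw byte order is not the same across encodings: a UTF-16 byte comparison places U+10000 before U+FF5E, while code point order places it after.
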
If the browser is using EUC-JP encoding and the HTML is in EUC-JP, it will see that the two locales are using the same encoding and will just copy the data.

The whole thing, however, works on a Unicode base. It uses ICU (http://oss.software.ibm.com/icu/), which is probably the most comprehensive Unicode support package for C/C++ applications. Some functions like xiua_strtok must have different implementations for different forms of data, but this is transparent to the user. My code, xIUA (http://www.xnetinc.com/xiua/), provides these alternate implementations as needed.

Both ICU and xIUA are free open source code, so they can be tailored for your specific needs. In fact, xIUA is a starter package that is designed to be part of your application, so you can adapt it as required.

It also contains code that you can use with an Apache web server to organize your web pages into language-specific directories, so that it is easier to organize the site and make links. This also reduces mishaps because every file in a directory uses the same code page. Better yet, you can convert all your files to UTF-8 and then just translate to the code page that the browser needs. As more browsers start supporting UTF-8, the translation will become unnecessary.

> and if I wish to mix languages on a single page...if I wish to use a
> German or French quote in a passage of English text?...I like the
> broad idea...however over-automisation of language seems to be
> disastrous...people are strange about language...you can see it on
> our site where users seem to leap between languages at particular
> points...a lot of people seem to have different preferred languages
> for collecting information and for dealing with personal matters

I browse pages with mixed script all the time. Most people want to see only one language, but the site must support multiple languages.

Carl
Received on Thursday, 23 August 2001 14:31:25 UTC