- From: Michael Gorelik <mgorelik@Novarra.com>
- Date: Tue, 14 Aug 2001 15:35:15 -0500
- To: www-international@w3.org
We work on a product that enables wireless devices to access any web content on the fly. I am in the process of evaluating options for multi-language support, and honestly I am ready to jump out of the window:-) I have several questions; hopefully someone can help me sort them out:-)

1) Has anyone seen information on the number of pages, or the percentage of content, available in different charsets, such as ISO8859-1, UTF-8, UTF-16, EUC-JP, ISO-2022-JP, Shift_JIS, etc. (other than the Babel study)? I am trying to get an idea of the number of users of each particular charset.

2) Also, if someone can point me to a nice table that lists languages, character repertoires, coded character sets, and charsets, I would be very grateful. Something like this:

Language    Character Repertoire    Coded Character Set    charset
English     ISO8859-1               ISO8859-1              ISO8859-1
Japanese    JIS X 0208-1990         shift jis              shift-jis
                                    iso-2022-jp            iso-2022-jp
etc.

Of course, I am still at a loss as to which standard defines the character repertoire, which defines the coded character set, which defines the encoding, and which defines the charset.

3) Probably my most important dilemma: can we use Unicode to represent data internally? Namely, are there mapping tables from all of the most widely used charsets in Europe and East Asia into Unicode and back? If there are widely used encodings that don't map into Unicode nicely, what are they?

4) What is the set of IANA charsets for CJKV that I need to be able to handle in my product to support, let's say, 80-90% of the content available in Asia?

Thanks in advance:-)

Misha Gorelik *;O)
Received on Tuesday, 14 August 2001 16:40:07 UTC