- From: Anthony Fok <anthony@thizlinux.com>
- Date: Thu, 14 Mar 2002 19:11:42 +0800
- To: ietf-charsets@iana.org
- Cc: 陈壮 <chenzh@cesi.ac.cn>, Cheng XU <xucheng@cn.ibm.com>, haible@ilog.fr, suzhe@gnuchina.org, shwang@sonata.iscas.ac.cn, 吴健 <jwu@sonata.iscas.ac.cn>, leon@xteamlinux.com.cn, ygh@dlut.edu.cn, roger.so@sw-linux.com, pablo@mandrakesoft.com, zw@debian.org, yumingjian@china.com, chenxy@sun.ihep.ac.cn, Dirk Meyer <dmeyer@adobe.com>, markus.scherer@jtcsv.com, Ken Lunde <lunde@adobe.com>, li18nux2000@li18nux.org, bsd-locale@haun.org, wuzg@cesi.ac.cn, Yoshihiko Enomoto <YENOMOTO@jp.ibm.com>, Jack Kang <Jack.Kang@sun.com>
Application of IANA Charset Registration for GB18030
----------------------------------------------------

Charset name: GB18030

Charset aliases: Currently none.

Suitability for use in MIME text: Yes

Published specification(s):

The official GB 18030-2000 standard was published (in print) by the China Standard Press (中国标准出版社 Zhongguo Biaozhun Chubanshe), Beijing, March 17, 2000:

    Chinese National Standard GB 18030-2000:
    Information Technology -- Chinese ideograms coded character set for information interchange -- Extension for the basic set
    (信息技术 -- 信息交换用汉字编码字符集 -- 基本集的扩充
    Xinxi Jishu -- Xinxi Jiaohuan Yong Hanzi Bianma Zifuji -- Jibenji de Kuochong)

The mapping tables therein were updated in late 2000 to correct the mapping of the "Euro" character and to exclude the surrogate area.

Dirk Meyer <dmeyer@adobe.com> (Adobe Systems) has kindly provided an English summary, explanations, and remarks of the GB 18030-2000 standard on-line (February 16, 2001):

    ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/pdf/GB18030_Summary.pdf

Markus Scherer <markus.scherer@us.ibm.com> (IBM) also published "GB 18030: A mega-codepage: Exploring the history and structure of the new Chinese Unicode standard" on-line (February 2001):

    http://oss.software.ibm.com/icu/docs/papers/gb18030.html

ISO 10646 equivalency table:

Markus Scherer (IBM) et al. have prepared an authoritative GB18030 and ISO 10646 mapping table with the latest revisions in CharMapML (XML) format (ref. Unicode Technical Report #22).
It is available on-line at:

    http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/xml/gb-18030-2000.xml

Additional information:

To facilitate electronic communication in the People's Republic of China, and to provide a smooth migration path from the older GB 2312-1980 standard and GBK (1995) specification to ISO 10646 / Unicode / GB 13000.1, the Chinese government published the GB 18030-2000 standard, which is code- and character-compatible with the full codespace of the ISO 10646 / Unicode standards from U+0000 to U+10FFFF.

GB18030 support is mandatory for all operating systems sold in Mainland China on or after September 1, 2001. (Embedded systems and PDAs are currently exempt.) Eventually, end-user applications must also fully support the GB18030 standard--mere UTF-8 support is not enough.

Although this mandatory statute may seem too strict, it is a smart move to solve a pressing Chinese text communication issue once and for all, while providing backward compatibility with legacy GB2312/GBK systems. Therefore, it is important for all developers to learn and implement this standard, especially if they intend to sell their software in Mainland China.

The current GB18030 standard specifies the addition of CJK Extension A and of the ethnic minority languages Mongolian, Tibetan, Uyghur (Arabic) and Yi. Since GB18030 is fully ISO 10646 compatible, support for CJK Extension B and all other languages will be easy. More importantly, the GB18030 standard means that special Chinese characters in people's names, geographic names and ancient documents may finally be processed.

In a nutshell, it is the Chinese version of UTF-8: whereas UTF-8 maintains compatibility with ASCII, GB18030 maintains compatibility with GB2312/GBK and provides full ISO 10646 compatibility. Part of the mapping comes from a lookup table (similar to GBK). The rest is calculated algorithmically.
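As a rough illustration of the algorithmic part, the sketch below treats a 4-byte GB18030 code as a mixed-radix number (digits of 126, 10, 126 and 10 values, matching the byte ranges {81-FE}{30-39}{81-FE}{30-39}) and uses it to encode the supplementary planes. The function names are hypothetical, and the starting point GB+90308130 = U+10000 is an assumption taken from commonly published mapping tables, not from this registration text. The 4-byte codes for U+0080..U+FFFF still need the lookup table, because they skip characters already assigned in the 2-byte area.

```python
# Minimal sketch, not a complete GB18030 codec. All names are hypothetical.

# Linear index of GB+90308130, assumed here to correspond to U+10000.
SUPPLEMENTARY_BASE = 189000


def linear_index(code: bytes) -> int:
    """Position of a 4-byte code, counting from GB+81308130 = 0."""
    b1, b2, b3, b4 = code
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


def from_linear_index(index: int) -> bytes:
    """Inverse of linear_index: rebuild the four bytes from a position."""
    index, d4 = divmod(index, 10)
    index, d3 = divmod(index, 126)
    d1, d2 = divmod(index, 10)
    return bytes([d1 + 0x81, d2 + 0x30, d3 + 0x81, d4 + 0x30])


def encode_supplementary(cp: int) -> bytes:
    """Encode a code point in U+10000..U+10FFFF without any table."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points are handled")
    return from_linear_index(SUPPLEMENTARY_BASE + (cp - 0x10000))


if __name__ == "__main__":
    # Compare against Python's built-in "gb18030" codec.
    for cp in (0x10000, 0x20000, 0x10FFFF):
        print(f"U+{cp:04X}", encode_supplementary(cp).hex().upper(),
              chr(cp).encode("gb18030").hex().upper())
```

Running the script prints each code point with the bytes from the sketch next to those from Python's own codec, so any disagreement with the assumed base is visible immediately.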
A brief summary of the GB18030 codepoints is listed below:

    1-byte: {00-7F}
            Same as US-ASCII / ISO 646 IRV (1991)

    2-byte: {81-FE}{40-7E,80-FE}
            Same as GBK (but now only 1-to-1 mappings remain)

    4-byte: {81-FE}{30-39}{81-FE}{30-39}
            Maps linearly to ISO 10646 starting from GB+81308130 = U+0080 while skipping the mappings already defined in the 1-byte and 2-byte areas. The surrogate area is excluded.

More information on the GB18030 standard and sample implementations may be found on the Internet.

Person & email address to contact for further information:

    CHEN Zhuang (陈壮)
    chenzh@cesi.ac.cn
    Chinese IT Standardization Technical Committee
    Chinese Electronics Standardization Institute

Additionally, please Cc: ietf-charsets@iana.org to keep the community informed, as the implementation of the GB18030 standard on operating systems and applications is a community effort.

Intended usage: COMMON

Acknowledgement:

Appreciation and kudos to the Internet community for documenting and explaining the GB 18030-2000 standard to the whole world, for implementing this new standard in software so quickly, and for their comments on this registration. Special thanks to Dirk Meyer <dmeyer@adobe.com> for his translation of the GB18030 standard, and to Markus Scherer <markus.scherer@us.ibm.com> for his GB18030/Unicode mapping table.

-- Anthony Fok <anthony@thizlinux.com>, March 14, 2002

-- 
Anthony Fok Tung-Ling
ThizLinux Laboratory <anthony@thizlinux.com> http://www.thizlinux.com/
Debian Chinese Project <foka@debian.org> http://www.debian.org/intl/zh/
Come visit Our Lady of Victory Camp! http://www.olvc.ab.ca/
Received on Thursday, 14 March 2002 06:04:10 UTC