- From: Anthony Fok <anthony@thizlinux.com>
- Date: Fri, 09 Nov 2001 14:49:47 +0800
- To: ietf-charsets@iana.org
- Cc: Kevin Lau <kevin@thizlinux.com>, Fai <fai@thizlinux.com>, Bruno Haible <haible@ilog.fr>, James Su <suzhe@turbolinux.com.cn>, Shouhua Wang <shwang@sonata.iscas.ac.cn>, Jian Wu <jwu@sonata.iscas.ac.cn>, Leon Zhang <leon@xteamlinux.com.cn>, Anthony Fok <anthony@thizlinux.com>, Yu Guanghui <ygh@dlut.edu.cn>, Roger So <roger.so@sw-linux.com>, Pablo Saratxaga <pablo@mandrakesoft.com>, zhaoway <zw@debian.org>, Yu Mingjian <yumingjian@china.com>, Chen Xiangyang <chenxy@sun.ihep.ac.cn>, Dirk Meyer <dmeyer@adobe.com>, Markus Scherer <markus.scherer@jtcsv.com>, Ken Lunde <lunde@adobe.com>, li18nux2000@li18nux.org, bsd-locale@haun.org
Hello all,

I hereby propose the inclusion of the GBK and GB18030 charsets in the IANA charset registry. (I hope you don't mind all the CCs. I think it would be nice if all the GB18030 experts could comment on and contribute to this registration as a community effort. :-)

GB2312 (1980) has been superseded by GBK (circa 1993?) and GB18030 (2000). GBK has been widely used by mainland Chinese for a very long time, and GB18030, which supersedes GBK, is a mandatory standard in Mainland China as of August 30, 2001.

GBK extends GB2312 to include the CJK compatibility area defined in Unicode 2.1. GBK quickly became very popular in China. All major GNU/Linux and UNIX platforms (Red Flag, XteamLinux, Turbolinux, BluePoint, COSIX, etc.), as well as Microsoft Windows, have supported GBK for years. It is equivalent to codepage 936 in Windows. Many web pages already use the GBK encoding. For example, the character "Rong" in Premier Zhu Rongji's name is missing from GB2312 and can be displayed only in GBK.

GB18030 further extends GBK. It covers 1-byte, 2-byte and 4-byte codepoints while maintaining full backward compatibility with GB2312 and GBK. It specifies a round-trip conversion to and from Unicode/ISO-10646-1, and the 4-byte portion of GB18030 is calculated algorithmically to map to the corresponding codepoints in Unicode/ISO-10646-1. Thus, this will be the first Chinese national standard that covers all ethnic languages (Chinese, Tibetan, Mongolian, etc.) used in China.

On behalf of fellow Chinese, I would really love to see GBK and GB18030 recognized as official charsets by IANA. Indeed, zh_CN.GBK and zh_CN.GB18030 have been supported in glibc and GNU iconv for quite some time. Patches to add GB18030 support exist for XFree86 4.1.x and Qt-2.3.x. Mozilla, Netscape and MSIE also already recognize and support both GBK and GB18030.
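As a rough illustration of the algorithmic 4-byte mapping mentioned above, here is a minimal Python sketch. It assumes the linear layout used for the supplementary planes, where U+10000 corresponds to the bytes 90 30 81 30 and each following code point takes the next 4-byte sequence. It covers only U+10000 through U+10FFFF, not the table-driven parts of the standard, and the function names are hypothetical rather than taken from any implementation cited in this message.

```python
# Sketch only: linear 4-byte GB18030 mapping for supplementary code points.
# Assumed byte ranges: b1 0x90..0xE3, b2 0x30..0x39, b3 0x81..0xFE, b4 0x30..0x39.
# Function and constant names are hypothetical.

FIRST_SUPPLEMENTARY = 0x10000
LAST_CODE_POINT = 0x10FFFF


def encode_supplementary(code_point: int) -> bytes:
    """Return the 4-byte GB18030 sequence for a supplementary code point."""
    if not FIRST_SUPPLEMENTARY <= code_point <= LAST_CODE_POINT:
        raise ValueError("only U+10000..U+10FFFF are handled by this sketch")
    index = code_point - FIRST_SUPPLEMENTARY
    # Peel off the digits of a mixed-radix number: 10, 126, 10, then the rest.
    index, b4 = divmod(index, 10)
    index, b3 = divmod(index, 126)
    b1, b2 = divmod(index, 10)
    return bytes([0x90 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4])


def decode_supplementary(data: bytes) -> int:
    """Return the supplementary code point for a 4-byte GB18030 sequence."""
    if len(data) != 4:
        raise ValueError("expected exactly four bytes")
    b1, b2, b3, b4 = data
    in_range = (
        0x90 <= b1 <= 0xE3
        and 0x30 <= b2 <= 0x39
        and 0x81 <= b3 <= 0xFE
        and 0x30 <= b4 <= 0x39
    )
    if not in_range:
        raise ValueError("not a supplementary-plane 4-byte sequence")
    index = (((b1 - 0x90) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)
    code_point = FIRST_SUPPLEMENTARY + index
    if code_point > LAST_CODE_POINT:
        raise ValueError("sequence lies beyond U+10FFFF")
    return code_point
```

One way to sanity-check the sketch is to compare encode_supplementary(cp) with chr(cp).encode("gb18030") from Python's built-in codec for a few supplementary code points.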
A test page (courtesy of James Su) is at:

http://www.turbolinux.com.cn/~suzhe/

This page (Content-Type: text/html; charset=gb18030) can be displayed in full under Turbolinux 7.0 and XteamLinux 4.0. After installing the GB18030 upgrade by Microsoft, the page also displays correctly under Windows NT/2000/XP, albeit with some fonts missing, as the font provided by Microsoft isn't as complete. ;-)

Note that even in e-mail and web pages where GBK is used, Microsoft still labels it "charset=gb2312". It is a misnomer, but perhaps a compromise to maintain backward compatibility, and perhaps because GBK isn't yet in the IANA registry. Some people may also have used "x-gbk" and "x-x-gbk", but of course, those are also non-standard. It would be best if we standardized this to "GBK" once and for all. After all, GBK is a Chinese national specification used by millions if not billions of people. It is not some private vendor implementation, so the use of "x-" is inappropriate. :-)

I am very glad that BIG5-HKSCS has been registered. It would be wonderful if we could get GBK and GB18030 registered too. :-)

My question: How should we proceed?

The Chinese government has published a printed standard for GB18030. It is in Chinese, and is unfortunately not available on-line. However, an unofficial yet authoritative GB18030 Summary written by Dirk Meyer at Adobe Systems is at:

ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/pdf/GB18030_Summary.pdf

The complete GB18030 <-> Unicode conversion data (mappings and ranges) are defined here:

http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/xml/gb-18030-2000.xml

Markus Scherer at IBM has also written some excellent documentation:

http://oss.software.ibm.com/cvs/icu/~checkout~/charset/source/gb18030/gb18030.html

Fellow Chinese developers like James Su, Wang Shouhua, Wu Jian, Leon Zhang, etc. have also posted some GB18030 papers in Chinese on the Internet.
I am not sure how to contact the official Chinese standards committee that defined the GB18030 standard, but I am sure some of you may know. :-)

I just found a copy of the Big5-HKSCS registration on-line. I guess we can use that as a template, and follow RFC 2278 to write a formal application for GBK and GB18030 (in ASCII) and submit it. BTW, that registration is at:

http://lists.w3.org/Archives/Public/ietf-charsets/2000OctDec/att-0000/01-Submisson_to_IANA.txt

(Yes, there is a small typo: Submisson instead of Submission. :-)

Any comments, suggestions and guidance are welcome! :-)

Best Regards,

Anthony Fok

-- 
Anthony Fok Tung-Ling
ThizLinux Laboratory <anthony@thizlinux.com> http://www.thizlinux.com/
Debian Chinese Project <foka@debian.org> http://www.debian.org/intl/zh/
Come visit Our Lady of Victory Camp! http://www.olvc.ab.ca/
Received on Friday, 9 November 2001 01:47:43 UTC