Re: Registering GBK and GB18030 in the IANA charset registry

Note from the charset reviewer:

I understand that a new registration form will be sent to this list, 
addressing comments, and that no action from the charset reviewer is 
appropriate yet.

I recommend using the registration form in RFC 2978 as at least part of the 
document. If you want publication of the whole document, the best thing may 
be to publish this as an RFC; this allows for including much more 
explanatory material.
The first step is then to publish the document as an internet-draft.

              Harald T. Alvestrand, charset reviewer


--On 9. november 2001 14:49 +0800 Anthony Fok <anthony@thizlinux.com> wrote:

> Hello all,
>
> I hereby propose the inclusion of GBK and GB18030 charsets in the IANA
> charset registry.
>
> (Hope you don't mind all the CCs.  I think it would be nice if all the
> GB18030 experts can comment and contribute to this registration
> as a community effort.  :-)
>
> GB2312 (1980) has been superseded by GBK (circa 1993?) and GB18030 (2000).
> GBK has been widely used by mainland Chinese for a very long time, and
> GB18030, which supersedes GBK, is a mandatory standard in Mainland China
> as of August 30, 2001.
>
> GBK extends GB2312 to include the CJK compatibility area defined in
> Unicode 2.1.  GBK quickly became very popular in China. All major
> GNU/Linux and UNIX platforms (Red Flag, XteamLinux, Turbolinux,
> BluePoint, COSIX, etc.), as well as Microsoft Windows, have supported
> GBK for years.  It is equivalent to codepage 936 in Windows.
> Many web pages already use GBK encoding.  For example, the character
> "Rong" in Premier Zhu Rongji's name is missing from GB2312 and can be
> displayed only in GBK.
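
The coverage gap described above can be checked with a short sketch. It assumes the codecs that ship with Python's standard library ("gb2312", "gbk", "gb18030"); the helper name can_encode is hypothetical and used only for illustration.

```python
# The character "Rong" from Zhu Rongji's name, as mentioned in the message.
RONG = "镕"


def can_encode(text: str, codec: str) -> bool:
    """Return True if every character in text is representable in codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # Report, per charset, whether the character can be encoded at all.
    for codec in ("gb2312", "gbk", "gb18030"):
        print(codec, can_encode(RONG, codec))
```

With these codecs the character is rejected by "gb2312" and accepted by "gbk" and "gb18030", which is the practical reason pages containing it cannot be strict GB2312.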
>
> GB18030 further extends GBK.  It covers 1-byte, 2-byte and 4-byte
> codepoints while maintaining full backward compatibility with GB2312
> and GBK.  It specifies a roundtrip conversion to and from
> Unicode/ISO-10646-1, and the 4-byte portion of GB18030 is calculated
> algorithmically to map to corresponding codepoints in
> Unicode/ISO-10646-1.  Thus, this will be the first Chinese national
> standard that covers all ethnic languages (Chinese, Tibetan, Mongolian,
> etc.) used in China.
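
As a rough illustration of the algorithmic four-byte mapping, the sketch below covers only the supplementary planes (U+10000 to U+10FFFF), where the mapping is purely linear. Four-byte codes for BMP characters also need the range tables from the mapping data and are not handled here. The function names are hypothetical; the byte ranges (0x81-0xFE, 0x30-0x39, 0x81-0xFE, 0x30-0x39) and the starting code 0x90308130 for U+10000 are the ones used by GB18030.

```python
def four_byte_index(code: bytes) -> int:
    """Linear index of a GB18030 four-byte code (mixed radix 10, 126, 10)."""
    b1, b2, b3, b4 = code
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


# Index of 0x90308130, the four-byte code assigned to U+10000.
SUPPLEMENTARY_BASE = four_byte_index(b"\x90\x30\x81\x30")


def supplementary_to_gb18030(cp: int) -> bytes:
    """Encode a supplementary-plane code point as four GB18030 bytes."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF is handled in this sketch")
    index = cp - 0x10000 + SUPPLEMENTARY_BASE
    index, d4 = divmod(index, 10)
    index, d3 = divmod(index, 126)
    d1, d2 = divmod(index, 10)
    return bytes([0x81 + d1, 0x30 + d2, 0x81 + d3, 0x30 + d4])


def gb18030_to_supplementary(code: bytes) -> int:
    """Inverse of supplementary_to_gb18030 for codes at or above 0x90308130."""
    return 0x10000 + four_byte_index(code) - SUPPLEMENTARY_BASE
```

For example, supplementary_to_gb18030(0x10000) yields the bytes 90 30 81 30, and the result can be compared against Python's built-in "gb18030" codec for any supplementary code point.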
>
> On behalf of fellow Chinese, I would really love to see GBK and GB18030
> recognized as official charsets by IANA.
>
> Indeed, zh_CN.GBK and zh_CN.GB18030 have been supported in glibc
> and GNU iconv for quite some time.  Patches to add GB18030 support
> exist for XFree86 4.1.x and Qt-2.3.x.  Also, Mozilla, Netscape and
> MSIE already recognize and support both GBK and GB18030.
> A test page (courtesy of James Su) is at:
>
> 	http://www.turbolinux.com.cn/~suzhe/
>
> This page (Content-Type: text/html; charset=gb18030) can be displayed
> in full under Turbolinux 7.0 and XteamLinux 4.0.  After installing the
> GB18030 upgrade by Microsoft, the page also displays correctly under
> Windows NT/2000/XP, albeit with some fonts missing as the font provided
> by Microsoft isn't as complete.  ;-)
>
> Note that even in e-mail and webpages where GBK is used, Microsoft still
> calls it "charset=gb2312".  It is a misnomer, but perhaps a compromise
> to maintain backward compatibility, and perhaps because GBK isn't yet in
> IANA.  There are also some people who may have used "x-gbk" and "x-x-gbk",
> but of course, that is also non-standard.  It would be best if we
> standardize this to "GBK" once and for all.  After all, GBK is a Chinese
> national specification used by millions if not billions of people.
> It is not some private vendor implementation, so the use
> of "x-" is inappropriate.  :-)
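
Until a single label is agreed on, a receiver has to cope with the mislabelling described above. The following is a minimal sketch of one lenient approach, assuming Python's standard "gbk" codec: every label in a small alias set is decoded as GBK, which covers GB2312 text as well. The alias set and the function name decode_labelled are hypothetical and not taken from any registry.

```python
# Hypothetical alias set: labels seen in the wild that are decoded as GBK.
GBK_LABELS = {"gbk", "gb2312", "x-gbk", "x-x-gbk"}


def decode_labelled(data: bytes, label: str) -> str:
    """Decode data using its declared charset label, treating GBK aliases leniently."""
    name = label.strip().lower()
    codec = "gbk" if name in GBK_LABELS else name
    return data.decode(codec)


if __name__ == "__main__":
    # GBK-encoded bytes that a sender has labelled "gb2312".
    payload = "镕".encode("gbk")
    print(decode_labelled(payload, "gb2312"))
```

This trades strictness for interoperability: text that is genuinely GB2312 still decodes, and GBK text sent under the "gb2312" label is no longer rejected.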
>
> I am very glad that BIG5-HKSCS has been registered.  It would be
> wonderful if we could get GBK and GB18030 registered too.  :-)
>
> My question: How should we proceed?
>
> The Chinese government has published a printed standard for GB18030.
> It is in Chinese, and is unfortunately not available on-line.
> However, an unofficial yet authoritative GB18030 Summary written by
> Dirk Meyer at Adobe Systems is at:
>
>   ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/pdf/GB18030_Summary.pdf
>
> The complete GB18030 <-> Unicode conversion data (mappings and ranges)
> are defined here:
>
>
> http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/xml/gb-18030-2000.xml
>
> Markus Scherer at IBM has also written some excellent documentation:
>
>
> http://oss.software.ibm.com/cvs/icu/~checkout~/charset/source/gb18030/gb18030.html
>
> Fellow Chinese developers like James Su, Wang Shouhua, Wu Jian, Leon
> Zhang, etc. have also posted some GB18030 papers in Chinese on the
> Internet.
>
> I am not sure how to contact the official Chinese standard committee
> who defined the GB18030 standard, but I am sure some of you may know.
> :-)
>
> I just found a copy of the Big5-HKSCS registration on-line.  I guess we
> can use that as a template, and follow RFC 2278 to write a formal
> application for GBK and GB18030 (in ASCII) and submit it.  BTW, that
> registration is at:
>
>
> http://lists.w3.org/Archives/Public/ietf-charsets/2000OctDec/att-0000/01-Submisson_to_IANA.txt
>
>    (Yes, there is a small typo: Submisson instead of Submission.  :-)
>
> Any comments, suggestions and guidance are welcome!  :-)
>
> Best Regards,
>
> Anthony Fok
>
> --
> Anthony Fok Tung-Ling
> ThizLinux Laboratory   <anthony@thizlinux.com> http://www.thizlinux.com/
> Debian Chinese Project <foka@debian.org> http://www.debian.org/intl/zh/
> Come visit Our Lady of Victory Camp!   http://www.olvc.ab.ca/
>
>

Received on Monday, 14 January 2002 09:49:57 UTC