- From: Erik van der Poel <erik@netscape.com>
- Date: Fri, 21 Nov 1997 09:23:36 -0800
- To: Sam Sun <ssun@CNRI.Reston.Va.US>
- CC: www international <www-international@w3.org>, Unicode Discussion <unicode@unicode.org>
The Internet charset registry is at:

  ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets

The GB 2312-related entries are as follows:

Name: GB_2312-80                                    [RFC1345,KXS2]
MIBenum: 57
Source: ECMA registry
Alias: iso-ir-58
Alias: chinese
Alias: csISO58GB231280

Name: GB2312  (preferred MIME name)
MIBenum: 2025
Source: Chinese for People's Republic of China (PRC) mixed one byte,
        two byte set:
          20-7E = one byte ASCII
          A1-FE = two byte PRC Kanji
        See GB 2312-80
        PCL Symbol Set Id: 18C
Alias: csGB2312

Name: HZ-GB-2312
MIBenum: 2085
Source: RFC 1842, RFC 1843                          [RFC1842, RFC1843]

As you can see, "GB2312" is the name of the charset that also contains
single-byte ASCII characters. This is the charset that is used in many
places, including Web pages.

GB_2312-80 has an alias "iso-ir-58", which means that it is registration
number 58 in ISO's registry, and this character set does not include
single-byte ASCII characters, so this is not the charset that is used on
the Internet.

The "HZ-GB-2312" charset is a 7-bit encoding of GB 2312, used in some
places such as Usenet newsgroups.

Summary: "GB2312" is the correct name. (It is case-insensitive, so
"gb2312" is also correct.)

Erik

Sam Sun wrote:
> There is a similar bug from Front Page 97, the Microsoft's web authering
> tool.
>
> When used to generate HTML documents using Simplified Chinese Character Set
> encoding, it uses illegal charset name "gb_2312-80".
>
> I believe the right charset name should be "gb-2312-80". Note that it's not
> a underscore character between "gb" and "2312", but a hyphen character.
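
For anyone who wants to check the points above by hand, here is a minimal Python sketch. It is only an illustration, not part of the registry: it assumes Python's built-in "gb2312" and "hz" codecs correspond to the "GB2312" and "HZ-GB-2312" registry entries, and the helper name split_gb2312 is made up for this example. It shows that the charset name is matched case-insensitively, that ASCII stays one byte while a Chinese character takes two bytes in the A1-FE range, and that the HZ form uses only 7-bit bytes.

```python
import codecs


def split_gb2312(data: bytes) -> list:
    """Split GB2312-encoded bytes into one-byte and two-byte units.

    Hypothetical helper: a lead byte in A1-FE starts a two-byte
    character; any other byte is treated as a single-byte unit.
    """
    units = []
    i = 0
    while i < len(data):
        if 0xA1 <= data[i] <= 0xFE:
            if i + 1 >= len(data):
                raise ValueError("truncated two-byte character")
            units.append(data[i:i + 2])
            i += 2
        else:
            units.append(data[i:i + 1])
            i += 1
    return units


# The charset name is looked up case-insensitively.
same_codec = codecs.lookup("GB2312").name == codecs.lookup("gb2312").name

text = "A\u4e2d"                    # ASCII "A" followed by one Chinese character
encoded = text.encode("GB2312")     # mixed one-byte / two-byte form
units = split_gb2312(encoded)

# HZ is the 7-bit form: every byte should be below 0x80.
hz_bytes = text.encode("hz")
hz_is_7bit = all(b < 0x80 for b in hz_bytes)

print(same_codec, encoded, units, hz_is_7bit)
```

The splitter deliberately looks only at the lead byte, which is enough to show the one-byte/two-byte mix; it does not validate the second byte of each pair.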
Received on Friday, 21 November 1997 12:28:09 UTC