Re: Serious bug on www.microsoft.com

The Internet charset registry is at:

ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets

The GB 2312-related entries are as follows:

Name: GB_2312-80                                        [RFC1345,KXS2]
MIBenum: 57
Source: ECMA registry
Alias: iso-ir-58
Alias: chinese
Alias: csISO58GB231280

Name: GB2312  (preferred MIME name)
MIBenum: 2025
Source: Chinese for People's Republic of China (PRC) mixed one byte,
        two byte set:
          20-7E = one byte ASCII
          A1-FE = two byte PRC Kanji
        See GB 2312-80
        PCL Symbol Set Id: 18C
Alias: csGB2312

Name: HZ-GB-2312
MIBenum: 2085
Source: RFC 1842, RFC 1843                              [RFC1842, RFC1843]

As you can see, "GB2312" names the charset that combines single-byte ASCII with
the two-byte GB 2312 characters. This is the charset used in many places,
including Web pages. GB_2312-80 has the alias "iso-ir-58", which means that it
is registration number 58 in ISO's registry. That character set does not include
single-byte ASCII characters, so it is not the charset used on the Internet.
The "HZ-GB-2312" charset is a 7-bit encoding of GB 2312, used in some places
such as Usenet newsgroups.
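
To make the difference between the 8-bit and 7-bit forms concrete, here is a small Python sketch. It assumes Python's built-in "gb2312" and "hz" codecs correspond to the registry's "GB2312" and "HZ-GB-2312" entries; the sample string and the helper name hex_dump are only illustrative.

```python
# Sketch: compare the byte layout of GB2312 (mixed one/two byte, 8-bit)
# with HZ-GB-2312 (7-bit). Assumes Python's "gb2312" and "hz" codecs
# match the registry entries of the same names.
text = "abc 中文"

gb = text.encode("gb2312")
hz = text.encode("hz")


def hex_dump(data: bytes) -> str:
    """Render bytes as space-separated hex pairs for inspection."""
    return " ".join(f"{b:02X}" for b in data)


print("GB2312    :", hex_dump(gb))
print("HZ-GB-2312:", hex_dump(hz))

# In GB2312 the ASCII characters remain single bytes below 0x80,
# while each Chinese character becomes two bytes in the range A1-FE.
ascii_part, chinese_part = gb[:4], gb[4:]
print("ASCII bytes unchanged:", ascii_part == b"abc ")
print("two-byte range A1-FE :", all(0xA1 <= b <= 0xFE for b in chinese_part))

# HZ never sets the high bit, which is why it survives 7-bit channels.
print("HZ is 7-bit clean    :", all(b < 0x80 for b in hz))
```

Both encodings decode back to the same text; they differ only in how the GB 2312 code points are carried on the wire.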

Summary: "GB2312" is the correct name.

(It is case-insensitive, so "gb2312" is also correct.)
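
As a rough illustration of the case-insensitive matching, here is a Python sketch that checks a label against only the three registry entries quoted above. The table REGISTRY and the function registered_name are hypothetical names for this example, not part of any library, and the real registry contains many more entries.

```python
# Sketch: case-insensitive lookup against the three registry entries
# quoted in this message. REGISTRY and registered_name are hypothetical
# names used only for illustration.
REGISTRY = {
    "GB_2312-80": ["iso-ir-58", "chinese", "csISO58GB231280"],
    "GB2312": ["csGB2312"],
    "HZ-GB-2312": [],
}

# Map every name and alias, lowercased, to its registered name.
LOOKUP = {}
for name, aliases in REGISTRY.items():
    for label in [name, *aliases]:
        LOOKUP[label.lower()] = name


def registered_name(label: str):
    """Return the registered name for a label, or None if it is not listed."""
    return LOOKUP.get(label.strip().lower())


for candidate in ["GB2312", "gb2312", "gb_2312-80", "gb-2312-80"]:
    print(f"{candidate!r:14} -> {registered_name(candidate)}")
```

With this table, "gb2312" resolves to "GB2312", "gb_2312-80" resolves to the ASCII-less "GB_2312-80" entry, and "gb-2312-80" does not appear at all.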

Erik

Sam Sun wrote:

> There is a similar bug in Front Page 97, Microsoft's web authoring
> tool.
>
> When used to generate HTML documents in a Simplified Chinese character set
> encoding, it uses the illegal charset name "gb_2312-80".
>
> I believe the right charset name should be "gb-2312-80". Note that the
> character between "gb" and "2312" is a hyphen, not an underscore.

Received on Friday, 21 November 1997 12:28:09 UTC