- From: Erik van der Poel <erik@netscape.com>
- Date: Fri, 21 Nov 1997 11:46:48 -0800
- To: Sam Sun <ssun@CNRI.Reston.Va.US>
- CC: www international <www-international@w3.org>, Unicode Discussion <unicode@unicode.org>
Sam Sun wrote:

> Thanks for clearing this up for me! I got confused because I had my
> browser's default encoding set to GB2312.
>
> So "gb_2312-80" is the correct charset encoding, and Front Page did it
> right.

No no no no no. As I explained, "gb2312" is the correct name.

> One more problem though. It seems that Netscape Communicator doesn't
> recognize the "gb_2312-80", but only "gb2312". IE4.0 supports both.

Microsoft doesn't seem to realize that there is a difference between the
two. Netscape supports the correct one. Since "gb2312" works in both
companies' browsers, you should use this name.

> I just created a web page which can be used to test against these tags, and
> it's at: http://ssun.cnri.reston.va.us/gb2312/index.html
>
> So, is GB_2312-80 a widely accepted name?

No.

> I'm new to the list, please someone let me know if the question shouldn't be
> raised here.

Tex and Rick suggested that bug reports be sent directly to the relevant
companies, but I feel that these charset names are confusing to many
people, so I'd rather spread the info as widely as possible.
Erik

> From: Erik van der Poel <erik@netscape.com>
> To: Sam Sun <ssun@CNRI.Reston.Va.US>
> Cc: www international <www-international@w3.org>; Unicode Discussion
> <unicode@unicode.org>
> Date: Friday, November 21, 1997 12:25 PM
> Subject: Re: Serious bug on www.microsoft.com
>
> >The Internet charset registry is at:
> >
> >ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets
> >
> >The GB 2312-related entries are as follows:
> >
> >Name: GB_2312-80                                        [RFC1345,KXS2]
> >MIBenum: 57
> >Source: ECMA registry
> >Alias: iso-ir-58
> >Alias: chinese
> >Alias: csISO58GB231280
> >
> >Name: GB2312  (preferred MIME name)
> >MIBenum: 2025
> >Source: Chinese for People's Republic of China (PRC) mixed one byte,
> >        two byte set:
> >          20-7E = one byte ASCII
> >          A1-FE = two byte PRC Kanji
> >        See GB 2312-80
> >        PCL Symbol Set Id: 18C
> >Alias: csGB2312
> >
> >Name: HZ-GB-2312
> >MIBenum: 2085
> >Source: RFC 1842, RFC 1843                          [RFC1842, RFC1843]
> >
> >As you can see, "GB2312" is the name of the charset that also contains
> >single-byte ASCII characters. This is the charset that is used in many
> >places, including Web pages. GB_2312-80 has an alias "iso-ir-58", which
> >means that it is registration number 58 in ISO's registry, and this
> >character set does not include single-byte ASCII characters, so this is
> >not the charset that is used on the Internet. The "HZ-GB-2312" charset is
> >a 7-bit encoding of GB 2312, used in some places such as Usenet
> >newsgroups.
> >
> >Summary: "GB2312" is the correct name.
> >
> >(It is case-insensitive, so "gb2312" is also correct.)
> >
> >Erik
> >
> >Sam Sun wrote:
> >
> >> There is a similar bug from Front Page 97, Microsoft's web authoring
> >> tool.
> >>
> >> When used to generate HTML documents using Simplified Chinese Character
> >> Set encoding, it uses the illegal charset name "gb_2312-80".
> >>
> >> I believe the right charset name should be "gb-2312-80". Note that it's
> >> not an underscore character between "gb" and "2312", but a hyphen
> >> character.
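
The registry entries quoted above can be illustrated with a small lookup table. What follows is only a minimal Python sketch, not how any browser actually resolves charset labels: the table holds just the three entries quoted in the message, and the names `REGISTRY`, `PREFERRED_WEB_NAME`, `canonical_charset` and `is_preferred_web_name` are hypothetical, chosen for this example.

```python
# Names and aliases copied from the three registry entries quoted in the
# message. This is an illustrative subset, not the full IANA registry.
REGISTRY = {
    "GB_2312-80": ("iso-ir-58", "chinese", "csISO58GB231280"),
    "GB2312": ("csGB2312",),
    "HZ-GB-2312": (),
}

# Per the message, "GB2312" is the preferred MIME name and the one to use
# on Web pages.
PREFERRED_WEB_NAME = "GB2312"

# Charset names are case-insensitive, so index every label in lowercase.
_LOOKUP = {}
for name, aliases in REGISTRY.items():
    for label in (name, *aliases):
        _LOOKUP[label.lower()] = name


def canonical_charset(label):
    """Return the registered name for a label, or None if it is not listed."""
    return _LOOKUP.get(label.strip().lower())


def is_preferred_web_name(label):
    """True only if the label resolves to the mixed ASCII/two-byte GB2312."""
    return canonical_charset(label) == PREFERRED_WEB_NAME


if __name__ == "__main__":
    for label in ("gb2312", "GB2312", "gb_2312-80", "gb-2312-80"):
        print(label, "->", canonical_charset(label), is_preferred_web_name(label))
```

In this sketch "gb_2312-80" resolves to a registered name but is not the preferred Web name, while "gb-2312-80" (with a hyphen) is not in the quoted entries at all, which matches the points made in the thread.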
Received on Friday, 21 November 1997 14:47:20 UTC