
Re: Serious bug on www.microsoft.com -> is GB_2312-80 widely accepted?

From: Erik van der Poel <erik@netscape.com>
Date: Fri, 21 Nov 1997 11:46:48 -0800
Message-ID: <3475E528.76E5B508@netscape.com>
To: Sam Sun <ssun@CNRI.Reston.Va.US>
CC: www international <www-international@w3.org>, Unicode Discussion <unicode@unicode.org>
Sam Sun wrote:

> Thanks for clearing this up for me! I got confused because I had my
> browser's default encoding set to GB2312.
>
> So "gb_2312-80" is the correct charset encoding, and Front Page did it
> right.

No no no no no. As I explained, "gb2312" is the correct name.

> One more problem though. It seems that Netscape Communicator doesn't
> recognize the "gb_2312-80", but only "gb2312". IE4.0 supports both.

Microsoft doesn't seem to realize that there is a difference between the two.
Netscape supports the correct one. Since "gb2312" works in both companies'
browsers, you should use this name.
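
As a rough illustration of that advice, here is a minimal Python sketch that labels a page with "gb2312" and encodes it with Python's built-in "gb2312" codec. The helper names build_page and split_characters are hypothetical, invented for this example. The splitter assumes the layout given in the registry entry for GB2312 (single-byte ASCII, two-byte characters with both bytes in A1-FE); it also lets control bytes such as newlines through as single bytes, which the registry entry does not mention.

```python
from typing import List

CHARSET_LABEL = "gb2312"  # the name recommended in this message


def build_page(body_text: str) -> bytes:
    """Hypothetical helper: return an HTML page labelled and encoded as gb2312."""
    html = (
        "<html><head>\n"
        '<meta http-equiv="Content-Type" '
        f'content="text/html; charset={CHARSET_LABEL}">\n'
        "</head><body>\n"
        f"{body_text}\n"
        "</body></html>\n"
    )
    # Raises UnicodeEncodeError if body_text contains characters outside GB 2312.
    return html.encode(CHARSET_LABEL)


def split_characters(data: bytes) -> List[bytes]:
    """Hypothetical helper: split gb2312 bytes into one-byte and two-byte units."""
    units = []
    i = 0
    while i < len(data):
        first = data[i]
        if first < 0x80:
            # Single-byte ASCII (control characters are accepted here too).
            units.append(data[i:i + 1])
            i += 1
        elif 0xA1 <= first <= 0xFE:
            # Two-byte character: the second byte must also be in A1-FE.
            if i + 1 >= len(data) or not 0xA1 <= data[i + 1] <= 0xFE:
                raise ValueError(f"truncated or invalid two-byte unit at offset {i}")
            units.append(data[i:i + 2])
            i += 2
        else:
            raise ValueError(f"byte 0x{first:02X} at offset {i} is outside the expected ranges")
    return units


if __name__ == "__main__":
    page = build_page("\u4e2d\u6587 test")  # U+4E2D U+6587, "Chinese (language)"
    print(page.decode(CHARSET_LABEL))
    print(split_characters("A\u4e2d".encode(CHARSET_LABEL)))
```

The label in the META tag and the codec used for the bytes are kept in one constant so that they cannot drift apart.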

> I just created a web page which can be used to test against these tags, and
> it's at: http://ssun.cnri.reston.va.us/gb2312/index.html
>
> So, is GB_2312-80 a widely accepted name?

No.

> I'm new to the list, please someone let me know if the question shouldn't be
> raised here.

Tex and Rick suggested that bug reports be sent directly to the relevant
companies, but I feel that these charset names are confusing to many people, so
I'd rather spread the info as widely as possible.

Erik

> From: Erik van der Poel <erik@netscape.com>
> To: Sam Sun <ssun@CNRI.Reston.Va.US>
> Cc: www international <www-international@w3.org>; Unicode Discussion
> <unicode@unicode.org>
> Date: Friday, November 21, 1997 12:25 PM
> Subject: Re: Serious bug on www.microsoft.com
>
> >The Internet charset registry is at:
> >
> >ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets
> >
> >The GB 2312-related entries are as follows:
> >
> >Name: GB_2312-80                                        [RFC1345,KXS2]
> >MIBenum: 57
> >Source: ECMA registry
> >Alias: iso-ir-58
> >Alias: chinese
> >Alias: csISO58GB231280
> >
> >Name: GB2312  (preferred MIME name)
> >MIBenum: 2025
> >Source: Chinese for People's Republic of China (PRC) mixed one byte,
> >        two byte set:
> >          20-7E = one byte ASCII
> >          A1-FE = two byte PRC Kanji
> >        See GB 2312-80
> >        PCL Symbol Set Id: 18C
> >Alias: csGB2312
> >
> >Name: HZ-GB-2312
> >MIBenum: 2085
> >Source: RFC 1842, RFC 1843                              [RFC1842, RFC1843]
> >
> >As you can see, "GB2312" is the name of the charset that also contains
> >single-byte ASCII characters. This is the charset that is used in many
> >places, including Web pages. GB_2312-80 has an alias "iso-ir-58", which
> >means that it is registration number 58 in ISO's registry, and this
> >character set does not include single-byte ASCII characters, so this is
> >not the charset that is used on the Internet. The "HZ-GB-2312" charset is
> >a 7-bit encoding of GB 2312, used in some places such as Usenet newsgroups.
> >
> >Summary: "GB2312" is the correct name.
> >
> >(It is case-insensitive, so "gb2312" is also correct.)
> >
> >Erik
> >
> >Sam Sun wrote:
> >
> >> There is a similar bug in Front Page 97, Microsoft's web authoring
> >> tool.
> >>
> >> When used to generate HTML documents using the Simplified Chinese
> >> character set encoding, it uses the illegal charset name "gb_2312-80".
> >>
> >> I believe the right charset name should be "gb-2312-80". Note that it's
> >> not an underscore character between "gb" and "2312", but a hyphen
> >> character.
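
The registry entries quoted in this thread can be modelled as a small lookup table, which makes the distinction between the names easier to see. The following Python sketch is only an illustration: the REGISTRY dictionary is transcribed by hand from the excerpt above, and build_index and lookup are hypothetical helper names, not part of any library. Matching is case-insensitive, as noted in the thread.

```python
from typing import Dict, Optional

# Transcribed from the registry excerpt quoted in this thread (not the full registry).
REGISTRY = {
    "GB_2312-80": {"mibenum": 57, "aliases": ["iso-ir-58", "chinese", "csISO58GB231280"]},
    "GB2312": {"mibenum": 2025, "aliases": ["csGB2312"]},
    "HZ-GB-2312": {"mibenum": 2085, "aliases": []},
}


def build_index(registry: dict) -> Dict[str, str]:
    """Map every lower-cased name and alias to its registered name."""
    index = {}
    for name, entry in registry.items():
        for label in [name] + entry["aliases"]:
            index[label.lower()] = name
    return index


INDEX = build_index(REGISTRY)


def lookup(label: str) -> Optional[str]:
    """Return the registered name for a label, or None if it is not listed."""
    return INDEX.get(label.strip().lower())


if __name__ == "__main__":
    for label in ["gb2312", "GB_2312-80", "iso-ir-58", "gb-2312-80"]:
        print(label, "->", lookup(label))
    # gb2312 -> GB2312
    # GB_2312-80 -> GB_2312-80
    # iso-ir-58 -> GB_2312-80
    # gb-2312-80 -> None
```

"gb2312" and "gb_2312-80" resolve to different entries (MIBenum 2025 and 57 respectively), and the hyphenated "gb-2312-80" is not listed in the quoted excerpt at all.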
Received on Friday, 21 November 1997 14:47:20 GMT
