Re: Unregistered charset values in HTTP 1.1, the ISO-8859-* values

Olle Jarnefors wrote:
> 
> b) It's unclear what charset registration the preferred
>    MIME name "GB2312" shall designate: the ISO-registered
>    character set GB_2312-80 (MIBenum: 57) or the only
>    incompletely described GB2312 (MIBenum: 2025), which
>    of course already has the proposed preferred MIME name
>    as it's principal name. It is also possible that these
>    two registrations actually refer to exactly the same
>    character set, and should be merged in the IANA registry.

No. While I agree that the name "gb2312" is lousy ("euc-cn" would have
been better), the descriptions provided with the IANA registrations for
gb2312 and gb_2312-80 are:

Name: GB_2312-80                                        [RFC1345,KXS2]
MIBenum: 57
Source: ECMA registry
Alias: iso-ir-58
Alias: chinese
Alias: csISO58GB231280

Name: GB2312
MIBenum: 2025
Source: Chinese for People's Republic of China (PRC) mixed one byte, 
        two byte set: 
          20-7E = one byte ASCII 
          A1-FE = two byte PRC Kanji 
        See GB 2312-80 
        PCL Symbol Set Id: 18C
Alias: csGB2312

Since the GB_2312-80 entry mentions iso-ir-58, it is clear that this is
a single character set in the ISO 2022 sense. I.e. this charset does not
include the single-byte ASCII characters.

The gb2312 entry explicitly mentions single-byte ASCII, so it is clear
that this is referring to the charset actually used on the net, whose
name might more properly be "euc-cn" (to follow euc-kr's and euc-jp's
lead).

By the way, there is an RFC (1922) that proposes even more names for
some of the same things:

  ftp://ds.internic.net/rfc/rfc1922.txt

This document mentions "cn-gb" and "cn-big5", which appear to be the
same as "gb2312" and "big5" respectively, but they have not been added
to the IANA registry yet.

Anarchy and chaos rule.


Erik

Received on Friday, 12 July 1996 11:12:45 UTC