RE: modification to registration of charset ks_c_5601-1987

From: Martin Duerst <duerst@w3.org>
Date: Wed, 13 Jun 2001 16:06:37 +0900
To: Keld Jørn Simonsen <keld@dkuug.dk>, "Eric A. Hall" <ehall@ehsco.com>
Cc: ietf-charsets@iana.org
Hello Takao, others,

I have just had a look at the situation, and I can only say
that it's a major mess.

I have looked at the CP 949 table in Nadine Kano's book as
well as at
(somehow ftp didn't work for me today).

I also have copies of both KS C 5601-1992 and
KS C 5601-1987, and Ken Lunde's CJKV book gives the
table for KS X 1001:1992. Codepoint-wise, all three
are the same, but there is a huge difference between these
and CP 949, in that CP 949 contains 8822 more Hangul syllables,
stuffed in around the 94x94 block of iso-ir-149 in a manner similar
to GBK. Using KS_C_5601-1987 for CP 949 looks like a serious

Takao, can you check whether KS_C_5601-1987 is indeed used
for CP 949, or whether it is only used for the appropriate

There are three things that need labels:

a) The 94x94 block of iso-ir-149
b) The combination of a) and an ASCII-like 7-bit set
c) The combination of b) and the 8822 additional Hangul syllables
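
As a rough illustration of how the three sets nest, the sketch below classifies a byte string by the smallest of a), b) or c) whose byte structure it fits. It assumes the 94x94 block is invoked in GR (bytes 0xA1 to 0xFE, as in EUC-KR), and it approximates the CP 949 extension area as any two-byte pair with a lead byte in 0x81 to 0xFE and a trail byte of 0x41 or above that falls outside the block. The function names and the exact ranges are assumptions for illustration, not taken from the tables mentioned above, and the check is purely structural: it does not verify that a code position is actually assigned.

```python
def in_gr(byte: int) -> bool:
    """True if the byte lies in 0xA1..0xFE, the GR image of a 94-position row or cell."""
    return 0xA1 <= byte <= 0xFE


def smallest_category(data: bytes) -> str:
    """Return 'a', 'b' or 'c' for the smallest set whose byte structure fits, else 'invalid'."""
    saw_seven_bit = False
    saw_extension = False
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            # A byte from the ASCII-like 7-bit set: rules out a).
            saw_seven_bit = True
            i += 1
            continue
        if i + 1 >= len(data):
            return "invalid"  # dangling lead byte
        trail = data[i + 1]
        if in_gr(lead) and in_gr(trail):
            pass  # inside the 94x94 block
        elif 0x81 <= lead <= 0xFE and trail >= 0x41:
            # Assumed extension area around the block: only c) can hold it.
            saw_extension = True
        else:
            return "invalid"
        i += 2
    if saw_extension:
        return "c"
    return "b" if saw_seven_bit else "a"
```

For example, `smallest_category(bytes([0xB0, 0xA1]))` gives `"a"`, the same pair preceded by `b"A"` gives `"b"`, and `bytes([0x81, 0x41])` gives `"c"`.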

The current official situation in the charset registry is:

Name: KS_C_5601-1987                                    [RFC1345,KXS2]
MIBenum: 36
Source: ECMA registry
Alias: iso-ir-149
Alias: KS_C_5601-1989
Alias: KSC_5601
Alias: korean
Alias: csKSC56011987

Name: EUC-KR  (preferred MIME name)                     [RFC1557,Choi]
MIBenum: 38
Source: RFC-1557 (see also KS_C_5861-1992)
Alias: csEUCKR

For c), no label is registered.

This unfortunately doesn't coincide with actual practice.
Either way, some implementations won't work together,
and some people will get unhappy.

I'm not sure whether we need a lot of labels for a)
(using only Korean, without some 7-bit set, should be
very rare). Also, I have no idea whether things such
as iso-ir-149 (there are a lot of these in the charset
registry) are supposed to be used in GL or in GR;
maybe Keld can enlighten us.

Also, because the additional 8822 Hangul syllables in
CP 949 are used very rarely, most of the pages labeled
as KS_C_5601-1987 by Microsoft applications will at least
conform to b).
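
The figure of 8822 can be cross-checked with a short sketch. It assumes that Python's built-in `cp949` codec matches the CP 949 table discussed here, and that the 94x94 block appears as byte pairs in 0xA1 to 0xFE; both are assumptions of this example, and the helper names are hypothetical. It counts the precomposed Hangul syllables (U+AC00 to U+D7A3) whose CP 949 encoding falls outside the block.

```python
def is_in_94x94_block(encoded: bytes) -> bool:
    """True if the encoding is a two-byte pair with both bytes in 0xA1..0xFE."""
    return len(encoded) == 2 and all(0xA1 <= b <= 0xFE for b in encoded)


def count_extra_hangul() -> int:
    """Count Hangul syllables that cp949 encodes outside the 94x94 block."""
    extra = 0
    for code_point in range(0xAC00, 0xD7A4):  # all precomposed Hangul syllables
        encoded = chr(code_point).encode("cp949")
        if not is_in_94x94_block(encoded):
            extra += 1
    return extra


if __name__ == "__main__":
    print(count_extra_hangul())
```

If the codec table agrees with the one described above, the count should come out as 8822.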

So I guess this would lead to more or less the following:

- In the registry, add a notice to KS_C_5601-1989 to say that
   this is misused to mean EUC-KR or CP949 by some applications.
   Potentially deprecate it clearly.
- In the registry, add an entry for CP949 (if Microsoft
   has a need to label it).

- By applications: Use EUC-KR or CP949 depending on what
   the repertoire is.
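
A minimal sketch of that last rule, assuming Python's built-in `cp949` and `euc_kr` codecs stand in for sets c) and b): encode the text as CP 949, then see whether the resulting bytes are also valid EUC-KR. The function name `choose_label` is hypothetical, and "CP949" is used as a label only for illustration, since it is not registered at the time of writing. The check goes through the CP 949 bytes rather than calling `text.encode("euc_kr")` directly, because an EUC-KR encoder may apply its own fallback for syllables outside the 94x94 block.

```python
def choose_label(text: str) -> str:
    """Pick 'EUC-KR' if the text stays within b), otherwise 'CP949'.

    Raises UnicodeEncodeError if the text cannot be encoded in cp949 at all.
    """
    raw = text.encode("cp949")
    try:
        # Bytes that use the extension area are rejected by the EUC-KR decoder.
        raw.decode("euc_kr")
    except UnicodeDecodeError:
        return "CP949"
    return "EUC-KR"
```

Text such as "한글" stays within b) and would be labeled EUC-KR, while a syllable outside the 94x94 block, such as "똠", would need the CP949 label.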

Any comments?

Regards,   Martin.

At 23:18 01/06/06 -0700, Takao Suzuki wrote:
> > Just one thing that we need to check. My definition
> > of iso-ir-149 did not include the 2022 shift between
> > a 7-bit ASCII-like charset and the 14-bit iso-ir-149.
> > Maybe that is the intention with ks_c_5601.
> >
> > We need to check with Korean users.
> >
> > Kind regards
> > Keld
>Microsoft treats KS_C_5601-1987 the same as Windows CP-949
>whose Unicode mapping table is available at:
>It does include 7-bit ASCII characters. And we have a problem
>if a charset definition of any of these new aliases disagrees
>with Windows 949. Microsoft Outlook Express, which I used to
>work on, and Microsoft Outlook send Korean Messages including
>US ASCII characters using "KS_C_5601-1987", and Korean web
>pages including US ASCII characters also use KS_C_5601-1987
Received on Wednesday, 13 June 2001 07:26:54 UTC
