RE: modification to registration of charset ks_c_5601-1987 from Martin Duerst on 2001-06-13 (ietf-charsets@w3.org from April to June 2001)

From: Martin Duerst <duerst@w3.org>
Date: Wed, 13 Jun 2001 16:06:37 +0900
To: (wrong string) 輯(Bn Simonsen <keld@dkuug.dk>, "Eric A. Hall" <ehall@ehsco.com>
Cc: ietf-charsets@iana.org
Message-id: <4.2.0.58.J.20010613153010.0329c220@sh.w3.mag.keio.ac.jp>

Hello Takao, others,

I have just had a look at the situation, and I can only say
that it's a major mess.

I have looked at the CP 949 table in Nadine Kano's book as
well as at
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP949.TXT
(somehow ftp didn't work for me today).

I also have a copy of both KS C 5601-1992 as well as
KS C 5601-1987, and Ken Lunde's CJKV book gives the
table for KS X 1001:1992. Codepoint-wise, all these three
are the same, but there is a huge difference between these
and CP 949, in that CP 949 contains 8822 more Hangul syllables,
stuffed in around the 94x94 block of iso-ir-149 in a manner similar
to GBK. Using KS_C_5601-1987 for CP 949 looks like a serious
mistake.

Takao, can you check whether KS_C_5601-1987 is indeed used
for CP 949, or whether it is only used for the appropriate
subset?

There are three things that need labels:

a) The 94x94 block of iso-ir-149
b) The combination of a) and an ASCII-like 7-bit set
c) The combination of b) and the 8822 more Hangul

The current official situation in the charset registry is:

a)
Name: KS_C_5601-1987                                    [RFC1345,KXS2]
MIBenum: 36
Source: ECMA registry
Alias: iso-ir-149
Alias: KS_C_5601-1989
Alias: KSC_5601
Alias: korean
Alias: csKSC56011987

b)
Name: EUC-KR  (preferred MIME name)                     [RFC1557,Choi]
MIBenum: 38
Source: RFC-1557 (see also KS_C_5861-1992)
Alias: csEUCKR

c)
No label registered

This unfortunately doesn't coincide with actual practice.
Either way, some implementations won't work together,
and some people will get unhappy.

I'm not sure whether we need a lot of labels for a)
(using only Korean, without some 7-bit set, should be
very rare). Also, I have no idea whether things such
as iso-ir-149 (there are a lot of these in the charset
registry) are supposed to be used in GL or in GR;
maybe Keld can enlighten us.

Also, because the additional 8822 Hangul syllables in
CP 949 are used very rarely, most of the pages labeled
as KS_C_5601-1987 by Microsoft applications will at least
conform to b).

So I guess this would lead to more or less the following
plan:

- In the registry, add a notice to KS_C_5601-1989 to say that
   this is misused to mean EUC-KR or CP949 by some applications.
   Potentially deprecate it clearly.
- In the registry, add an entry for CP949 (if Microsoft
   has a need to label it).

- By applications: Use EUC-KR or CP949 depending on what
   the repertoire is.

Any comments?

Regards,   Martin.

At 23:18 01/06/06 -0700, Takao Suzuki wrote:
> > Just one thing that we need to check. My definition
> > of iso-ir-149 did not include the 2022 shift between
> > an 7-bit ASCII-like charset and the 14-bit iso-ir-149.
> > Maybe that is the intention with ks_c_5601 .
> >
> > We need to check with Korean users.
> >
> > Kind regards
> > Keld
>
>Microsoft treats KS_C_5601-1987 the same as Windows CP-949
>whose Unicode mapping table is available at:
>
>ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP949.TXT
>
>It does include 7-bit ASCII characters. And we have a problem
>if a chraset definition of any of these new alias disagrees
>with Windows 949. Microsoft Outlook Express, which I used to
>work on, and Microsoft Outlook send Korean Messages including
>US ASCII characters using "KS_C_5601-1987", and Korean web
>pages including US ASCII characters also use KS_C_5601-1987
>everywhere.
>
>Thanks
>
>-takao

Received on Wednesday, 13 June 2001 07:26:54 UTC