- From: Martin Duerst <duerst@w3.org>
- Date: Wed, 13 Jun 2001 16:06:37 +0900
- To: Takao Suzuki, Keld Jørn Simonsen <keld@dkuug.dk>, "Eric A. Hall" <ehall@ehsco.com>
- Cc: ietf-charsets@iana.org
Hello Takao, others,

I have just had a look at the situation, and I can only say that it's a major mess.

I have looked at the CP 949 table in Nadine Kano's book as well as at http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP949.TXT (somehow ftp didn't work for me today). I also have a copy of both KS C 5601-1992 as well as KS C 5601-1987, and Ken Lunde's CJKV book gives the table for KS X 1001:1992.

Codepoint-wise, all these three are the same, but there is a huge difference between these and CP 949, in that CP 949 contains 8822 more Hangul syllables, stuffed in around the 94x94 block of iso-ir-149 in a manner similar to GBK.

Using KS_C_5601-1987 for CP 949 looks like a serious mistake. Takao, can you check whether KS_C_5601-1987 is indeed used for CP 949, or whether it is only used for the appropriate subset?

There are three things that need labels:

a) The 94x94 block of iso-ir-149
b) The combination of a) and an ASCII-like 7-bit set
c) The combination of b) and the 8822 more Hangul

The current official situation in the charset registry is:

a) Name: KS_C_5601-1987                    [RFC1345,KXS2]
   MIBenum: 36
   Source: ECMA registry
   Alias: iso-ir-149
   Alias: KS_C_5601-1989
   Alias: KSC_5601
   Alias: korean
   Alias: csKSC56011987

b) Name: EUC-KR  (preferred MIME name)     [RFC1557,Choi]
   MIBenum: 38
   Source: RFC-1557 (see also KS_C_5861-1992)
   Alias: csEUCKR

c) No label registered

This unfortunately doesn't coincide with actual practice. Either way, some implementations won't work together, and some people will get unhappy.

I'm not sure whether we need a lot of labels for a) (using only Korean, without some 7-bit set, should be very rare). Also, I have no idea whether things such as iso-ir-149 (there are a lot of these in the charset registry) are supposed to be used in GL or in GR; maybe Keld can enlighten us.
Also, because the additional 8822 Hangul syllables in CP 949 are used very rarely, most of the pages labeled as KS_C_5601-1987 by Microsoft applications will at least conform to b).

So I guess this would lead to more or less the following plan:

- In the registry, add a notice to KS_C_5601-1989 to say that this is misused to mean EUC-KR or CP949 by some applications. Potentially deprecate it clearly.
- In the registry, add an entry for CP949 (if Microsoft has a need to label it).
- By applications: Use EUC-KR or CP949 depending on what the repertoire is.

Any comments?

Regards, Martin.

At 23:18 01/06/06 -0700, Takao Suzuki wrote:
> > Just one thing that we need to check. My definition
> > of iso-ir-149 did not include the 2022 shift between
> > a 7-bit ASCII-like charset and the 14-bit iso-ir-149.
> > Maybe that is the intention with ks_c_5601.
> >
> > We need to check with Korean users.
> >
> > Kind regards
> > Keld
>
>Microsoft treats KS_C_5601-1987 the same as Windows CP-949
>whose Unicode mapping table is available at:
>
>ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP949.TXT
>
>It does include 7-bit ASCII characters. And we have a problem
>if a charset definition of any of these new aliases disagrees
>with Windows 949. Microsoft Outlook Express, which I used to
>work on, and Microsoft Outlook send Korean messages including
>US ASCII characters using "KS_C_5601-1987", and Korean web
>pages including US ASCII characters also use KS_C_5601-1987
>everywhere.
>
>Thanks
>
>-takao
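
As an illustration of the distinction between b) and c) in the message above, the following is a minimal sketch that sorts a byte string by structure alone. The byte ranges for the extension area are an assumption taken from the commonly published CP949 layout, not from the message; the helper names are hypothetical. The sketch checks only byte ranges, not whether a given cell is actually assigned, and it does not model a), the bare 94x94 set without a 7-bit companion.

```python
# Sketch only. The extension-area ranges below are assumed from the commonly
# published CP949 layout; they are not stated in the message itself.

def _in(b: int, lo: int, hi: int) -> bool:
    return lo <= b <= hi

def is_extension_pair(lead: int, trail: int) -> bool:
    """True if (lead, trail) lies in the CP949 area outside the 94x94 block."""
    letters = _in(trail, 0x41, 0x5A) or _in(trail, 0x61, 0x7A)
    if _in(lead, 0x81, 0xA0):
        return letters or _in(trail, 0x81, 0xFE)
    if _in(lead, 0xA1, 0xC5):
        return letters or _in(trail, 0x81, 0xA0)
    if lead == 0xC6:
        return _in(trail, 0x41, 0x52)
    return False

def classify(data: bytes) -> str:
    """Return 'ascii', 'b' (EUC-KR shaped), 'c' (needs the extension) or 'invalid'."""
    tier = "ascii"
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:            # 7-bit byte, part of the ASCII-like set
            i += 1
            continue
        if i + 1 >= len(data):     # lead byte with no trail byte
            return "invalid"
        trail = data[i + 1]
        if _in(lead, 0xA1, 0xFE) and _in(trail, 0xA1, 0xFE):
            if tier == "ascii":    # inside the 94x94 block (GR form)
                tier = "b"
        elif is_extension_pair(lead, trail):
            tier = "c"             # one extension pair is enough to need c)
        else:
            return "invalid"
        i += 2
    return tier

# Number of cells in the assumed extension area.
EXTENSION_CELLS = sum(
    is_extension_pair(lead, trail)
    for lead in range(0x81, 0xC7)
    for trail in range(0x100)
)
```

Under these assumed ranges, EXTENSION_CELLS works out to 8822, which agrees with the count of additional Hangul syllables given in the message. An application following the plan could label data that classifies as 'b' as EUC-KR and reserve a CP949 label for data that classifies as 'c'.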
Received on Wednesday, 13 June 2001 07:26:54 UTC