- From: Misha Wolf <misha.wolf@reuters.com>
- Date: Tue, 18 Feb 1997 14:51:04 +0000 (GMT)
- To: www-international <www-international@w3.org>, Unicode <unicode@unicode.org>, Seong-Woong Kim <kswoong@pljuno.sogang.ac.kr>, Junwon Chung <greatjun@hen.nca.or.kr>
Erik van der Poel wrote (to the Unicode list):

>Hi Misha,
>
>First, thanks for the great Unicode conference HTML pages.
>
>
>> Because of browser incompatibility in the recognition of charset names, we
>> have not included a <meta ... charset ...> tag in the Korean page. As far
>> as we can see, the page displays correctly in a number of browsers, but each
>> understands a different charset tag. Netscape Navigator/Communicator
>> understands "EUC-KR", Microsoft Internet Explorer understands
>> "KS_C_5601-1987" and Alis Tango understands "KSC5601".
>
>The charset registry is at:
>
>  ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets
>
>The Korean entries are as follows:
>
>  Name: KS_C_5601-1987 [RFC1345,KXS2]
>  MIBenum: 36
>  Source: ECMA registry
>  Alias: iso-ir-149
>  Alias: KS_C_5601-1989
>  Alias: KSC_5601
>  Alias: korean
>  Alias: csKSC56011987
>
>  Name: ISO-2022-KR (preferred MIME name) [RFC1557,Choi]
>  MIBenum: 37
>  Source: RFC-1557 (see also KS_C_5601-1987)
>  Alias: csISO2022KR
>
>  Name: EUC-KR (preferred MIME name) [RFC1557,Choi]
>  MIBenum: 38
>  Source: RFC-1557 (see also KS_C_5861-1992)
>  Alias: csEUCKR
>
>There isn't even an entry called "KSC5601", so Alis' Tango would seem to
>be incorrect.
>
>There is an entry called "KS_C_5601-1987" but it is registered as number
>149 in the ISO (ECMA) registry. (See the alias "iso-ir-149".) This means
>that it is a single character set in the ISO 2022 sense, which means
>that it only contains the Korean 2-byte characters. Inspection of any
>Korean HTML page clearly shows that 2-byte Korean characters are mixed
>with single-byte ASCII characters, so it is clear that this is not
>KS_C_5601-1987. Therefore, MSIE is also incorrect.
>
>If you read RFC 1557, you will see that "EUC-KR" is the encoding where
>2-byte Korean characters have the 8th bit up on both bytes. EUC stands
>for Extended Unix Code. It is the encoding scheme where ASCII characters
>are encoded as single-byte characters with the 8th bit down, and other
>characters are encoded with the 8th bit up. Netscape's Korean charset
>(EUC-KR) is therefore correct.
>
>I request that you update your Unicode conference page(s) to indicate
>the correct, official Korean charset name (EUC-KR).
>
>
>Thanks,
>
>Erik

I have reviewed the relevant documents and have discussed this with others.
The consensus seems to be that EUC-KR is the right charset tag to use. If
you disagree, please tell me.

The updated preview pages are at <http://194.75.134.50/unicode/iuc10>. In
the next few days, the pages at <http://www.reuters.com/unicode/iuc10> and
<http://www.unicode.org> will be updated to reflect the preview pages.

Misha
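Erik's check of the three browser labels against the registry can be sketched as a simple lookup. This is a minimal sketch, assuming a table hand-copied from only the three Korean entries quoted above (`KOREAN_CHARSETS` and `canonical_charset` are hypothetical names, not a library API) and case-insensitive matching of charset names:

```python
from typing import Dict, List, Optional

# Hypothetical lookup table: the three Korean entries from the IANA
# registry as quoted above, mapping each registered name to its aliases.
KOREAN_CHARSETS: Dict[str, List[str]] = {
    "KS_C_5601-1987": ["iso-ir-149", "KS_C_5601-1989", "KSC_5601",
                       "korean", "csKSC56011987"],
    "ISO-2022-KR": ["csISO2022KR"],
    "EUC-KR": ["csEUCKR"],
}


def canonical_charset(label: str) -> Optional[str]:
    """Return the registered name for a charset label, or None if unregistered.

    Names and aliases are compared case-insensitively.
    """
    wanted = label.strip().lower()
    for name, aliases in KOREAN_CHARSETS.items():
        if wanted == name.lower() or wanted in (a.lower() for a in aliases):
            return name
    return None


# The labels understood by Navigator, MSIE and Tango respectively.
for label in ["EUC-KR", "KS_C_5601-1987", "KSC5601"]:
    print(label, "->", canonical_charset(label))
```

With this table, "EUC-KR" and "KS_C_5601-1987" resolve to registered names while "KSC5601" resolves to nothing. Note that a successful lookup only shows a label is registered; it says nothing about whether the registered set describes the mixed ASCII and 2-byte encoding actually used in Korean HTML pages, which is the basis of Erik's objection to "KS_C_5601-1987".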
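The 8th-bit rule Erik describes for EUC-KR is enough to find character boundaries in a byte stream. This is a minimal sketch, assuming only that rule (`split_euc_kr` is a hypothetical helper); a real decoder would also check that both bytes of a 2-byte character fall in the range 0xA1 to 0xFE. The sample text is produced with Python's built-in `euc_kr` codec.

```python
from typing import List


def split_euc_kr(data: bytes) -> List[bytes]:
    """Split EUC-KR bytes into per-character chunks using the 8th-bit rule.

    A byte below 0x80 (8th bit down) is a single-byte ASCII character.
    A byte at or above 0x80 (8th bit up) starts a 2-byte Korean character,
    and its partner byte must also have the 8th bit up.
    """
    chars: List[bytes] = []
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            # Single-byte ASCII character.
            chars.append(data[i:i + 1])
            i += 1
        elif i + 1 < len(data) and data[i + 1] >= 0x80:
            # 2-byte Korean character: both bytes have the 8th bit up.
            chars.append(data[i:i + 2])
            i += 2
        else:
            # Lead byte with no valid partner (truncated or malformed input).
            raise ValueError(f"malformed 2-byte sequence at offset {i}")
    return chars


sample = "Unicode 유니코드".encode("euc_kr")  # Python's built-in EUC-KR codec
for ch in split_euc_kr(sample):
    print(ch.hex(), "->", ch.decode("euc_kr"))
```

The mix of 1-byte and 2-byte chunks this produces is exactly the property Erik points to: a page in this encoding cannot be a pure 2-byte set such as the one registered as iso-ir-149.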
Received on Tuesday, 18 February 1997 11:06:39 UTC