Charset tag for Korean pages [Was: Re: 29 languages + Unicode] from Misha Wolf on 1997-02-18 (www-international@w3.org from January to March 1997)

From: Misha Wolf <misha.wolf@reuters.com>
Date: Tue, 18 Feb 1997 14:51:04 +0000 (GMT)
To: www-international <www-international@w3.org>, Unicode <unicode@unicode.org>, Seong-Woong Kim <kswoong@pljuno.sogang.ac.kr>, Junwon Chung <greatjun@hen.nca.or.kr>
Message-Id: <3404511418021997/A73836/REDMS2/11B293B30300*@MHS>

Erik van der Poel wrote (to the Unicode list):

>Hi Misha,
>
>First, thanks for the great Unicode conference HTML pages.
>
>
>> Because of browser incompatibility in the recognition of charset names, we
>> have not included a <meta ... charset ...> tag in the Korean page.  As far
>> as we can see, the page displays correctly in a number of browsers, but each
>> understands a different charset tag.  Netscape Navigator/Communicator
>> understands "EUC-KR", Microsoft Internet Explorer understands
>> "KS_C_5601-1987" and Alis Tango understands "KSC5601".
>
>The charset registry is at:
>
>  ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets
>
>The Korean entries are as follows:
>
>  Name: KS_C_5601-1987                        [RFC1345,KXS2]
>  MIBenum: 36
>  Source: ECMA registry
>  Alias: iso-ir-149
>  Alias: KS_C_5601-1989
>  Alias: KSC_5601
>  Alias: korean
>  Alias: csKSC56011987
>
>  Name: ISO-2022-KR  (preferred MIME name)    [RFC1557,Choi]
>  MIBenum: 37
>  Source: RFC-1557 (see also KS_C_5601-1987)
>  Alias: csISO2022KR
>
>  Name: EUC-KR  (preferred MIME name)         [RFC1557,Choi]
>  MIBenum: 38
>  Source: RFC-1557 (see also KS_C_5861-1992)
>  Alias: csEUCKR
>
>There isn't even an entry called "KSC5601", so Alis' Tango would seem to
>be incorrect.
>
>There is an entry called "KS_C_5601-1987" but it is registered as number
>149 in the ISO (ECMA) registry. (See the alias "iso-ir-149".) This means
>that it is a single character set in the ISO 2022 sense, which means
>that it only contains the Korean 2-byte characters. Inspection of any
>Korean HTML page clearly shows that 2-byte Korean characters are mixed
>with single-byte ASCII characters, so it is clear that this is not
>KS_C_5601-1987. Therefore, MSIE is also incorrect.
>
>If you read RFC 1557, you will see that "EUC-KR" is the encoding where
>2-byte Korean characters have the 8th bit up on both bytes. EUC stands
>for Extended Unix Code. It is the encoding scheme where ASCII characters
>are encoded as single-byte characters with the 8th bit down, and other
>characters are encoded with the 8th bit up. Netscape's Korean charset
>(EUC-KR) is therefore correct.
>
>I request that you update your Unicode conference page(s) to indicate
>the correct, official Korean charset name (EUC-KR).
>
>
>Thanks,
>
>Erik
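
The registry check Erik describes can be done mechanically.  What follows
is only a minimal sketch in Python: the three Korean entries are copied by
hand from the registry text quoted above, and the names REGISTRY and
resolve_charset are made up for illustration, not taken from any library.
Labels are compared without regard to case, since charset names are not
case-sensitive.

```python
# Korean entries transcribed by hand from the registry text quoted above.
# REGISTRY and resolve_charset are hypothetical names used for illustration.
REGISTRY = {
    "KS_C_5601-1987": ["iso-ir-149", "KS_C_5601-1989", "KSC_5601",
                       "korean", "csKSC56011987"],
    "ISO-2022-KR": ["csISO2022KR"],
    "EUC-KR": ["csEUCKR"],
}


def resolve_charset(label):
    """Return the registered name for a label, or None if it is unregistered."""
    wanted = label.strip().lower()
    for name, aliases in REGISTRY.items():
        if wanted == name.lower() or wanted in [a.lower() for a in aliases]:
            return name
    return None


# The three labels that the three browsers were reported to understand.
for label in ("EUC-KR", "KS_C_5601-1987", "KSC5601"):
    print(label, "->", resolve_charset(label))
```

A label that resolves to None, as "KSC5601" does here, has no entry in the
quoted registry text.  Note that resolving to a registered name does not by
itself make a label the right one for a page; that still depends on how the
bytes are actually encoded.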

I have reviewed the relevant documents and have discussed this with 
others.  The consensus seems to be that EUC-KR is the right charset 
tag to use.  If you disagree, please tell me.
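
For anyone who wants to check a page for themselves, here is a rough
sketch in Python of the byte pattern Erik describes: ASCII as single bytes
with the 8th bit down, Korean characters as two bytes with the 8th bit up
on both.  The function name looks_like_euc_kr is made up for illustration.
The 0xA1-0xFE range for each Korean byte is my addition and is not stated
in the quoted text, and the "euc_kr" codec name is the one Python's
standard library uses.  The check shows only that the bytes fit the
pattern, not that the page really is Korean.

```python
def looks_like_euc_kr(data: bytes) -> bool:
    """Return True if data fits the EUC-KR byte pattern described above.

    Hypothetical helper for illustration; it checks structure only.
    """
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            # Single-byte ASCII character: 8th bit down.
            i += 1
        elif (i + 1 < len(data)
              and 0xA1 <= data[i] <= 0xFE
              and 0xA1 <= data[i + 1] <= 0xFE):
            # Two-byte Korean character: 8th bit up on both bytes
            # (the 0xA1-0xFE range is an assumption stated in the lead-in).
            i += 2
        else:
            # A high byte outside the range, or a truncated pair.
            return False
    return True


# ASCII text mixed with two Hangul syllables (U+D55C, U+AE00).
sample = "Unicode \ud55c\uae00".encode("euc_kr")
print(looks_like_euc_kr(sample))
```

Mixed ASCII and two-byte Korean text passes this check, which is the
situation Erik observes in real Korean HTML pages.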

The updated preview pages are at <http://194.75.134.50/unicode/iuc10>.
In the next few days, the pages at <http://www.reuters.com/unicode/iuc10>
and <http://www.unicode.org> will be updated to reflect the preview 
pages.

Misha

Received on Tuesday, 18 February 1997 11:06:39 UTC