Re: revised suggested registered names for character sets

Roy T. Fielding wrote:
> 
>       "US-ASCII"
>       "ISO-8859-1"     "ISO-8859-2"     "ISO-8859-3"
>       "ISO-8859-4"     "ISO-8859-5"     "ISO-8859-6"
>       "ISO-8859-7"     "ISO-8859-8"     "ISO-8859-9"
>       "ISO-2022-JP"    "ISO-2022-JP-2"  "ISO-2022-KR"
>       "GB2312"         "BIG5"           "KOI8-R"
>       "SHIFT_JIS"
>       "EUC-KR"   (for "EXTENDED_UNIX_CODE_PACKED_FORMAT_FOR_KOREAN")

No, there is no such thing as
"EXTENDED_UNIX_CODE_PACKED_FORMAT_FOR_KOREAN" in the registry. See:

  ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets


>       "EUC-JP"   (for EXTENDED_UNIX_CODE_PACKED_FORMAT_FOR_JAPANESE)
>       "UCS-4"    (for ISO-10646)

No, "ISO-10646" is an alias for "ISO-10646-Unicode-Latin1", which is
only a subset of 10646. The name in parentheses should be
"ISO-10646-UCS-4".


>       "UCS-2"    (for UNICODE-1-1)

This one may deserve some discussion. There is such a thing as
"UNICODE-1-1" in the registry, but there is also "ISO-10646-UCS-2". It
is not clear to me what the difference is.


>       "UTF-7"    (for UNICODE-1-1-UTF-7)
>       "UTF-8"    (for UNICODE-1-1-UTF-8)

No, there is no such thing as "UNICODE-1-1-UTF-8" in the registry.


On second thought, these UCS and UTF names may cause confusion, since
it may not be clear to some people whether they refer to the "first"
version of 10646 or to the newer version with the amendments attached
(including the major Korean block change).

One nice thing about the name UNICODE-1-1 is that it unambiguously
refers to version 1.1 of Unicode. However, the nice thing about UCS-2 is
that it's a short name.


> [I don't know the descriptive names for GB2312, BIG5 and KOI8-R.]

There are no "descriptive" names for these. These are the names
registered with IANA.
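
The checks above amount to looking each proposed name up in the
registry. Here is a minimal Python sketch of that lookup. It assumes a
hand-copied table holding only the names mentioned in this message, not
the full IANA file. REGISTERED and resolve are hypothetical names chosen
for illustration. Matching is case-insensitive, as charset names are in
MIME.

```python
# Illustrative subset only: names and aliases taken from this message,
# NOT the complete IANA character-sets registry.
# Keys are upper-cased names or aliases; values are the primary names.
REGISTERED = {
    "ISO-10646-UCS-4": "ISO-10646-UCS-4",
    "ISO-10646-UCS-2": "ISO-10646-UCS-2",
    "UNICODE-1-1": "UNICODE-1-1",
    "ISO-10646-UNICODE-LATIN1": "ISO-10646-Unicode-Latin1",
    "ISO-10646": "ISO-10646-Unicode-Latin1",  # alias for the Latin1 subset
    "GB2312": "GB2312",
    "BIG5": "BIG5",
    "KOI8-R": "KOI8-R",
}


def resolve(name):
    """Return the primary name for `name`, or None if it is not in this table.

    None only means "not in this illustrative subset"; a real check
    would have to consult the full registry file.
    """
    return REGISTERED.get(name.upper())


if __name__ == "__main__":
    for proposed in ("ISO-10646", "iso-10646-ucs-4", "UNICODE-1-1-UTF-8"):
        print(proposed, "->", resolve(proposed))
```

With this table, "ISO-10646" resolves to the Latin1 subset rather than
to UCS-4, which is the problem noted above.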


Erik

Received on Thursday, 6 June 1996 08:55:01 UTC