MIBenum values for Unicode? from Markus Scherer on 2000-08-11 (ietf-charsets@w3.org from July to September 2000)

From: Markus Scherer <markus.scherer@jtcsv.com>
Date: Thu, 10 Aug 2000 17:46:06 -0700
To: charsets <ietf-charsets@iana.org>
Message-id: <39934CCE.1E3E57E6@jtcsv.com>

Hi,

I also have a general question about the MIBenum values:
The numbers 1000..1010 are supposed to be reserved for Unicode and ISO 10646. The numbers 1000..1009 are already used, therefore I have suggested to assign 1010 for SCSU.

My question is this:

Numbers 1002..1009 are assigned for charsets that do not have much to do with Unicode except that their repertoires are subsets of Unicode's.
This is somewhat understandable for US-ASCII and ISO-8859-1 because their encodings use the same numeric values as the first 128 or 256 Unicode characters.

I do not understand why ISO-8859-1 appears to be duplicated as ISO-10646-Unicode-Latin1 (which has a different MIBenum).

I do not understand why the charsets with MIBenums 1005..1009 have Unicode numbers at all. Are they just subsets, i.e., collections of Unicode code points? If so, why would they be registered as charsets?? (They are not IBM codepage numbers.)

An abbreviated list of charsets in this range:

1000 ISO-10646-UCS-2 ("the 2-octet Basic Multilingual Plane, aka Unicode")
1001 ISO-10646-UCS-4 ("the full code space")
1002 ISO-10646-UCS-Basic ("ASCII subset of Unicode")
1003 ISO-10646-Unicode-Latin1 ("ISO Latin-1 subset of Unicode")
1004 ISO-8859-1 ("IBM Latin-1 SAA Core Coded Character Set")
1005 ISO-Unicode-IBM-1261 ("IBM Latin-2, -3, -5, Extended Presentation Set")
1006 ISO-Unicode-IBM-1268 ("IBM Latin-4 Extended Presentation Set")
1007 ISO-Unicode-IBM-1276 ("IBM Cyrillic Greek Extended Presentation Set")
1008 ISO-Unicode-IBM-1264 ("IBM Arabic Presentation Set")
1009 ISO-Unicode-IBM-1265 ("IBM Hebrew Presentation Set")
1010 not used as of today - suggested for SCSU

??

markus

Received on Thursday, 10 August 2000 20:51:36 UTC