- From: Chris Newman <Chris.Newman@Sun.COM>
- Date: Tue, 06 Aug 2002 10:53:52 -0700
- To: Markus Scherer <markus.scherer@jtcsv.com>, charsets <ietf-charsets@iana.org>
begin quotation by Markus Scherer on 2002/8/6 8:07 -0700:
> Chris Newman wrote:
>> However, I do object to the following two aliases:
>>> ISO_8859-1:1987   IBM819   IBM-819
>>> ANSI_X3.4-1968    IBM367   IBM-367
>
> Well, fact is that
> a) There are already a number of aliases for these charsets
>    that implementations have to deal with.

Yes, and that's extremely unfortunate and I would have objected had I been engaged in standards when the registry was created, but alas I've only been involved in the IETF for 11 years.

> b) There is a large installed base of software for (and of) IBM
>    operating systems and middleware that use the numeric IDs
>    819 and 367 and tend to prepend "IBM-" to all IDs
>    for interoperation with open-standards systems.

And if that software emits the "IBM-#" aliases on the open Internet, it is non-compliant and needs to be fixed. What makes more sense: forcing all standards-compliant software to change to use a new alias, or forcing just the limited set of broken IBM software to use the correct standard names for interoperable charsets?

> Although these charsets are among the most important and most widely
> used, it seems artificial to limit the use of aliases only for these two.

Aliases impede interoperability by creating a cross-product of cases to test interoperability (number of aliases * number of products). I can't make a strong objection to the addition of aliases to limited-use charsets because they're already limited use -- meaning interoperability is neither important nor expected. But for those charsets which are already widely used, the addition of aliases breaks the existing interoperable installed base of standards-compliant software. That is the point where any good engineer should stand up and object.

And it's not just _two_ character sets.
Here's a partial list of interoperable character sets:

MIME            MIME
charset         Text  MIBenum  Alternate          References
--------------  ----  -------  -----------------  ----------
ISO-8859-1      Yes   4        UTF-8              [RFC2046,ISO-8859]
ISO-8859-2      Yes   5        UTF-8              [RFC2046,ISO-8859]
ISO-8859-3      Yes   6        UTF-8              [RFC2046,ISO-8859]
ISO-8859-4      Yes   7        UTF-8              [RFC2046,ISO-8859]
ISO-8859-5      Yes   8        KOI8-R,UTF-8       [RFC2046,ISO-8859]
ISO-8859-6      Yes   9        UTF-8              [RFC2046,ISO-8859]
ISO-8859-7      Yes   10       UTF-8              [RFC1947,2046,ISO-8859]
ISO-8859-8      Yes   11       UTF-8              [RFC1555,2046,ISO-8859]
ISO-8859-9      Yes   12       UTF-8              [RFC2046,ISO-8859]
ISO-8859-10     Yes   13       UTF-8              [RFC2046,ISO-8859]
US-ASCII        Yes   3        N/A                [RFC2046]
UTF-8           Yes   106      N/A                [RFC2279]
ISO-8859-6-E    Yes   81       UTF-8              [RFC1556]
ISO-8859-6-I    Yes   82       UTF-8              [RFC1556]
ISO-8859-8-E    Yes   84       UTF-8              [RFC1556]
ISO-8859-8-I    Yes   85       UTF-8              [RFC1556]
KOI8-R          Yes   2084     ISO-8859-5,UTF-8   [RFC1489]
KOI8-U          Yes   2088     UTF-8              [RFC2319]
ISO-2022-KR     Yes   37       EUC-KR,UTF-8       [RFC1557,KS_C_5601-1987]
EUC-KR          Yes   38       ISO-2022-KR,UTF-8  [RFC1557,KS_C_5601-1987]
ISO-2022-JP     Yes   39       UTF-8              [RFC1468]
ISO-2022-CN     Yes   104*A    UTF-8              [RFC1922]
CN-GB           Yes   N/A      UTF-8              [RFC1922]
CN-Big5         Yes   N/A      UTF-8              [RFC1922]
HZ-GB-2312      Yes   2085     UTF-8              [RFC1842,1843]
VISCII          Yes   2082     UTF-8              [RFC1456]
VIQR            Yes   2083     VISCII,UTF-8       [RFC1456]
GB2312          Yes?  2025     UTF-8              [RFC1922]
Big5            Yes?  2026     UTF-8              [RFC1922]
EUC-JP          Yes   18       ISO-2022-JP,UTF-8  [JIS X0212-1990]
Shift_JIS       Yes   17       ISO-2022-JP,UTF-8  [JIS X0212-1990]

I will object to the addition of aliases to any of these.

- Chris
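
To make the cross-product argument in the message concrete, here is a minimal Python sketch, not anything taken from the registry or from any real implementation. The alias table, the function names (`build_lookup`, `canonical_charset`, `interop_cases`), and the choice of ISO-8859-1 and US-ASCII as the preferred names are assumptions for illustration, built only from the labels quoted in this thread.

```python
# Hypothetical alias table: preferred name -> other labels seen in this thread.
# Assumption: ISO-8859-1 and US-ASCII are treated as the preferred names.
ALIASES = {
    "ISO-8859-1": ["ISO_8859-1:1987", "IBM819", "IBM-819"],
    "US-ASCII": ["ANSI_X3.4-1968", "IBM367", "IBM-367"],
}


def build_lookup(aliases):
    """Map every label, lowercased, to its preferred name."""
    lookup = {}
    for preferred, names in aliases.items():
        lookup[preferred.lower()] = preferred
        for name in names:
            lookup[name.lower()] = preferred
    return lookup


LOOKUP = build_lookup(ALIASES)


def canonical_charset(label):
    """Return the preferred name for a label, or None if it is unknown."""
    return LOOKUP.get(label.strip().lower())


def interop_cases(labels, products):
    """Count the (label, product) pairs to test: labels * products."""
    return labels * products


if __name__ == "__main__":
    print(canonical_charset("ibm-819"))
    # One charset with 4 labels across 10 products, then with a 5th label added.
    print(interop_cases(4, 10), interop_cases(5, 10))
```

Under these assumptions, a receiver has to carry the whole table just to accept what senders may emit, and each added label grows the test matrix by one case per product, which is the cost the message is objecting to.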
Received on Tuesday, 6 August 2002 13:55:21 UTC