- From: Masayasu Ishikawa <mimasa@w3.org>
- Date: Sat, 01 Sep 2001 03:42:04 +0900 (JST)
- To: www-html@w3.org
[ www-html only ]

Terje Bless <link@pobox.com> wrote:

> > I wonder where you came up with iso-10646-1.
>
> It's a common misconception. Character Encoding issues are _hard_ and most
> people don't understand them.

That may be true. For example,

> Since the ISO-8859-* series has been well
> worked into the collective subconscious, if a spec uses a similar looking
> string (such as "ISO-10646") anywhere in relation to charset issues, a lot
> of people will immediately assume it is a charset name in the same vein as
> the ISO-8859-* encodings.

That assumption happens to be correct: the charset name "ISO-10646" does exist, as an alias of "ISO-10646-Unicode-Latin1". "ISO-10646-1", however, is not a registered charset name.

ISO-8859-* encodings are not easy to understand either. For example, "latin1" is an alias of "ISO_8859-1:1987", whose preferred MIME name is "ISO-8859-1" (and the latest version of ISO/IEC 8859-1 is ISO/IEC 8859-1:1998, not ISO/IEC 8859-1:1987). But "latin8" is NOT an alias of "ISO-8859-8"; it is an alias of "ISO-8859-14". Yet ISO/IEC 8859-16:2001, a.k.a. Latin alphabet No. 10, is registered as "ISO-8859-16" but has no alias like "latin10". One might expect that a charset named "ISO-8859-11" already exists, but it doesn't (yet).

> This has cropped up periodically and should
> probably be mentioned to the HTML WG; a small explanatory note,
> strategically placed, could avoid a lot of confusion.

I would say confusion about "iso-10646-1" is only the tip of the iceberg, and I don't think a "small" note can avoid "a lot of" confusion.

Regards,
-- 
Masayasu Ishikawa / mimasa@w3.org
W3C - World Wide Web Consortium
Received on Friday, 31 August 2001 14:42:17 UTC