Re: Charset "iso-10646-1" from Masayasu Ishikawa on 2001-08-31 (www-html@w3.org from August 2001)

From: Masayasu Ishikawa <mimasa@w3.org>
Date: Sat, 01 Sep 2001 03:42:04 +0900 (JST)
To: www-html@w3.org
Message-Id: <20010901.034204.74730910.mimasa@w3.org>

[ www-html only ]

Terje Bless <link@pobox.com> wrote:

> >I wonder where you came up with iso-10646-1.
> 
> It's a common misconception. Character Encoding issues are _hard_ and most
> people don't understand them.

That may be true.  For example,

> Since the ISO-8859-* series has been well
> worked into the collective subconscious, if a spec uses a similar looking
> string (such as "ISO-10646") anywhere in relation to charset issues, a lot
> of people will immediately assume it is a charset name in the same vein as
> the ISO-8859-* encodings.

That assumption happens to be correct: the charset name "ISO-10646"
does exist, as an alias of "ISO-10646-Unicode-Latin1".  "ISO-10646-1",
on the other hand, does not.

The ISO-8859-* encodings are not easy to understand either.  For example,
"latin1" is an alias of "ISO_8859-1:1987", whose preferred MIME name
is "ISO-8859-1" (and the latest version of ISO/IEC 8859-1 is
ISO/IEC 8859-1:1998, not ISO/IEC 8859-1:1987).  But "latin8" is NOT
an alias of "ISO-8859-8"; it is an alias of "ISO-8859-14".  And ISO/IEC
8859-16:2001, a.k.a. Latin alphabet No. 10, is "ISO-8859-16" but has
no alias like "latin10".  One might expect that a charset name
"ISO-8859-11" already exists, but it doesn't (yet).
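
To make the alias relationships above concrete, here is a minimal sketch
of a case-insensitive alias lookup.  The table is hypothetical and is
filled in only from the names mentioned in this message; it is not a copy
of the IANA registry, and the function name resolve_charset is invented
for illustration.

```python
# Hypothetical alias table built only from the names in this message.
# Keys are lower-cased labels; values are the names they resolve to.
ALIASES = {
    "iso-8859-1": "ISO-8859-1",
    "iso_8859-1:1987": "ISO-8859-1",
    "latin1": "ISO-8859-1",
    "iso-8859-8": "ISO-8859-8",
    "iso-8859-14": "ISO-8859-14",
    "latin8": "ISO-8859-14",  # not ISO-8859-8, despite the "8"
    "iso-8859-16": "ISO-8859-16",  # no "latin10" entry, per this message
    "iso-10646-unicode-latin1": "ISO-10646-Unicode-Latin1",
    "iso-10646": "ISO-10646-Unicode-Latin1",
}


def resolve_charset(label):
    """Return the name a label resolves to, or None if it is not in the table."""
    return ALIASES.get(label.strip().lower())


if __name__ == "__main__":
    for label in ("latin1", "latin8", "latin10", "ISO-10646", "iso-10646-1"):
        print(label, "->", resolve_charset(label))
```

With this table, "latin8" resolves to "ISO-8859-14" and "ISO-10646" to
"ISO-10646-Unicode-Latin1", while "latin10", "ISO-8859-11" and
"iso-10646-1" are simply not found.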

> This has cropped up periodically and should
> probably be mentioned to the HTML WG; a small explanatory note,
> strategically placed, could avoid a lot of confusion.

I would say confusion about "iso-10646-1" is only the tip of
the iceberg, and I don't think a "small" note can avoid "a lot of"
confusion.

Regards,
-- 
Masayasu Ishikawa / mimasa@w3.org
W3C - World Wide Web Consortium

Received on Friday, 31 August 2001 14:42:17 UTC