charset name matching rules

From: Erik van der Poel <erikv@google.com>
Date: Sat, 15 Aug 2009 08:42:17 -0700
Message-ID: <c07a32650908150842j499634abh5cf8e7054925f808@mail.gmail.com>
To: public-html-comments@w3.org

Section 2.7 of HTML 5 says:

> When comparing a string specifying a character encoding with the name
> or alias of a character encoding to determine if they are equal, user
> agents must use the Charset Alias Matching rules defined in Unicode
> Technical Standard #22. [UTS22]
>
> For instance, "GB_2312-80" and "g.b.2312(80)" are considered equivalent names.

I think this should be removed, since none of the major browsers do
this, and it is too lenient.
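
To make the lenience concrete, here is a minimal sketch of the matching the quoted text calls for. It assumes the three steps as I understand them from UTS #22: delete everything except ASCII letters and digits, lowercase the letters, then, working left to right, delete each 0 that is not preceded by a digit. The function names are hypothetical and this is not taken from any browser.

```python
import re


def uts22_key(name: str) -> str:
    """Reduce a charset label to a comparison key (assumed UTS #22 steps)."""
    # Steps 1 and 2: keep only ASCII letters and digits, then lowercase.
    kept = re.sub(r"[^A-Za-z0-9]", "", name).lower()

    # Step 3: left to right, drop each "0" that is not preceded by a digit.
    # "Preceded" is judged against what has been kept so far, so a run of
    # leading zeros after a letter collapses completely.
    out = []
    for ch in kept:
        if ch == "0" and not (out and out[-1] in "0123456789"):
            continue
        out.append(ch)
    return "".join(out)


def uts22_equal(a: str, b: str) -> bool:
    """True if two labels compare equal under the sketch above."""
    return uts22_key(a) == uts22_key(b)


if __name__ == "__main__":
    for a, b in [("GB_2312-80", "g.b.2312(80)"),
                 ("UTF-8", "u.t.f-008"),
                 ("UTF-8", "utf-80")]:
        print(a, b, uts22_equal(a, b))
```

Under this sketch, any punctuation scattered through a label is ignored, which is why the two GB 2312 spellings above compare equal.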

The general approach should be: As lenient as the major browsers, but
not more lenient. Lenience leads to a proliferation of garbage.

Of course, the question is what to replace the above text with. There
is a discussion on the ietf-charsets@iana.org list about gathering the
current lists of charsets and aliases from the browsers. Hopefully,
that discussion will result in something that can be published in HTML
5.

How about putting a placeholder in the current HTML 5 draft? I
consider UTS22 to be harmful, so it should be removed from HTML 5
ASAP.

Erik
Received on Saturday, 15 August 2009 15:42:56 GMT
