charset name matching rules

From: Erik van der Poel <erikv@google.com>
Date: Sat, 15 Aug 2009 08:42:17 -0700
Message-ID: <c07a32650908150842j499634abh5cf8e7054925f808@mail.gmail.com>
To: public-html-comments@w3.org

Section 2.7 of HTML 5 says:

> When comparing a string specifying a character encoding with the name
> or alias of a character encoding to determine if they are equal, user
> agents must use the Charset Alias Matching rules defined in Unicode
> Technical Standard #22. [UTS22]
> For instance, "GB_2312-80" and "g.b.2312(80)" are considered equivalent names.

I think this should be removed, since none of the major browsers do
this, and it is too lenient.
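
To show how lenient this is, here is a rough sketch of the UTS22 matching
rules as I read them: drop everything except ASCII letters and digits,
lowercase the rest, and delete each "0" that is not preceded by a digit.
The function names are made up for illustration; this is not taken from
any browser or library.

```python
def uts22_normalize(name: str) -> str:
    """Reduce a charset name to its UTS22 loose-matching key (sketch)."""
    out = []
    for ch in name:
        # Rule 1: keep only ASCII letters and digits.
        if not (ch.isascii() and ch.isalnum()):
            continue
        # Rule 2: map uppercase to lowercase.
        ch = ch.lower()
        # Rule 3: scanning left to right, drop a "0" unless the previous
        # kept character is a digit.
        if ch == "0" and not (out and out[-1].isdigit()):
            continue
        out.append(ch)
    return "".join(out)


def uts22_equal(a: str, b: str) -> bool:
    """True if two charset names match under the rules sketched above."""
    return uts22_normalize(a) == uts22_normalize(b)


print(uts22_equal("GB_2312-80", "g.b.2312(80)"))  # True
print(uts22_equal("UTF-8", "u.t.f-008"))          # True
print(uts22_equal("UTF-8", "utf-80"))             # False
```

Under these rules any punctuation at all is ignored, so a string like
"u.t.f-008" is accepted as a spelling of UTF-8.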

The general approach should be: As lenient as the major browsers, but
not more lenient. Lenience leads to a proliferation of garbage.

Of course, the question is what to replace the above text with. There
is a discussion on the ietf-charsets@iana.org list about gathering the
current lists of charsets and aliases from the browsers. Hopefully,
that discussion will result in something that can be published in HTML

How about putting a placeholder in the current HTML 5 draft? I
consider UTS22 to be harmful, so it should be removed from HTML 5

Received on Saturday, 15 August 2009 15:42:56 UTC