- From: Erik van der Poel <erikv@google.com>
- Date: Sat, 15 Aug 2009 08:42:17 -0700
- To: public-html-comments@w3.org
In section 2.7 of HTML 5, it says:

> When comparing a string specifying a character encoding with the name
> or alias of a character encoding to determine if they are equal, user
> agents must use the Charset Alias Matching rules defined in Unicode
> Technical Standard #22. [UTS22]
>
> For instance, "GB_2312-80" and "g.b.2312(80)" are considered equivalent names.

I think this should be removed, since none of the major browsers do this, and it is too lenient. The general approach should be: As lenient as the major browsers, but not more lenient. Lenience leads to a proliferation of garbage.

Of course, the question is what to replace the above text with. There is a discussion on the ietf-charsets@iana.org list about gathering the current lists of charsets and aliases from the browsers. Hopefully, that discussion will result in something that can be published in HTML 5.

How about putting a placeholder in the current HTML 5 draft? I consider UTS22 to be harmful, so it should be removed from HTML 5 ASAP.

Erik
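
For readers unfamiliar with the rule quoted above, the following is a minimal sketch of UTS #22 charset alias matching as it is usually summarized: drop everything except ASCII letters and digits, lowercase the rest, and delete each "0" that is not preceded by a digit. The function names uts22_key and charset_names_match are hypothetical, chosen only for illustration, and the sketch does not reproduce any browser's or library's actual behavior.

```python
def uts22_key(name: str) -> str:
    """Reduce a charset name to a loose-matching key.

    Hypothetical helper: an illustrative reading of the UTS #22
    alias-matching rule, not an official implementation.
    """
    out = []
    for ch in name:
        # Keep only ASCII letters and digits; punctuation is ignored.
        if not (ch.isascii() and ch.isalnum()):
            continue
        ch = ch.lower()
        # Delete a '0' that is not preceded by a (kept) digit.
        if ch == "0" and not (out and out[-1].isdigit()):
            continue
        out.append(ch)
    return "".join(out)


def charset_names_match(a: str, b: str) -> bool:
    """Two names are treated as equal when their reduced keys are equal."""
    return uts22_key(a) == uts22_key(b)


if __name__ == "__main__":
    # The pair from the HTML 5 draft reduces to the same key.
    print(uts22_key("GB_2312-80"), uts22_key("g.b.2312(80)"))
    print(charset_names_match("GB_2312-80", "g.b.2312(80)"))
```

Under this reading, names such as "u.t.f-008" and "UTF-8" also compare equal, which illustrates the degree of leniency being objected to.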
Received on Saturday, 15 August 2009 15:42:56 UTC