Re: charset name matching rules

On Sat, 15 Aug 2009, Erik van der Poel wrote:
> In section 2.7 of HTML 5, it says:
> 
> > When comparing a string specifying a character encoding with the name 
> > or alias of a character encoding to determine if they are equal, user 
> > agents must use the Charset Alias Matching rules defined in Unicode 
> > Technical Standard #22. [UTS22]
> >
> > For instance, "GB_2312-80" and "g.b.2312(80)" are considered 
> > equivalent names.
> 
> I think this should be removed, since none of the major browsers do 
> this, and it is too lenient.
> 
> The general approach should be: As lenient as the major browsers, but 
> not more lenient. Lenience leads to a proliferation of garbage.
> 
> Of course, the question is what to replace the above text with. There is 
> a discussion on the ietf-charsets@iana.org list about gathering the 
> current lists of charsets and aliases from the browsers. Hopefully, that 
> discussion will result in something that can be published in HTML 5.
> 
> How about putting a placeholder in the current HTML 5 draft? I consider 
> UTS22 to be harmful, so it should be removed from HTML 5 ASAP.

I'm happy to replace that section with other text as soon as we know what 
it should be replaced with, but from where I'm sitting, moving the Web to 
UTS22 is better than moving the spec to being undefined.
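To make the leniency under discussion concrete, here is a minimal Python sketch of the Charset Alias Matching normalization as I read UTS #22. It keeps only ASCII letters and digits, lowercases them, and then, scanning left to right, drops each "0" that is not preceded by a digit. Two names match if they normalize to the same string. The function names are hypothetical, and this is an illustration of the rule, not any browser's code.

```python
def uts22_normalize(name: str) -> str:
    """Normalize a charset name following the UTS #22 alias-matching steps (sketch)."""
    out = []
    prev_is_digit = False  # start of string counts as "not preceded by a digit"
    for ch in name:
        # Step 1: keep only ASCII letters and digits; everything else is ignored.
        if not (ch.isascii() and ch.isalnum()):
            continue
        # Step 2: fold ASCII uppercase to lowercase.
        ch = ch.lower()
        # Step 3: drop a '0' unless the previously kept character is a digit.
        if ch == "0" and not prev_is_digit:
            continue
        out.append(ch)
        prev_is_digit = ch.isdigit()
    return "".join(out)


def charset_names_match(a: str, b: str) -> bool:
    """Hypothetical helper: True if two charset names are equal after normalization."""
    return uts22_normalize(a) == uts22_normalize(b)


print(charset_names_match("GB_2312-80", "g.b.2312(80)"))  # True
print(charset_names_match("UTF-8", "u.t.f-008"))          # True
print(charset_names_match("UTF-8", "utf-80"))             # False
```

Under this reading, "GB_2312-80" and "g.b.2312(80)" both reduce to "gb231280", and even "u.t.f-008" is accepted as "UTF-8", which is the kind of label that Erik's "as lenient as the major browsers, but not more" principle would reject.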

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Saturday, 15 August 2009 21:48:29 UTC