
Re: charset name matching rules

From: Erik van der Poel <erikv@google.com>
Date: Sat, 15 Aug 2009 16:31:09 -0700
Message-ID: <c07a32650908151631s6083ed38sa65d4aecc0fd3e4e@mail.gmail.com>
To: Ian Hickson <ian@hixie.ch>
Cc: public-html-comments@w3.org

Hi Ian,

I had another look at section 2.7, and it does have a pointer to the
IANA charset registry, which also says "However, no distinction is
made between use of upper and lower case letters." This is the only
matching rule that we need. UTS22 is too lenient, and we all know what
happens to the Web when browsers are too lenient. If the discussion on
ietf-charsets@iana.org actually yields any more results, we may wish
to consider adding them to HTML 5, but for now, I think having HTML 5
refer to the IANA charset registry is sufficient.
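
To make the difference concrete, here is a rough Python sketch of the two
matching rules. It assumes the UTS22 Charset Alias Matching steps are:
delete everything except ASCII letters and digits, fold to lower case, then
(left to right) delete each 0 that is not preceded by a digit. The function
names are hypothetical and not taken from any browser or library.

```python
import re
import string

# ASCII-only case folding; registered charset names are ASCII.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def iana_equal(a: str, b: str) -> bool:
    """Compare two names ignoring only upper/lower case (the registry's note)."""
    return a.translate(_ASCII_LOWER) == b.translate(_ASCII_LOWER)


def uts22_key(name: str) -> str:
    """Normalize a name with the UTS22 matching steps as assumed above."""
    # Step 1: drop everything except ASCII letters and digits.
    key = re.sub(r"[^A-Za-z0-9]", "", name)
    # Step 2: fold A-Z to a-z.
    key = key.translate(_ASCII_LOWER)
    # Step 3: left to right, drop each "0" that is not preceded by a digit.
    out = []
    for ch in key:
        if ch == "0" and not (out and out[-1].isdigit()):
            continue
        out.append(ch)
    return "".join(out)


def uts22_equal(a: str, b: str) -> bool:
    """Compare two names after UTS22-style normalization."""
    return uts22_key(a) == uts22_key(b)


print(iana_equal("GB_2312-80", "gb_2312-80"))     # True: differs only in case
print(uts22_equal("GB_2312-80", "g.b.2312(80)"))  # True: both reduce to "gb231280"
print(iana_equal("GB_2312-80", "g.b.2312(80)"))   # False: punctuation differs
```

Under the case-only rule, "g.b.2312(80)" is simply an unknown name; under
the UTS22-style rule it is accepted, which is the leniency at issue here.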

Erik

On Sat, Aug 15, 2009 at 2:47 PM, Ian Hickson <ian@hixie.ch> wrote:
> On Sat, 15 Aug 2009, Erik van der Poel wrote:
>> In section 2.7 of HTML 5, it says:
>>
>> > When comparing a string specifying a character encoding with the name
>> > or alias of a character encoding to determine if they are equal, user
>> > agents must use the Charset Alias Matching rules defined in Unicode
>> > Technical Standard #22. [UTS22]
>> >
>> > For instance, "GB_2312-80" and "g.b.2312(80)" are considered
>> > equivalent names.
>>
>> I think this should be removed, since none of the major browsers do
>> this, and it is too lenient.
>>
>> The general approach should be: As lenient as the major browsers, but
>> not more lenient. Lenience leads to a proliferation of garbage.
>>
>> Of course, the question is what to replace the above text with. There is
>> a discussion on the ietf-charsets@iana.org list about gathering the
>> current lists of charsets and aliases from the browsers. Hopefully, that
>> discussion will result in something that can be published in HTML 5.
>>
>> How about putting a placeholder in the current HTML 5 draft? I consider
>> UTS22 to be harmful, so it should be removed from HTML 5 ASAP.
>
> I'm happy to replace that section with other text as soon as we know what
> it should be replaced with, but from where I'm sitting, moving the Web to
> UTS22 is better than moving the spec to being undefined.
>
> --
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
>
Received on Saturday, 15 August 2009 23:31:51 GMT
