
Re: charset name matching rules

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 26 Aug 2009 03:56:34 +0000 (UTC)
To: Erik van der Poel <erikv@google.com>, Anne van Kesteren <annevk@opera.com>, Geoffrey Sneddon <gsneddon@opera.com>
Cc: public-html-comments@w3.org
Message-ID: <Pine.LNX.4.62.0908260350110.13789@hixie.dreamhostps.com>

On Sat, 15 Aug 2009, Erik van der Poel wrote:
> [UTS22]
> Clearly, they recommend that you ignore not only the underscore, but 
> many other characters too. This is so different from current browser 
> behavior that I am surprised that it is even being considered.

The alternative is registering the many aliases that are needed.
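
To make concrete how much the UTS22 loose matching ignores, here is a rough
sketch of it. It assumes the rule as usually summarized from UTS #22 (drop
everything except ASCII letters and digits, lowercase, then drop each "0"
that is not preceded by a digit); the function name uts22_key is made up for
illustration, and this is not a copy of any browser's or library's code.

```python
def uts22_key(name: str) -> str:
    """Reduce a charset name to a comparison key, UTS22-style (sketch)."""
    out = []
    prev_is_digit = False  # tracks the last character that was kept
    for ch in name:
        # Step 1: keep only ASCII letters and digits.
        if not (ch.isascii() and ch.isalnum()):
            continue
        # Step 2: lowercase (only ASCII letters reach this point).
        ch = ch.lower()
        # Step 3: drop a "0" that does not follow a kept digit.
        if ch == "0" and not prev_is_digit:
            continue
        out.append(ch)
        prev_is_digit = ch.isdigit()
    return "".join(out)


# Two names match when their keys are equal.
print(uts22_key("UTF-8"))       # utf8
print(uts22_key("u.t.f-008"))   # utf8
print(uts22_key("utf-80"))      # utf80
print(uts22_key("ISO_8859-1"))  # iso88591
```

Under this sketch the underscore, hyphen, dot and leading zeros all vanish,
which is why it matches far more spellings than browsers accept today.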

> The ietf-charsets group is currently talking about gathering the 
> browsers' lists of charsets, aliases and supersets (e.g. windows-1252 is 
> the superset used instead of iso-8859-1). I believe we will bump into 
> several differences between the browsers, but I also believe that the 
> differences become less and less interesting as you go down the list of 
> popular charsets. So my suggestion is that we initially focus on 
> commonly used encodings. Then we can add more info to the HTML 5 spec 
> (or a spin-off spec, if appropriate) over time.

Ok, I've changed the rule to just be ASCII case-insensitive with leading 
and trailing whitespace trimming.

I'm assuming that the needed aliases will be registered. If they're not, 
this will be a problem for implementors trying to follow the spec.
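
A minimal sketch of what the simplified rule amounts to, to show why it
depends on the aliases being registered. The whitespace set (space, tab, LF,
FF, CR) is my assumption about what gets trimmed, and the ALIASES table and
function names are hypothetical stand-ins for a real registry, not an actual
list.

```python
import string

# Assumed set of characters to trim: space, tab, LF, FF, CR.
TRIM_CHARS = " \t\n\f\r"

# Lowercase only A-Z, so non-ASCII characters are never case-folded.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# Hypothetical alias table; a real one would come from the registry.
ALIASES = {
    "utf-8": "utf-8",
    "utf8": "utf-8",
    "iso-8859-1": "windows-1252",  # superset used in place of iso-8859-1
    "latin1": "windows-1252",
    "windows-1252": "windows-1252",
}


def normalize_label(label: str) -> str:
    """Trim leading/trailing whitespace, then ASCII-lowercase."""
    return label.strip(TRIM_CHARS).translate(_ASCII_LOWER)


def lookup_encoding(label: str):
    """Return the encoding for a label, or None if no alias is registered."""
    return ALIASES.get(normalize_label(label))


print(lookup_encoding("  UTF-8\n"))    # utf-8
print(lookup_encoding("Latin1"))       # windows-1252
print(lookup_encoding("ISO_8859-1"))   # None
```

The last lookup fails because nothing is stripped from the middle of the
label: every spelling that content uses has to be in the table.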

On Mon, 17 Aug 2009, Geoffrey Sneddon wrote:
> Going by the case-insensitive matching rule is incompatible with web 
> content, as there is plenty of content out there which expects some 
> normalization to be done. I originally suggested using the UTS22 rules 
> as it seemed better than the status quo of three normalization rules 
> (the case-insensitive one; what browsers currently do, which HTML 5 
> previously defined; and UTS22) by reducing this to only two 
> normalization rules (purely case-insensitivity, as mentioned above, is 
> incompatible with the web so that's not an option, and as it turns out 
> UTS22 is incompatible as well). I guess we should go back to the 
> normalization rules that HTML 5 previously defined.

Those rules had the same underscore problem that Anne says is definitely 
wrong, though.

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 26 August 2009 03:55:43 UTC
