- From: Ian Hickson <ian@hixie.ch>
- Date: Wed, 26 Aug 2009 03:56:34 +0000 (UTC)
- To: Erik van der Poel <erikv@google.com>, Anne van Kesteren <annevk@opera.com>, Geoffrey Sneddon <gsneddon@opera.com>
- Cc: public-html-comments@w3.org
On Sat, 15 Aug 2009, Erik van der Poel wrote:
>
> [UTS22]
>
> Clearly, they recommend that you ignore not only the underscore, but
> many other characters too. This is so different from current browser
> behavior that I am surprised that it is even being considered.

The alternative is registering the many aliases that are needed.

> The ietf-charsets group is currently talking about gathering the
> browsers' lists of charsets, aliases and supersets (e.g. windows-1252 is
> the superset used instead of iso-8859-1). I believe we will bump into
> several differences between the browsers, but I also believe that the
> differences become less and less interesting as you go down the list of
> popular charsets. So my suggestion is that we initially focus on
> commonly used encodings. Then we can add more info to the HTML 5 spec
> (or a spin-off spec, if appropriate) over time.

Ok, I've changed the rule to just be ASCII case-insensitive with leading
and trailing whitespace trimming. I'm assuming that the needed aliases
will be registered. If they're not, this will be a problem for
implementors trying to follow the spec.

On Mon, 17 Aug 2009, Geoffrey Sneddon wrote:
>
> Going by the case-insensitive matching rule is incompatible with web
> content, as there is plenty of content out there which expects some
> normalization to be done. I originally suggested using the UTS22 rules
> as it seemed better than the status quo of three normalization rules
> (the case-insensitive one; what browsers currently do, which HTML 5
> previously defined; and UTS22) by reducing this to only two
> normalization rules (purely case-insensitivity, as mentioned above, is
> incompatible with the web so that's not an option, and as it turns out
> UTS22 is incompatible as well). I guess we should go back to the
> normalization rules that HTML 5 previously defined.

Those rules had the same underscore problem that Anne says is definitely
wrong, though.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
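
The two matching rules discussed in this message can be contrasted with a small sketch. This is only an illustration, not text from the HTML 5 spec: match_label() follows the rule described above (trim leading and trailing whitespace, then compare ASCII case-insensitively), and uts22_key() follows the loose alias matching of UTS #22 as commonly summarized (drop everything but ASCII letters and digits, lowercase, drop zeros not preceded by a digit). The ALIASES table and both function names are hypothetical and do not reflect any real registry or browser.

```python
import re

# Whitespace assumed to be trimmed: tab, LF, FF, CR, space.
ASCII_WHITESPACE = "\t\n\f\r "

# Hypothetical alias table for illustration only; a real implementation
# would use the registered aliases the message refers to.
ALIASES = {
    "utf-8": "utf-8",
    "iso-8859-1": "windows-1252",   # superset used instead of iso-8859-1
    "iso_8859-1": "windows-1252",   # underscore variant must be listed explicitly
    "windows-1252": "windows-1252",
}


def ascii_lower(s):
    """Lowercase only A-Z, leaving all non-ASCII characters untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def match_label(label):
    """Trim whitespace, then look the label up ASCII case-insensitively."""
    key = ascii_lower(label.strip(ASCII_WHITESPACE))
    return ALIASES.get(key)  # None when no alias is registered


def uts22_key(label):
    """Loose comparison key in the style of UTS #22 alias matching."""
    kept = re.sub(r"[^A-Za-z0-9]", "", label).lower()
    # Delete each run of zeros that is not preceded by a digit.
    return re.sub(r"(?<![0-9])0+", "", kept)


if __name__ == "__main__":
    print(match_label("  UTF-8\n"))   # trimmed and case-folded, so it matches
    print(match_label("utf_8"))       # no such alias in the table above
    print(uts22_key("u.t.f-008") == uts22_key("UTF-8"))
```

Under the first rule every punctuation variant has to appear in the alias table, which is why the message assumes the needed aliases get registered. Under the second, labels that differ only in punctuation collapse to the same key.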
Received on Wednesday, 26 August 2009 03:55:43 UTC