- From: Anne van Kesteren <annevk@opera.com>
- Date: Wed, 05 Aug 2009 14:25:20 +0200
On Wed, 05 Aug 2009 02:01:59 +0200, Ian Hickson <ian at hixie.ch> wrote:
> I'm pretty sure that character encoding support in browsers is more of a
> "collect them all" kind of thing than really based on content that
> requires it, to be honest.

Really? I think a lot of them are actually used. If you know otherwise, I'd
love to trim the set of encodings the Web needs down to a smaller list than
what we currently ship with. Ideally this would become a fixed list across
all Web languages.

> If someone can provide a firm list of encodings that they are confident
> are required for a certain substantial percentage of the Web, I'm happy
> to add the list to the spec.

Could you not run a survey over your large dataset to find this out? I also
read somewhere that Adam Barth was able to add code to Google Chrome to
figure out a better algorithm for Content-Type sniffing. Maybe something
similar could be done here?

By the way, we've encountered problems with using the Unicode encoding
matching algorithm, particularly on some Asian sites. I think we need to
switch HTML5 back to something more akin to WebKit/Gecko/Trident. I realize
this means more magic lists, but the current algorithm does not seem to cut
it. E.g., sites rely on the fact that EUC_JP is not a recognized encoding
while EUC-JP is.

-- 
Anne van Kesteren
http://annevankesteren.nl/
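To make the EUC_JP point concrete, here is a minimal Python sketch of the
difference. The loose matcher is a simplified reading of the UTS #22
charset-alias matching rule (lowercase, drop non-alphanumerics, ignore
leading zeros in digit runs), and KNOWN_LABELS is an illustrative stand-in
for a browser's fixed label table, not any actual shipping list:

```python
import re

# Illustrative stand-in for a browser's fixed label table ("magic list").
# Real browsers ship far longer tables; this subset is hypothetical.
KNOWN_LABELS = {"euc-jp", "x-euc-jp", "shift_jis", "utf-8"}

def uts22_loose_key(label: str) -> str:
    """Simplified UTS #22 charset-alias matching key: lowercase, drop
    all non-alphanumeric characters, and ignore leading zeros at the
    start of each digit run."""
    s = re.sub(r"[^a-z0-9]", "", label.lower())
    return re.sub(r"(?<!\d)0+(?=\d)", "", s)

def matches_loose(label: str, canonical: str) -> bool:
    # Loose matching: EUC_JP and EUC-JP collapse to the same key.
    return uts22_loose_key(label) == uts22_loose_key(canonical)

def recognized_by_list(label: str) -> bool:
    # Fixed-list matching: only labels literally present in the table
    # count; anything else is unrecognized.
    return label.strip().lower() in KNOWN_LABELS

assert matches_loose("EUC_JP", "EUC-JP")   # loose matching recognizes it
assert not recognized_by_list("EUC_JP")    # fixed list does not
assert recognized_by_list("EUC-JP")
```

Under loose matching, a page labeled EUC_JP is decoded as EUC-JP; under the
fixed-list behavior the label is simply unknown and the browser falls back
to its default encoding. Sites that depend on that fallback break when the
spec switches to loose matching.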
Received on Wednesday, 5 August 2009 05:25:20 UTC