- From: Aryeh Gregor <Simetrical+w3c@gmail.com>
- Date: Wed, 29 Jul 2009 12:34:55 -0400
On Wed, Jul 29, 2009 at 4:39 AM, Ian Hickson<ian at hixie.ch> wrote:
> There is value in not changing them unless they are actually broken --
> when I edit the spec, there's always a risk I'll break something.

Okay, not a big deal then.

> I've required UAs to catch this case and added this example.

Okay, great.

> Which others are needed for compatibility?

I don't know, but there are certainly some. Otherwise, why would browsers support so many? For instance, baidu.com is #9 on Alexa and serves gb2312 as far as I can tell. So does qq.com, which is #14. And sina.com.cn, #19. vkontakte.ru is #30 and serves Windows-1251. tudou.com (#60) uses gbk. rakuten.co.jp (#68) serves EUC-JP. This is just from a quick manual look at a few of the largest non-English sites.

I'd think it would be fairly easy for someone (e.g., Google) to come up with a rough summary of character encoding usage on the web by percentage, and for vendors to say which encodings they support, so a useful common list could be worked out.

If browsers differ in which encodings they accept, that harms interoperability, so I'd think it would be ideal if HTML 5 would specify the exact list of encodings that must be supported and prohibit support for any others. The union of encodings supported by existing browsers would be a reasonable start, since supporting a new encoding is presumably pretty cheap. Unless this is viewed as outside the scope of HTML 5 -- e.g., if browsers tend to rely on the operating system for encoding support.
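
To make the "rough summary by percentage" and "union of supported encodings" ideas concrete, here is a minimal Python sketch. It is only an illustration: the sample is just the six sites named above (far too small to mean anything), and the function names and the shape of the browser-support input are hypothetical, not taken from any spec or vendor data.

```python
from collections import Counter

# Encoding labels as reported above from a quick manual look (not a survey).
OBSERVED = {
    "baidu.com": "gb2312",
    "qq.com": "gb2312",
    "sina.com.cn": "gb2312",
    "vkontakte.ru": "Windows-1251",
    "tudou.com": "gbk",
    "rakuten.co.jp": "EUC-JP",
}


def encoding_shares(observed):
    """Return {label: percentage of sites}, comparing labels case-insensitively."""
    counts = Counter(label.strip().lower() for label in observed.values())
    total = sum(counts.values())
    return {label: round(100.0 * n / total, 1) for label, n in counts.items()}


def supported_union(browser_support):
    """Union of the labels supported by any browser.

    `browser_support` is a hypothetical mapping of browser name to an
    iterable of encoding labels; real vendor lists would have to be supplied.
    """
    result = set()
    for labels in browser_support.values():
        result |= {label.strip().lower() for label in labels}
    return result


if __name__ == "__main__":
    for label, share in sorted(encoding_shares(OBSERVED).items()):
        print(f"{label}: {share}%")
```

A real summary would need a crawl weighted by traffic or page count, and would have to decide whether labels such as gb2312 and gbk count as the same encoding. The sketch deliberately treats them as distinct.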
Received on Wednesday, 29 July 2009 09:34:55 UTC