[whatwg] Spec comments, sections 1-2 from Aryeh Gregor on 2009-07-29 (public-whatwg-archive@w3.org from July 2009)

From: Aryeh Gregor <Simetrical+w3c@gmail.com>
Date: Wed, 29 Jul 2009 12:34:55 -0400
Message-ID: <7c2a12e20907290934v4ea8d87exfbdacaad650f92d@mail.gmail.com>

On Wed, Jul 29, 2009 at 4:39 AM, Ian Hickson<ian at hixie.ch> wrote:
> There is value in not changing them unless they are actually broken --
> when I edit the spec, there's always a risk I'll break something.

Okay, not a big deal then.

> I've required UAs to catch this case and added this example.

Okay, great.

> Which others are needed for compatibility?

I don't know, but there are certainly some.  Otherwise, why would
browsers support so many?  For instance, baidu.com is #9 on Alexa and
serves gb2312 as far as I can tell.  So does qq.com, which is #14.
And sina.com.cn, #19.  vkontakte.ru is #30 and serves Windows-1251.
tudou.com (#60) uses gbk.  rakuten.co.jp (#68) serves EUC-JP.

This is just from a quick manual look at a few of the largest
non-English sites.  I'd think it would be fairly easy for someone
(e.g., Google) to come up with a rough summary of character encoding
usage on the web by percentage, and for vendors to say which encodings
they support, so a useful common list could be worked out.
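
As a rough illustration of the kind of tally meant here, the sketch
below counts the charset labels declared in a set of Content-Type
headers and reports each one's share as a percentage.  The function
names and the "(unspecified)" bucket are made up for this example, and
it only looks at the HTTP header; a real survey would also need to
consider <meta> declarations and sniffing, which this does not attempt.

```python
import re
from collections import Counter

# Matches the charset parameter of a Content-Type header value,
# with or without surrounding double quotes.
_CHARSET_RE = re.compile(r'charset\s*=\s*"?([^\s;"]+)', re.IGNORECASE)


def charset_from_content_type(header):
    """Return the lowercased charset label, or None if none is declared."""
    match = _CHARSET_RE.search(header or "")
    return match.group(1).lower() if match else None


def encoding_shares(headers):
    """Map each declared charset label to its share of the headers, in percent.

    Headers with no charset parameter are grouped under the hypothetical
    label "(unspecified)".
    """
    counts = Counter(
        charset_from_content_type(h) or "(unspecified)" for h in headers
    )
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.most_common()}
```

Feeding it one header per crawled page (or per site, weighted however
seems appropriate) would give the rough per-encoding breakdown
described above.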

If browsers differ in which encodings they accept, that harms
interoperability, so I'd think it would be ideal if HTML 5 would
specify the exact list of encodings that must be supported and
prohibit support for any others.  The union of encodings supported
by existing browsers would be a reasonable start, since supporting a
new encoding is presumably pretty cheap.  Unless this is viewed as
outside the scope of HTML 5 -- e.g., if browsers tend to rely on the
operating system for encoding support.

Received on Wednesday, 29 July 2009 09:34:55 UTC