
[whatwg] Spec comments, sections 1-2

From: Aryeh Gregor <Simetrical+w3c@gmail.com>
Date: Wed, 29 Jul 2009 12:34:55 -0400
Message-ID: <7c2a12e20907290934v4ea8d87exfbdacaad650f92d@mail.gmail.com>
On Wed, Jul 29, 2009 at 4:39 AM, Ian Hickson <ian at hixie.ch> wrote:
> There is value in not changing them unless they are actually broken --
> when I edit the spec, there's always a risk I'll break something.

Okay, not a big deal then.

> I've required UAs to catch this case and added this example.

Okay, great.

> Which others are needed for compatibility?

I don't know, but there are certainly some.  Otherwise, why would
browsers support so many?  For instance, baidu.com is #9 on Alexa and
serves gb2312 as far as I can tell.  So does qq.com, which is #14.
And sina.com.cn, #19.  vkontakte.ru is #30 and serves Windows-1251.
tudou.com (#60) uses gbk.  rakuten.co.jp (#68) serves EUC-JP.

This is just from a quick manual look at a few of the largest
non-English sites.  I'd think it would be fairly easy for someone
(e.g., Google) to come up with a rough summary of character encoding
usage on the web by percentage, and for vendors to say which encodings
they support, so a useful common list could be worked out.
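As a rough illustration of the kind of tally meant above, here is a minimal Python sketch that counts the charset parameter of HTTP Content-Type headers and reports each encoding's share as a percentage. It assumes someone already has the headers from a crawl. The function names are hypothetical, and the sketch ignores encodings declared only in <meta> tags or sniffed from content, so real numbers would need more than this. The sample input is just illustrative headers written from the six sites listed above, not survey data.

```python
from collections import Counter
from email.message import Message
from typing import Dict, Iterable, Optional


def charset_from_content_type(header: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, if any."""
    msg = Message()
    msg["Content-Type"] = header
    # get_content_charset() lowercases the value and returns None if absent.
    return msg.get_content_charset()


def encoding_shares(content_types: Iterable[str]) -> Dict[str, float]:
    """Map each declared charset (or "(undeclared)") to its percentage share."""
    counts = Counter(
        charset_from_content_type(ct) or "(undeclared)" for ct in content_types
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.most_common()}


# Illustrative headers based on the sites mentioned above, not real survey data.
observed = {
    "baidu.com": "text/html; charset=gb2312",
    "qq.com": "text/html; charset=gb2312",
    "sina.com.cn": "text/html; charset=GB2312",
    "vkontakte.ru": "text/html; charset=windows-1251",
    "tudou.com": "text/html; charset=gbk",
    "rakuten.co.jp": "text/html; charset=EUC-JP",
}

if __name__ == "__main__":
    for label, pct in encoding_shares(observed.values()).items():
        print(f"{label:14} {pct:5.1f}%")
```

A real survey would also want to weight by traffic or page count rather than counting each site once.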

If browsers differ in which encodings they accept, that harms
interoperability, so I think it would be ideal for HTML 5 to
specify the exact list of encodings that must be supported and to
prohibit support for any others.  The union of encodings supported
by existing browsers would be a reasonable start, since supporting a
new encoding is presumably pretty cheap.  That is, unless this is
viewed as outside the scope of HTML 5 -- e.g., if browsers tend to
rely on the operating system for encoding support.
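To make the "exact list, nothing else" idea concrete, here is a minimal Python sketch of how a UA might resolve a declared label against a closed table. The table contents, the function names, and the windows-1252 fallback are all hypothetical, drawn only from encodings mentioned in this message; this is not a proposed list. Matching ignores case and surrounding whitespace, and any label not in the table is treated as if no encoding had been declared, instead of being passed to the operating system.

```python
from typing import Optional

# Hypothetical closed table: label -> canonical encoding name.
# Illustrative entries only; not a proposed list for HTML 5.
REQUIRED_ENCODINGS = {
    "utf-8": "utf-8",
    "utf8": "utf-8",
    "windows-1252": "windows-1252",
    "windows-1251": "windows-1251",
    "cp1251": "windows-1251",
    "gb2312": "gb2312",
    "gbk": "gbk",
    "euc-jp": "euc-jp",
}


def resolve_encoding(label: str) -> Optional[str]:
    """Return the canonical encoding for a declared label, or None if the
    label is not on the required list (unsupported labels are ignored)."""
    return REQUIRED_ENCODINGS.get(label.strip().lower())


def pick_encoding(declared: Optional[str], default: str = "windows-1252") -> str:
    """Use the declared label if it resolves; otherwise fall back to a
    (hypothetical) default, exactly as if nothing had been declared."""
    if declared is not None:
        resolved = resolve_encoding(declared)
        if resolved is not None:
            return resolved
    return default


if __name__ == "__main__":
    for label in [" GB2312 ", "CP1251", "x-made-up", None]:
        print(repr(label), "->", pick_encoding(label))
```

Because the lookup never consults a platform codec library, every UA using the same table gives the same answer for the same label, which is the interoperability point above.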
Received on Wednesday, 29 July 2009 09:34:55 UTC
