
Re: Unicode distribution?

From: Erik van der Poel <erikv@google.com>
Date: Fri, 5 Jan 2007 15:59:35 -0800
Message-ID: <c07a32650701051559q5ec71a52ybfac393254dae9de@mail.gmail.com>
To: "John O'Conner" <John.Oconner@sun.com>
Cc: "iris garden" <iris2000sa@yahoo.com>, www-international@w3.org

You can find Mark's presentation "Unicode at Google" on the right side
of http://macchiato.com/

I gathered the numbers for Mark. In mid-2006, the top five HTML META
charsets in Google's index were:

42.8%  iso-8859-1
20.4%  utf-8
8.12%  gb2312
3.97%  windows-1252
3.76%  windows-1251

In 2001, that distribution was:

47.9%  iso-8859-1
13.5%  windows-1252
7.30%  gb2312
5.47%  shift_jis
4.65%  utf-8
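
To make the shift between the two lists easier to read, here is a minimal sketch that computes the change in percentage points for each charset that appears in both top-five lists. The figures are copied from this message; the function name `share_changes` is only illustrative. shift_jis and windows-1251 are left out because each appears in only one list, so its share in the other year is not known from these numbers.

```python
# Shares (in percent) as quoted in this message.
shares_2001 = {
    "iso-8859-1": 47.9,
    "windows-1252": 13.5,
    "gb2312": 7.30,
    "shift_jis": 5.47,
    "utf-8": 4.65,
}
shares_2006 = {
    "iso-8859-1": 42.8,
    "utf-8": 20.4,
    "gb2312": 8.12,
    "windows-1252": 3.97,
    "windows-1251": 3.76,
}


def share_changes(old, new):
    """Map each charset present in both lists to (old, new, change in points)."""
    common = sorted(set(old) & set(new))
    return {
        name: (old[name], new[name], round(new[name] - old[name], 2))
        for name in common
    }


changes = share_changes(shares_2001, shares_2006)
for name, (old, new, delta) in changes.items():
    print(f"{name:<14}{old:>6.2f}% -> {new:>6.2f}%  ({delta:+.2f} points)")
```

On these figures utf-8 gains 15.75 points and windows-1252 loses 9.53, while iso-8859-1 and gb2312 move by much less.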

In 2001, 43.5% of the HTML documents had a META charset, while in
2006, that percentage was 72%.
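
For anyone who wants to produce comparable figures on their own collection of pages, the following is a rough sketch of one way to tally META charsets. It is not a description of how the numbers above were gathered. It assumes the documents are already decoded to strings, takes the first META declaration in each document, and computes each charset's share among the documents that declare one (the message does not say which denominator was used). The names `MetaCharsetParser`, `meta_charset`, and `charset_distribution` are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class MetaCharsetParser(HTMLParser):
    """Record the first charset declared in a META tag, if any."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta" or self.charset is not None:
            return
        attrs = dict(attrs)
        if attrs.get("charset"):
            # Short form: <meta charset="utf-8">
            self.charset = attrs["charset"].strip().lower()
            return
        if (attrs.get("http-equiv") or "").lower() != "content-type":
            return
        # Long form: content="text/html; charset=..."
        for part in (attrs.get("content") or "").split(";"):
            key, _, value = part.strip().partition("=")
            if key.strip().lower() == "charset" and value.strip():
                self.charset = value.strip().strip("\"'").lower()
                return


def meta_charset(html_text):
    """Return the lowercased META charset of one document, or None."""
    parser = MetaCharsetParser()
    parser.feed(html_text)
    return parser.charset


def charset_distribution(documents):
    """Return (fraction of documents with a META charset, {charset: share})."""
    found = [cs for cs in map(meta_charset, documents) if cs]
    declared = len(found) / len(documents) if documents else 0.0
    shares = {cs: n / len(found) for cs, n in Counter(found).items()}
    return declared, shares


# Tiny made-up sample, purely for illustration.
docs = [
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=ISO-8859-1"></head></html>',
    '<html><head><meta charset="utf-8"></head></html>',
    '<html><head><META HTTP-EQUIV="content-type" '
    'CONTENT="text/html;charset=UTF-8"></head></html>',
    "<html><head><title>no declaration</title></head></html>",
]
declared, shares = charset_distribution(docs)
print(f"{declared:.0%} of documents declare a META charset")
for cs, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{share:6.1%}  {cs}")
```

A declared charset is only what the page claims. The bytes may actually be in a different encoding, so a tally like this measures labels rather than real usage.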

Erik

On 1/5/07, John O'Conner <John.Oconner@sun.com> wrote:
>
> iris garden wrote:
> > Hi
> >
> > I want to ask please about the Unicode (utf-8) distribution on the
> > Internet, i.e. any statistics that show the percentage of websites
> > worldwide that use Unicode compared to other encodings?
> >
> > Thanks
> > Iris
>
> I believe that Mark Davis (working with both the Unicode Consortium and
> Google) may have provided some of that information in his recent Unicode
> Conference session. You might want to find and look at that session's
> slide material.
>
> Regards,
> John O'Conner
Received on Saturday, 6 January 2007 04:53:15 GMT
