
Re: Unicode conference papers

From: Erik van der Poel <erik@vanderpoel.org>
Date: Tue, 21 Nov 2006 22:02:25 -0800
Message-ID: <638fb7f40611212202k75fdc755ue91b0004f33e4d18@mail.gmail.com>
To: "Richard Ishida" <ishida@w3.org>
Cc: "Mark Davis" <mark.davis@icu-project.org>, Unicode <unicode@unicode.org>, www-international@w3.org

Hi Richard,

The HTML meta charset numbers do not include XML encoding declarations; I
will count those next time. I will also compare our detected charset and
language with the document's tags, but our language detector does not
detect very many languages yet, so the comparison may not be so

By the way, both the HTTP Content-Language header and the meta
content-language declaration allow more than one language to be
specified. The most common pair of languages is de,at. The next most
common is fr,en.
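
For illustration only, here is a minimal sketch of how pairs such as de,at
could be tallied from a list of Content-Language values. The function names
and the sample values are hypothetical and are not taken from the actual
analysis; the sketch assumes the values are simple comma-separated tags.

```python
from collections import Counter


def parse_content_language(value):
    """Split a Content-Language value such as "de, at" into lowercase tags."""
    return [tag.strip().lower() for tag in value.split(",") if tag.strip()]


def count_language_pairs(values):
    """Count how often each ordered two-language combination is declared."""
    pairs = Counter()
    for value in values:
        tags = parse_content_language(value)
        if len(tags) == 2:  # only declarations naming exactly two languages
            pairs[tuple(tags)] += 1
    return pairs


# Made-up sample values, purely to exercise the functions.
sample = ["de,at", "fr, en", "DE, AT", "en", "fr,en", "de,at"]
pairs = count_language_pairs(sample)
```

Calling pairs.most_common() on the result lists the pairs from most to
least frequent. Order within a pair is preserved here, so de,at and at,de
would be counted separately.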


On 11/21/06, Richard Ishida <ishida@w3.org> wrote:
> 3. Slide 20 (Charset tagging trends) seems to indicate that around 72% of HTML pages now contain encoding declarations in the meta tag. Is that correct? (E.g., is the declaration for some pages in the XML declaration?) That seems like a high number (though I'm not complaining). I'm surprised that the HTTP header isn't at least as high, though, since I'd have thought that many servers are set up to serve a default encoding. Do you have any explanation for that result?
> 4. It would be interesting to know what proportion of character encodings and language declarations shown are considered to be incorrect (presumably the graphs alluded to in question 3 include those).
Received on Wednesday, 22 November 2006 06:02:38 UTC
