- From: Erik van der Poel <erik@vanderpoel.org>
- Date: Tue, 21 Nov 2006 22:02:25 -0800
- To: "Richard Ishida" <ishida@w3.org>
- Cc: "Mark Davis" <mark.davis@icu-project.org>, Unicode <unicode@unicode.org>, www-international@w3.org
Hi Richard,

The HTML meta charset numbers do not include XML encodings; I will count those next time. I will also compare our detected charset and language with the document's tags, but our language detector does not detect very many languages yet, so the comparison may not be very meaningful.

By the way, the HTTP and meta Content-Language declarations allow more than one language to be specified. The most common pair of languages is de,at. The next most common is fr,en.

Erik

On 11/21/06, Richard Ishida <ishida@w3.org> wrote:
>
> 3. Slide 20 (Charset tagging trends) seems to indicate that around 72% of HTML pages now contain encoding declarations in the meta tag. Is that correct? (E.g., is the declaration for some pages in the XML declaration?) That seems like a high number (though I'm not complaining). I'm surprised that the HTTP header isn't at least as high, though, since I'd have thought that many servers are set up to serve a default encoding. Do you have any explanation for that result?
>
> 4. It would be interesting to know what proportion of character encodings and language declarations shown are considered to be incorrect (presumably the graphs alluded to in question 3 include those).
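
To illustrate the point in the reply about Content-Language carrying more than one language, here is a minimal sketch. It assumes the declared value is a comma-separated list of tags. The function names and sample values are hypothetical and are not the code or data behind the counts mentioned in this thread.

```python
from collections import Counter


def parse_content_language(value):
    """Split a Content-Language value such as "de, at" into lowercase tags."""
    return [tag.strip().lower() for tag in value.split(",") if tag.strip()]


def count_language_pairs(values):
    """Count how often each ordered two-tag combination is declared."""
    counts = Counter()
    for value in values:
        tags = parse_content_language(value)
        if len(tags) == 2:  # only declarations naming exactly two tags
            counts[tuple(tags)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    sample = ["de,at", "fr, en", "DE, AT", "en"]
    for pair, n in count_language_pairs(sample).most_common():
        print(",".join(pair), n)
```

The sketch treats the pair as ordered, so "de,at" and "at,de" would be counted separately; whether the original tally did the same is not stated in the message.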
Received on Wednesday, 22 November 2006 06:02:38 UTC