- From: Richard Ishida <ishida@w3.org>
- Date: Wed, 27 Aug 2008 10:52:26 +0100
- To: "'Erik van der Poel'" <erikv@google.com>
- Cc: "'HTML WG'" <public-html@w3.org>, <www-international@w3.org>
Hi Erik,

Thanks for the useful update. I think it's quite significant that there is a 13-percentage-point increase in the use of lang versus a 3-point increase in the use of meta in the same period.

Is there any way to tell what percentage of pages use both lang and xml:lang at the same time for each of these figures? That would then also give us a total figure for use of attributes.

Also, what percentage of pages using attributes also use meta Content-Language?

Cheers,
RI

============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

http://www.w3.org/International/
http://rishida.net/

> -----Original Message-----
> From: Erik van der Poel [mailto:erikv@google.com]
> Sent: 26 August 2008 01:32
> To: Martin Duerst
> Cc: Henri Sivonen; Richard Ishida; Ian Hickson; HTML WG; www-international@w3.org
> Subject: Re: meta content-language
>
> On Thu, Aug 21, 2008 at 7:16 PM, Martin Duerst <duerst@it.aoyama.ac.jp> wrote:
> > At 16:36 08/08/15, Henri Sivonen wrote:
> >>Of course, if the data is *wrong* significantly more often than
> >>lang='' (assuming that the correctness level of lang='' establishes an
> >>implicit data quality baseline), it would be good to ignore it. My
> >>guess is that HTTP-level Content-Language is more likely to be wrong
> >>(it sure is less obvious to diagnose) than any HTML-level declaration.
> >>(Due to Ruby's Postulate:
> >>http://intertwingly.net/slides/2004/devcon/68.html )
> >
> > I guess Google might be able to come up with some data.
> > I have copied Erik van der Poel, an expert in this area.
> >
> > My guess is that:
> > - Authors who declare something usually use lang/xml:lang,
> >   and meta maybe as an addition.
> > - Some tools may use meta, but the chance that the author
> >   corrects this if necessary is low (this is different from
> >   the charset case, because the charset case is very
> >   visible/actionable).
> >
>
> From 2001 to 2007, <html lang="..."> usage increased from 2% to 15% of
> HTML documents in Google's index, while <html xml:lang="..."> usage
> increased from 0.4% to 9% in the same period.
>
> On the other hand, <meta http-equiv=Content-Language content=...>
> usage increased from 5% to 8%, while HTTP Content-Language increased
> from 1% to 6%.
>
> I don't know how many of the declared languages are "wrong", but I can
> compare them with our language detector's result, for the languages
> that are supported by our detector. For <html lang="...">, 13.0% were
> different. For the meta Content-Language, 11.4% were different, while
> for HTTP Content-Language, 11.0% were different. (These numbers are
> quite similar, so I don't know whether we can speak of a Ruby effect.)
>
> Many of the differences for <html lang="..."> were for documents that
> had lang="en" while our detector returned a different result. Perhaps
> "en" is the default value, and is not being modified by
> authors/admins.
>
> Erik
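Richard's overlap questions (pages with both lang and xml:lang; attribute-using pages that also carry meta Content-Language) come down to a per-page tally of which declarations are present. The Python sketch below is a hypothetical illustration, not the method Google used. It assumes each page's HTML has already been fetched and that the HTTP Content-Language header value is passed in separately. The function names (`declarations`, `overlap_report`, `mismatch_rate`) are invented for this example. `mismatch_rate` mirrors Erik's comparison against a language detector, but it takes the detector's output as given and compares only primary subtags, which is an assumption about how "different" should be counted.

```python
from html.parser import HTMLParser


class LangDeclarationParser(HTMLParser):
    """Collects the language declarations found in one HTML document."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        # Self-closing tags such as <meta ... /> also reach this method through
        # the default handle_startendtag.
        attrs = dict(attrs)
        if tag == "html":
            if attrs.get("lang"):
                self.found.add("lang")
            if attrs.get("xml:lang"):
                self.found.add("xml:lang")
        elif tag == "meta":
            http_equiv = (attrs.get("http-equiv") or "").lower()
            if http_equiv == "content-language" and attrs.get("content"):
                self.found.add("meta")


def declarations(html, http_content_language=None):
    """Return the set of declarations a page carries, including the HTTP header."""
    parser = LangDeclarationParser()
    parser.feed(html)
    parser.close()
    found = set(parser.found)
    if http_content_language:
        found.add("http")
    return found


def percent(part, whole):
    return 100.0 * part / whole if whole else 0.0


def overlap_report(pages):
    """pages: iterable of (html, http_content_language or None) pairs."""
    decls = [declarations(html, http) for html, http in pages]
    with_attr = [d for d in decls if d & {"lang", "xml:lang"}]
    return {
        "lang": percent(sum("lang" in d for d in decls), len(decls)),
        "xml:lang": percent(sum("xml:lang" in d for d in decls), len(decls)),
        "lang and xml:lang": percent(
            sum({"lang", "xml:lang"} <= d for d in decls), len(decls)),
        "lang or xml:lang": percent(len(with_attr), len(decls)),
        # Share of attribute-using pages that also carry meta Content-Language.
        "meta, given an attribute": percent(
            sum("meta" in d for d in with_attr), len(with_attr)),
    }


def mismatch_rate(pairs):
    """pairs: (declared tag, detected tag). Compares primary subtags only,
    case-insensitively, so "en-US" and "en" count as agreeing (an assumption)."""
    pairs = list(pairs)

    def primary(tag):
        return tag.split("-")[0].lower()

    differ = sum(primary(declared) != primary(detected)
                 for declared, detected in pairs)
    return percent(differ, len(pairs))


if __name__ == "__main__":
    sample = [
        ('<html lang="en" xml:lang="en"><head></head></html>', None),
        ('<html><head><meta http-equiv=Content-Language content=de></head></html>', None),
    ]
    print(overlap_report(sample))
    print(mismatch_rate([("en", "en"), ("en-US", "fr")]))
```

Comparing only the primary subtag treats `en-US` and `en` as agreeing; comparing full tags would report more mismatches, so the choice should match whatever granularity the detector reports.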
Received on Wednesday, 27 August 2008 09:53:03 UTC