Re: Page & validation statistics available

From: olivier Thereaux <ot@w3.org>
Date: Tue, 8 Apr 2008 08:44:43 -0400
Cc: "Brian Wilson" <bloo@blooberry.com>, "W3C Validator Community" <www-validator@w3.org>
Message-Id: <F7C7DA99-BC45-49FF-A619-B17FEF516568@w3.org>
To: Nikita The Spider The Spider <nikitathespider@gmail.com>


On Apr 7, 2008, at 09:47 , Nikita The Spider The Spider wrote:
> As a result of our conversation on validation statistics last month, I
> was inspired to collect some statistics based on the data my validator
> Nikita sees. If you're interested in the topic, you can read about it
> here:
> http://NikitaTheSpider.com/articles/ByTheNumbers/

Very cool; I read it with great interest. Thank you very much for  
sharing it here.

Indeed, as you wrote in the report, there is a bias in such a study,  
but I think the particular population of websites being sampled is  
very interesting to us, too. Having statistics on "sites that try to  
validate", in addition to the work being done on "the wild web" by,  
e.g., Brian, is very welcome.

One thing I was wondering: do you have statistics on what proportion  
of the pages you tested passed validation?

I recently ran some tests (on rather small sets, a few hundred pages  
at a time, and hence not necessarily reliable) that involved:
- finding URIs that had been validated within the last 24 hours
- checking whether they were now valid
- checking whether pages that had been validated multiple times were  
more often valid than pages that had been validated only once.

I found that the proportion of valid pages (after 24 hours) was around  
50% for "clients" of the W3C validators. Given the "modern" profile of  
Nikita's clients (lots of UTF-8, lots of XHTML), I was wondering whether  
your figure was similar, or higher.

Thanks,
-- 
olivier
Received on Tuesday, 8 April 2008 12:45:45 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Wednesday, 25 April 2012 12:14:29 GMT