Re: Page & validation statistics available

On Tue, Apr 8, 2008 at 8:44 AM, olivier Thereaux <ot@w3.org> wrote:
>
>  On Apr 7, 2008, at 09:47, Nikita The Spider wrote:
>
> > As a result of our conversation on validation statistics last month, I
> > was inspired to collect some statistics based on the data my validator
> > Nikita sees. If you're interested in the topic, you can read about it
> > here:
> >
> > http://NikitaTheSpider.com/articles/ByTheNumbers/
> >
>
>  Very cool, I read it with a lot of interest. Thank you very much for
> sharing it here.

You are most welcome.

>  One thing I was wondering: do you have stats on what ratio of the pages you
> tested passed validation?

Stupidly, I didn't collect statistics on this very basic measure. I
can't go back and collect them now since some of the data that I used
is gone. (As Nikita's disk space fills up, the oldest crawls go into
the bit bucket.) But I'm sure I'll do this again sometime and I'll
include that statistic. It might also be interesting to see the
percentage of valid pages broken down by doctype.
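
A per-doctype tally like that could be sketched roughly as follows. This
is only an illustration, not Nikita's actual code: the function name, the
(doctype, is_valid) record shape and the sample data are all made up for
the example.

```python
from collections import defaultdict


def percent_valid_by_doctype(records):
    """Return {doctype: percentage of valid pages}.

    `records` is an iterable of hypothetical (doctype, is_valid) pairs,
    one per crawled page.
    """
    totals = defaultdict(int)  # pages seen per doctype
    valid = defaultdict(int)   # pages that passed validation per doctype
    for doctype, is_valid in records:
        totals[doctype] += 1
        if is_valid:
            valid[doctype] += 1
    return {d: 100.0 * valid[d] / totals[d] for d in totals}


# Made-up sample data, purely for illustration.
sample = [
    ("XHTML 1.0 Strict", True),
    ("XHTML 1.0 Strict", False),
    ("HTML 4.01 Transitional", True),
    ("HTML 4.01 Transitional", False),
    ("HTML 4.01 Transitional", False),
    ("HTML 4.01 Transitional", False),
]

for doctype, pct in percent_valid_by_doctype(sample).items():
    print(f"{doctype}: {pct:.1f}% valid")
```

The overall pass ratio falls out of the same two counters by summing them
across doctypes.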


>  I recently made some tests (on rather small sets, a few hundred at a time,
> hence not necessarily reliable) of:
>  - finding URIs that had been validated within the last 24 hours
>  - looking at whether they were now valid
>  - seeing if there was a higher ratio of validity for pages that had been
> validated multiple times, as opposed to once.
>
>  I found that the ratio of valid pages (after 24 hours) was around 50% for
> "clients" of the W3C validators. Given the "modern" profile of Nikita's
> clients (lots of UTF-8, lots of XHTML) I was wondering if it was similar, or
> higher.

I can't speak to that, but I see quite a few repeat customers, i.e.
people who validate their site and then come back soon afterward with
most of the problems fixed.

-- 
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more

Received on Tuesday, 8 April 2008 15:40:03 UTC