
Re: Page & validation statistics available

From: Nikita The Spider The Spider <nikitathespider@gmail.com>
Date: Tue, 8 Apr 2008 11:32:58 -0400
Message-ID: <35e76ac10804080832k72dbedf2m8b7b34288641cc3a@mail.gmail.com>
To: "olivier Thereaux" <ot@w3.org>
Cc: "Brian Wilson" <bloo@blooberry.com>, "W3C Validator Community" <www-validator@w3.org>

On Tue, Apr 8, 2008 at 8:44 AM, olivier Thereaux <ot@w3.org> wrote:
>  On Apr 7, 2008, at 09:47, Nikita The Spider The Spider wrote:
> > As a result of our conversation on validation statistics last month, I
> > was inspired to collect some statistics based on the data my validator
> > Nikita sees. If you're interested in the topic, you can read about it
> > here:
> >
> > http://NikitaTheSpider.com/articles/ByTheNumbers/
> >
>  Very cool, I read it with a lot of interest. Thank you very much for
> sharing it here.

You are most welcome.

>  One thing I was wondering, do you have stats on what ratio of the pages you
> tested passed the validation?

Stupidly, I didn't collect statistics on this very basic metric. I
can't go back and collect them now, since some of the data I used
is gone. (As Nikita's disk space fills up, the oldest crawls go into
the bit bucket.) But I'm sure I'll do this again sometime, and I'll
include that statistic. It might also be interesting to see the
percentage of valid pages broken down by doctype.

>  I recently ran some tests (on rather small sets, a few hundred at a time,
> hence not necessarily reliable) of:
>  - finding URIs that had been validated within the last 24 hours
>  - looking at whether they were now valid
>  - seeing if there was a higher ratio of validity for pages that had been
> validated multiple times, as opposed to once.
>  I found that the ratio of valid pages (after 24 hours) was around 50% for
> "clients" of the W3C validators. Given the "modern" profile of Nikita's
> clients (lots of UTF-8, lots of XHTML), I was wondering if it was similar or
> higher.

I can't speak to that, but I do see quite a few repeat customers, i.e.
people who validate their site and then come back soon afterward with most
of the problems fixed.

Whole-site HTML validation, link checking and more
Received on Tuesday, 8 April 2008 15:40:03 UTC
