
Re: Over 1500 invalid pages at www.w3.org

From: Nick Kew <nick@webthing.com>
Date: Mon, 15 Oct 2001 13:16:23 +0100 (BST)
To: Bjoern Hoehrmann <derhoermi@gmx.net>
cc: Karl Dubost <karl@w3.org>, www-validator@w3.org, www-qa@w3.org
Message-ID: <Pine.BSF.4.21.0110151254540.447-100000@fenris.webthing.com>

On Mon, 15 Oct 2001, Bjoern Hoehrmann wrote:

> * Nick Kew wrote:
> >Site Valet reports 5322 HTML pages at W3C, so that's nearly 30% invalid.
> 
> I think there are way more pages than 5322, even if you count only the
> publicly available pages.

Yes, there are - it's still spidering them.  GetAgent will never
send more than one hit per minute to any one server, so it cannot
deal with more than 1440 www.w3.org docs in a day.
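
The 1440 figure is just the number of minutes in a day. A minimal sketch of
that arithmetic and of a per-server throttle follows. It is not GetAgent's
actual code: the names `max_requests_per_day` and `PerHostThrottle` are
hypothetical, and the only thing taken from the message is the
one-hit-per-minute limit.

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60
MIN_INTERVAL = 60.0  # one hit per minute per server, as stated in the message


def max_requests_per_day(min_interval=MIN_INTERVAL):
    """Upper bound on requests one server can receive in 24 hours."""
    return int(SECONDS_PER_DAY // min_interval)


class PerHostThrottle:
    """Hypothetical helper: remembers when each host was last hit."""

    def __init__(self, min_interval=MIN_INTERVAL, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self.last_hit = {}  # host -> clock value at the last request

    def wait_time(self, host):
        """Seconds to wait before the next request to this host is allowed."""
        last = self.last_hit.get(host)
        if last is None:
            return 0.0
        return max(0.0, self.min_interval - (self.clock() - last))

    def record_hit(self, host):
        """Note that a request to this host has just been sent."""
        self.last_hit[host] = self.clock()
```

A spider built this way would sleep for `wait_time(host)` before each request
and call `record_hit(host)` afterwards, so a large site such as www.w3.org is
crawled slowly while other hosts remain unaffected.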

I took a more detailed look at the database after posting, and found it
had about 25000 www.w3.org URLs flagged as unvisited (though many of them
are non-HTML, so it'll only send a HEAD request to verify them).  I've no
doubt there will be more as it follows links in further pages.
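
One way the HEAD-versus-GET choice might be made is sketched below. How the
spider really decides that a URL is non-HTML is not stated here, so guessing
from the file extension is purely an assumption, and `request_method` is a
hypothetical name.

```python
import mimetypes
from urllib.parse import urlsplit

# Media types treated as HTML for this sketch (an assumption).
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def request_method(url):
    """Pick the HTTP method for an unvisited URL.

    Assumption: the media type is guessed from the URL path's extension.
    Anything that looks like HTML, or cannot be guessed at all, is fetched
    with GET so its links can be followed; everything else only needs a
    HEAD request to verify that it exists.
    """
    guessed, _encoding = mimetypes.guess_type(urlsplit(url).path)
    if guessed is None or guessed in HTML_TYPES:
        return "GET"
    return "HEAD"
```

URLs without an extension fall through to GET, since they may well turn out to
be HTML once the server reports the real Content-Type.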

The point of citing the number when I did is that it gives a proportion:
30% of a (substantial) sample proved to be invalid.
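
As a rough check of that proportion, using only the two numbers in this
thread: the subject line says "over 1500", so 1500 is a lower bound and the
result below slightly understates the true fraction. The function name is
hypothetical.

```python
def invalid_fraction(invalid_pages, pages_checked):
    """Fraction of checked pages that failed validation."""
    return invalid_pages / pages_checked


# 1500 is the lower bound from the subject line; 5322 is the page count
# Site Valet had reported at the time.
fraction = invalid_fraction(1500, 5322)
print(f"{fraction:.1%}")  # prints 28.2%
```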

Actually I just re-read Gerald's post that induced me to start spidering
www.w3.org, and I find I misread what he wrote in the first place :-(

-- 
Nick Kew

Site Valet - the essential service for anyone with a website.
<URL:http://valet.webthing.com/>
Received on Monday, 15 October 2001 13:01:29 GMT
