- From: Nick Kew <nick@webthing.com>
- Date: Mon, 15 Oct 2001 13:16:23 +0100 (BST)
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- cc: Karl Dubost <karl@w3.org>, www-validator@w3.org, www-qa@w3.org
On Mon, 15 Oct 2001, Bjoern Hoehrmann wrote:

> * Nick Kew wrote:
> >Site Valet reports 5322 HTML pages at W3C, so that's nearly 30% invalid.
>
> I think there are way more pages than 5322, even if you count only the
> publically available pages.

Yes there are - it's still spidering them. GetAgent will never send
more than one hit per minute to any one server, so it cannot deal with
more than 1440 www.w3.org docs in a day.

I took a more detailed look at the database after posting, and found
it had about 25000 www.w3.org URLs flagged as unvisited (though many
of them are non-HTML, so it'll only send a HEAD request to verify them).
I've no doubt there will be more as it follows links in further pages.

The point of citing the number when I did is that it gives a proportion:
30% of a (substantial) sample proved to be invalid.

Actually I just re-read Gerald's post that induced me to start spidering
www.w3.org, and I find I misread what he wrote in the first place :-(

-- 
Nick Kew

Site Valet - the essential service for anyone with a website.
<URL:http://valet.webthing.com/>
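
For readers checking the crawl-rate figures in the message above, the arithmetic can be sketched as follows. This is only an illustration, not GetAgent's code: the function names are hypothetical, the backlog of 25000 is the approximate figure quoted in the message, and the sketch assumes every unvisited URL costs one hit (HEAD or GET) under the same one-per-minute limit.

```python
import math

MINUTES_PER_DAY = 24 * 60


def max_hits_per_day(min_interval_minutes: float = 1.0) -> int:
    """Upper bound on requests to one server per day for a given minimum interval."""
    return int(MINUTES_PER_DAY // min_interval_minutes)


def days_to_clear(backlog: int, hits_per_day: int) -> int:
    """Whole days needed to visit `backlog` URLs at `hits_per_day` (rounded up)."""
    return math.ceil(backlog / hits_per_day)


per_day = max_hits_per_day()              # one hit per minute
backlog_days = days_to_clear(25000, per_day)  # assumed backlog, from the message
print(per_day, backlog_days)
```

Under these assumptions the limit works out to 1440 hits per day, and the quoted backlog alone would take at least 18 days to visit, before counting any further URLs discovered along the way.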
Received on Monday, 15 October 2001 13:01:29 UTC