
Re: crawling validator

From: Nick Kew <nick@webthing.com>
Date: Thu, 16 May 2002 21:56:30 +0100 (BST)
To: David Woolley <david@djwhome.demon.co.uk>
cc: <w3c-wai-ig@w3.org>
Message-ID: <20020516215016.L1683-100000@fenris.webthing.com>

On Thu, 16 May 2002, David Woolley wrote:

> >
> > is there a w3c service that crawls a site and reports errors, in
> > planning perhaps?
>
> That's best done with a local tool.  A W3C service could easily be
> used as a denial of service attack aid.

The key point is that any crawler should operate slowly so as not to
risk overloading a server.  One page per minute is a common
rule of thumb for well-behaved robots.  This is obviously not compatible
with an online service that spiders while you wait.
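For illustration, here is a minimal sketch (in Python) of what a polite
spider's fetch loop looks like: check robots.txt first, then never fetch
faster than the chosen delay.  The site URL, the 60-second delay and the
stubbed-out link handling are only placeholders.

    # Sketch of a "polite" spider: honour robots.txt and fetch no
    # faster than one page per minute.  URL and delay are placeholders.
    import time
    import urllib.robotparser
    from urllib.request import urlopen

    START = "http://www.example.org/"   # hypothetical site to check
    DELAY = 60                          # seconds between requests

    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(START + "robots.txt")
    rp.read()

    queue, seen = [START], set()
    while queue:
        url = queue.pop(0)
        if url in seen or not rp.can_fetch("*", url):
            continue
        seen.add(url)
        html = urlopen(url).read()
        # ... pass `html` to a validator and queue any new links here ...
        time.sleep(DELAY)               # the one-page-per-minute rule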

> You can also mirror the site using wget, which does respect the "robots"
> protocol, then validate the local copy.

wget fetches rapid-fire by default, too.
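(For what it's worth, wget can be slowed down with its --wait option,
e.g. something along the lines of

    wget --mirror --wait=60 http://www.example.org/

which pauses between retrievals; the site and the 60-second delay above
are only placeholders.)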

The Site Valet spider does exactly what you're asking for: it spiders
a site over time and compiles results, which can be emailed to you,
queried online with a browser, or both.

-- 
Nick Kew

Available for contract work - Programming, Unix, Networking, Markup, etc.
