Re: crawling validator

On Thu, 16 May 2002, David Woolley wrote:

> >
> > is there a w3c service that crawls a site and reports errors, in
> > planning perhaps?
>
> That's best done with a local tool.  A W3C service could easily be
> used as a denial of service attack aid.

The key point is that any crawler should operate slowly so as not to
risk overloading a server.  One page per minute is a common
rule of thumb for well-behaved robots.  This is obviously not compatible
with an online service that spiders while you wait.
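
For what it's worth, a home-grown spider can enforce that rate
trivially.  A rough sketch in Python (placeholder URLs, no error
handling or link extraction):

  # Rough sketch only: honour robots.txt and fetch at most one page
  # per minute.  example.org and the seed list are placeholders.
  import time
  import urllib.request
  import urllib.robotparser

  rp = urllib.robotparser.RobotFileParser("http://example.org/robots.txt")
  rp.read()

  seeds = ["http://example.org/", "http://example.org/about.html"]
  for url in seeds:
      if not rp.can_fetch("PoliteSpider", url):
          continue                          # excluded by robots.txt
      html = urllib.request.urlopen(url).read()
      # ... hand html to a validator here ...
      time.sleep(60)                        # one page per minute

A real spider would also parse out links and keep a queue; the sleep
is the point here.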

> You can also mirror the site using wget, which does respect the "robots"
> protocol, then validate the local copy.

wget also fetches pages rapid-fire by default, so the same concern applies.
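It can be throttled, though.  Something along these lines (flags per
the wget manual; the URL is a placeholder) keeps it to roughly one
page per minute while mirroring:

  wget --mirror --wait=60 --random-wait http://www.example.org/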

The Site Valet spider does exactly what you're asking for: it spiders
a site over time and compiles the results, which can be emailed to you,
queried online with a browser, or both.

-- 
Nick Kew

Available for contract work - Programming, Unix, Networking, Markup, etc.
