Re: crawling validator from David Woolley on 2002-05-16 (w3c-wai-ig@w3.org from April to June 2002)

From: David Woolley <david@djwhome.demon.co.uk>
Date: Thu, 16 May 2002 08:00:36 +0100 (BST)
To: w3c-wai-ig@w3.org
Message-Id: <200205160700.g4G70as03085@djwhome.demon.co.uk>

> 
> is there a w3c service that crawls a site and reports errors, in 
> planning perhaps?

That's best done with a local tool.  A W3C-hosted crawling service could
easily be abused as an aid to a denial of service attack, since anyone
could point it at someone else's site.

You can use Lynx to build a complete list of a site's pages (assuming
that links aren't hidden behind JavaScript, etc., in which case many search
engines will ignore them as well), then feed the list into nsgmls to
validate the HTML, or into the CSS2 validator.
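
As a rough sketch of the Lynx step, in Python: the code below parses the
numbered link list that `lynx -dump -listonly` prints, walks same-host
links breadth-first, and builds an nsgmls command line for a page that has
been saved to disk.  The function names are made up for illustration, and
it assumes nsgmls can already find a catalog for the HTML DTDs (for
example via SGML_CATALOG_FILES).  It does nothing about robots.txt or
pacing.

```python
import re
import subprocess
from urllib.parse import urldefrag, urlparse


def parse_lynx_links(dump_text):
    """Extract URLs from the numbered list printed by `lynx -dump -listonly`."""
    links = []
    for line in dump_text.splitlines():
        match = re.match(r"\s*\d+\.\s+(\S+)", line)
        if match:
            url = urldefrag(match.group(1)).url  # drop any #fragment
            if url not in links:
                links.append(url)
    return links


def lynx_list_links(url):
    """Run Lynx on one page and return the links it reports."""
    result = subprocess.run(
        ["lynx", "-dump", "-listonly", url],
        capture_output=True, text=True, check=True,
    )
    return parse_lynx_links(result.stdout)


def crawl(root, list_links=lynx_list_links):
    """Breadth-first walk over pages on the same host as `root`."""
    host = urlparse(root).netloc
    seen = [root]
    queue = [root]
    while queue:
        page = queue.pop(0)
        for url in list_links(page):
            if urlparse(url).netloc == host and url not in seen:
                seen.append(url)
                queue.append(url)
    return seen


def nsgmls_command(path):
    """Command line to check one saved page; -s suppresses normal output,
    so only error messages are printed."""
    return ["nsgmls", "-s", path]
```

Calling `crawl("http://example.org/")` would shell out to Lynx once per
page; passing your own `list_links` function lets you try the traversal
without touching the network.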

DO NOT do this without the site owner's permission.  Lynx doesn't obey
the "robots" protocol, so it will crawl where it is not allowed to go,
and it will not pause the 30 seconds between pages that is expected of
unsolicited crawlers.  Abusing Lynx in this way may get it blacklisted
by the site as a hostile crawler.
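
If you script the crawl yourself, the two missing behaviours can be
added around it.  A minimal sketch in Python, using the standard
library's urllib.robotparser: the agent name "example-checker" and the
helper names are invented for illustration, and the 30 second default is
simply the figure quoted above.

```python
import time
from urllib.robotparser import RobotFileParser

CRAWL_DELAY_SECONDS = 30  # the pause between pages mentioned above


def make_robots(robots_txt):
    """Build a parser from the text of a site's robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_urls(urls, robots, agent="example-checker",
                delay=CRAWL_DELAY_SECONDS, sleep=time.sleep):
    """Yield only the URLs robots.txt permits, pausing between them."""
    first = True
    for url in urls:
        if not robots.can_fetch(agent, url):
            continue  # disallowed for this agent, so skip it
        if not first:
            sleep(delay)  # wait before every page after the first
        first = False
        yield url
```

The `sleep` argument is only there so the pacing can be tried out
without actually waiting.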

You can also mirror the site using wget, which does respect the "robots"
protocol, then validate the local copy.
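
The wget route might be scripted along these lines.  This is only a
sketch in Python: the helper names are hypothetical, the 30 second wait
is an assumption on my part, and nsgmls is assumed to have a catalog for
the HTML DTDs already configured.

```python
import os


def wget_mirror_command(url, dest, wait_seconds=30):
    """Command line to mirror a site into `dest`, pausing between fetches."""
    return [
        "wget", "--mirror", "--no-parent",
        f"--wait={wait_seconds}",
        f"--directory-prefix={dest}",
        url,
    ]


def html_files(root):
    """List the HTML files found anywhere under the mirrored directory."""
    found = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith((".html", ".htm")):
                found.append(os.path.join(folder, name))
    return sorted(found)


def validation_commands(root):
    """One nsgmls invocation per mirrored page; -s prints errors only."""
    return [["nsgmls", "-s", path] for path in html_files(root)]
```

Each command list can be handed to subprocess.run() once the mirror has
finished.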

Received on Thursday, 16 May 2002 16:37:27 UTC