- From: Charles McCathieNevile <charles@w3.org>
- Date: Tue, 10 Apr 2001 06:56:02 -0400 (EDT)
- To: David Woolley <david@djwhome.demon.co.uk>
- cc: <w3c-wai-ig@w3.org>
There is also the original robot built by Henrik Frystyk Nielsen for the
libwww code library. You can teach that to understand JavaScript links,
although it takes a bit of basic programming skill. And what Dave said
about having permission goes double for the robot - it is extremely
efficient, which means it is easily capable of running amok very fast.

http://www.w3.org/Robot/

Cheers

Charles

On Tue, 10 Apr 2001, David Woolley wrote:

> > Sometimes, the link finding options that are built
> > into Bobby are not enough to automatically generate a
> > precise list of files for accessibility analysis. In
>
> This should almost certainly be considered an accessibility failure in
> its own right. The site is also a potential commercial failure (although
> many sites would fail on this criterion), as search engines may also not
> be able to find those parts of the site!
>
> Good practice for sites is to include a site map listing all the static
> pages. Any site with a properly maintained site map should be easy to
> navigate with such tools, even if a user would have to go a long way out
> of their way to navigate using the same means.
>
> > Can anyone recommend a tool that will allow me to
> > produce a complete list of all the pages in a web
> > site?
>
> I believe this is theoretically impossible once you allow scripting,
> Java, etc. ("the halting problem").
>
> More specifically, I doubt that there are any tools that understand
> Microsoft HTML Help's ActiveX/Java tree control parameter formats, or
> even the common idioms for JavaScript popup pages. I doubt that any tool
> can follow links that are implemented by selecting from a pull-down
> list, even when done completely server side (this affectation is
> normally done client side, with scripting). Any such links implemented
> with POST-method forms would be dangerous to follow. It's not possible
> to search the whole parameter space of a more general form in order to
> trigger error pages, etc.
>
> Both Lynx and wget are capable of building more or less complete lists
> of links from pure, valid HTML. I think wget uses a simplified parser,
> so it might get confused by unusual parameter syntax. Neither
> understands scripting or attempts to submit forms with various
> parameters.
>
> Lynx should only be used on your own site, or with explicit permission,
> as it does not obey the protocols that allow a site to restrict the
> activity of such automated tools, nor does it pause between requests to
> avoid overloading a site. The robots protocol should not be disabled in
> wget without the permission of the site owner, nor should the user agent
> string be modified to simulate another browser, in either tool, without
> permission.

--
Charles McCathieNevile    http://www.w3.org/People/Charles    phone: +61 409 134 136
W3C Web Accessibility Initiative    http://www.w3.org/WAI    fax: +1 617 258 5999
Location: 21 Mitchell street FOOTSCRAY Vic 3011, Australia
(or W3C INRIA, Route des Lucioles, BP 93, 06902 Sophia Antipolis Cedex, France)
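As a hedged illustration of the kind of well-behaved crawl discussed above, here is a minimal Python 3 sketch that lists the pages reachable through plain <a href> links on a single site, honouring robots.txt and pausing between requests. The start URL is a placeholder, and the sketch deliberately ignores JavaScript links, forms, and frames, so it shares the limitations David describes for Lynx and wget.

    import time
    import urllib.parse
    import urllib.request
    import urllib.robotparser
    from html.parser import HTMLParser


    class LinkParser(HTMLParser):
        """Collect href values from <a> tags in an HTML page."""

        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)


    def list_pages(start_url, delay=1.0):
        """Breadth-first crawl of one site, honouring robots.txt."""
        site = urllib.parse.urlparse(start_url).netloc
        robots = urllib.robotparser.RobotFileParser(
            urllib.parse.urljoin(start_url, "/robots.txt"))
        robots.read()

        seen = set()
        queue = [start_url]
        while queue:
            url = queue.pop(0)
            if url in seen or not robots.can_fetch("*", url):
                continue
            seen.add(url)
            try:
                with urllib.request.urlopen(url) as response:
                    if "html" not in response.headers.get("Content-Type", ""):
                        continue
                    page = response.read().decode("utf-8", errors="replace")
            except OSError:
                continue  # skip pages that fail to load
            parser = LinkParser()
            parser.feed(page)
            for href in parser.links:
                absolute = urllib.parse.urljoin(url, href).split("#")[0]
                if urllib.parse.urlparse(absolute).netloc == site:
                    queue.append(absolute)
            # pause between requests to avoid overloading the site
            time.sleep(delay)
        return sorted(seen)


    if __name__ == "__main__":
        # Placeholder URL: point this at your own site, or one you
        # have explicit permission to crawl.
        for page in list_pages("http://www.example.org/"):
            print(page)

Like wget and Lynx, such a crawler can only see what appears as ordinary anchors in the delivered HTML, which is why a properly maintained site map remains the most reliable way to make every static page reachable.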
Received on Tuesday, 10 April 2001 06:56:23 UTC