- From: Charles McCathieNevile <charles@w3.org>
- Date: Tue, 10 Apr 2001 06:56:02 -0400 (EDT)
- To: David Woolley <david@djwhome.demon.co.uk>
- cc: <w3c-wai-ig@w3.org>
There is also the original robot built by Henrik Frystyk Nielsen for the
libwww code library. You can teach it to understand JavaScript links,
although that takes a bit of basic programming skill. And what Dave said about
having permission goes double for the robot: it is extremely efficient,
which means it is easily capable of running amok very fast.
http://www.w3.org/Robot/
Cheers
Charles
On Tue, 10 Apr 2001, David Woolley wrote:
> Sometimes, the link finding options that are built
> into Bobby are not enough to automatically generate a
> precise list of files for accessibility analysis. In
This should almost certainly be considered an accessibility
failure in its own right. The site is also a potential
commercial failure (although many sites would fail on this
criterion), as search engines may also be unable to find those
parts of the site!
Good practice for sites is to include a site map listing all the
static pages. Any site with a properly maintained site map should
be easy for such tools to navigate, even if a user would have to
go a long way out of their way to navigate using the same means.
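As a rough illustration (the file names here are invented), a site map that
such tools can follow needs nothing more than ordinary anchor links in valid
HTML:

  <!-- sitemap.html: one plain link per static page -->
  <ul>
    <li><a href="index.html">Home</a></li>
    <li><a href="products/index.html">Products</a></li>
    <li><a href="contact.html">Contact details</a></li>
  </ul>

Because every URL appears in an ordinary href attribute, a crawler, a search
engine or a screen reader can reach each page without executing any script.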
> Can anyone recommend a tool that will allow me to
> produce a complete list of all the pages in a web
> site?
I believe this is theoretically impossible once you allow scripting
and Java etc. ("the halting problem"). More specifically, I doubt that
there are any tools that understand Microsoft HTML Help's ActiveX/Java
tree control parameter formats, or even the common idioms for
JavaScript popup pages.
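For example, a popup link is often written along these lines (an invented
fragment, not taken from any particular site):

  <a href="javascript:void(0)"
     onclick="window.open('help.html', 'help', 'width=400,height=300')">Help</a>

The real target URL exists only inside the script, so a tool that does not
execute JavaScript has nothing it can follow.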
I doubt that any tool can follow links that are implemented by
selecting from a pull-down list, even when done completely server side
(this affectation is normally done client side, with scripting; see the
fragment below). Any
such links implemented with POST method forms would be dangerous to
follow. It's not possible to search the whole parameter space of
a more general form in order to trigger error pages, etc.
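The client side version of that pull-down idiom usually looks something like
this (again an invented fragment):

  <form>
    <select onchange="window.location = this.options[this.selectedIndex].value">
      <option value="">Choose a section</option>
      <option value="news.html">News</option>
      <option value="about.html">About us</option>
    </select>
  </form>

The destination URLs are option values rather than href attributes, so a
link-extracting tool has no standard way to recognise them.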
Both Lynx and wget are capable of building more or less complete lists
of links from pure, valid HTML. I think wget uses a simplified parser,
so it might get confused by unusual parameter syntax. Neither understands
scripting or attempts to submit forms with various parameters.
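As a rough illustration (option names vary between versions, so check the
local man pages; www.example.org stands in for a real site), Lynx can list
the links it sees on a page, or be asked to traverse a whole site:

  lynx -dump -listonly http://www.example.org/
  lynx -traversal http://www.example.org/

The first prints the links found on one page; the second follows HTTP links
from the start page and records the URLs it visits.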
Lynx should only be used on your own site, or with explicit permission, as
it does not obey the protocols that allow a site to restrict the activity
of such automated tools, nor does it pause between requests to avoid
overloading a site. The robots protocol should not be disabled in wget
without the site owner's permission, nor should either tool's user agent
string be modified to simulate another browser without permission.
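By way of a hedged example (exact options depend on the wget version
installed), a comparatively polite recursive run might look like:

  wget -r -np -nv --wait=2 -o wget.log http://www.example.org/

This stays within the site, leaves wget's robots handling at its default,
does not touch the User-Agent header, pauses two seconds between requests,
and leaves a log from which the list of fetched URLs can be extracted.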
--
Charles McCathieNevile http://www.w3.org/People/Charles phone: +61 409 134 136
W3C Web Accessibility Initiative http://www.w3.org/WAI fax: +1 617 258 5999
Location: 21 Mitchell street FOOTSCRAY Vic 3011, Australia
(or W3C INRIA, Route des Lucioles, BP 93, 06902 Sophia Antipolis Cedex, France)
Received on Tuesday, 10 April 2001 06:56:23 UTC