Re: Bobby Limitation - Workaround Sought

Graham_oliver wrote, quoting from Bobby Help:
> Sometimes, the link finding options that are built
> into Bobby are not enough to automatically generate a
> precise list of files for accessibility analysis. In

david responded:
> This should almost certainly be considered an accessibility
> failure in its own right.  The site is also a potential
> commercial failure (although many sites would fail on this
> criterion) as search engines may also not be able to find those
> parts of the site!

Phill responds:
I don't understand the relationship between accessibility and Bobby's
capability, or any other tool's capability for that matter, to crawl a
site and build a list of pages to analyze.  The pages can be perfectly
accessible with adaptive technology and a browser; Bobby not being able
to get the URL is not an accessibility issue.  Nor is this related to
"commercial failure" because some search engines couldn't find some parts
of a site.  Many, many pages served to your browser don't exist anywhere
until they show up in the browser.

david continued:
> Good practice for sites is to include a site map listing all the
> static pages.  Any site with a properly maintained site map should
> be easy to navigate by such tools, even if a user would have to
> go a long way out of their way to navigate using the same means.

Phill responds:
Site maps, which I highly recommend, also won't solve this problem because
they are only an outline of the site.  For example, there isn't a site map
that includes the error page I get when I visit my on-line banking site
and accidentally enter my password against my wife's account number, yet
that error page is perfectly accessible.  I would never expect a site map
to include every possible combination of pages I could get; what use would
that be?  Not to mention that if a site map *can* be created, then Bobby
certainly will find those pages.


graham_oliver continued:
> Can anyone recommend a tool that will allow me to
> produce a complete list of all the pages in a web
> site?

david responded:
> I believe this is theoretically impossible once you allow scripting
> and Java etc. ("the halting problem").  More specifically, I doubt that
> there are any tools that understand Microsoft HTML Help's ActiveX/Java
> tree control parameter formats, or even the common idioms for
> JavaScript popup pages.
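
To make the popup idiom concrete: the destination URL typically lives
inside script, so a crawler that only follows ordinary hrefs never sees
it.  A minimal Python sketch, illustrative only (not Bobby's actual
logic; the sample markup is invented):

    import re

    # Invented sample markup: one ordinary link, two JavaScript popups.
    PAGE = """
    <a href="products.html">Products</a>
    <a href="javascript:void(0)" onclick="window.open('help.html','h')">Help</a>
    <a href="#" onclick="openWin('terms.html'); return false;">Terms</a>
    """

    # A naive crawler: follow only plain href values.
    links = re.findall(r'href="([^"]+)"', PAGE)
    crawlable = [u for u in links if not u.startswith(("javascript:", "#"))]

    print(crawlable)  # ['products.html'] -- both popup targets are invisible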

> I doubt that any tool can follow links that are implemented by
> selecting from a pull-down list, even when done completely server side
> (this affectation is normally done client side, with scripting).  Any
> such links implemented with POST method forms would be dangerous to
> follow.  It's not possible to search the whole parameter space of
> a more general form in order to trigger error pages, etc.
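
The same sketch extends to the pull-down case: the destinations sit in
option values, the jump happens client side, and the only server-side
route is a POST that a tool should not submit blindly.  Again
illustrative Python over invented markup:

    import re

    # Invented sample markup: navigation via a pull-down inside a POST form.
    PAGE = """
    <form method="POST" action="/goto.cgi">
      <select name="dest"
              onchange="location.href=this.options[this.selectedIndex].value">
        <option value="/about.html">About us</option>
        <option value="/contact.html">Contact</option>
      </select>
      <input type="submit" value="Go">
    </form>
    """

    print(re.findall(r'href="([^"]+)"', PAGE))  # [] -- no ordinary links

    # The URLs are recoverable only by a tool that knows this exact idiom:
    print(re.findall(r'<option value="([^"]+)"', PAGE))
    # ['/about.html', '/contact.html']
    # Submitting the POST form "just to see where it goes" could have side
    # effects on the server, which is why following it would be dangerous.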

Phill responds:
Actually, there is such a tool, although it is not for sale yet.  As I
explained at the face-to-face Interest Group meeting at CSUN last month,
IBM has a "crawler" or "miner", as we call it, that gets almost every
page, including many (most?) dynamically built pages, especially those
created by JavaScript pull-downs, Lotus Notes Domino, and some others.
It then saves a snapshot of each page on a server, creates a temporary
URL to the page, and loads Bobby with the list of URLs to analyze.  It
scales very nicely.  IBM is actually using this to analyze our millions
of internal web pages for accessibility.  The tool can also check other
things, for example whether a page has the required privacy
statement/links, uses the correct logos, etc.
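
In outline, that snapshot-and-analyze pipeline works something like the
following Python sketch.  Everything here is a hypothetical stand-in
(render_page, the snapshot directory, the temporary-URL host), not IBM's
actual code:

    import hashlib
    from pathlib import Path

    SNAPSHOT_DIR = Path("snapshots")              # hypothetical snapshot store
    TEMP_URL_BASE = "http://checker.example.com"  # hypothetical temp-URL host

    def snapshot(url, html):
        """Save the rendered page; mint a temporary URL a checker can fetch."""
        SNAPSHOT_DIR.mkdir(exist_ok=True)
        name = hashlib.md5(url.encode("utf-8")).hexdigest() + ".html"
        (SNAPSHOT_DIR / name).write_text(html, encoding="utf-8")
        return TEMP_URL_BASE + "/" + name

    def build_worklist(start_urls, render_page):
        """render_page(url) -> (html, urls_found) is assumed to execute
        scripts, so dynamically built pages exist long enough to save."""
        seen, worklist, queue = set(), [], list(start_urls)
        while queue:
            url = queue.pop()
            if url in seen:
                continue
            seen.add(url)
            html, found = render_page(url)
            worklist.append(snapshot(url, html))  # goes on Bobby's URL list
            queue.extend(found)
        return worklist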

By the way, Bobby does a very good job by itself in crawling many sites.
The original Bobby Help quote is just being honest and complete, something
that is not always true of product help documentation.  One could always
save the list of URLs that Bobby found and analyzed, and then manually add
some pages to the list for it to check.

Phill Jenkins
IBM Research Division - Accessibility Center

Received on Tuesday, 10 April 2001 19:19:50 UTC