- From: Karl Groves <karl@karlgroves.com>
- Date: Mon, 4 Oct 2010 10:32:35 -0400
- To: Salinee Kuakiatwong <salinee20@gmail.com>
- Cc: w3c-wai-ig@w3.org
Salinee,

Right now there is a big shift in the way automated testing is performed. If you were to do an inventory of all of the automated testing tools out there (free and non-free), you'd find four things which create differences between one tool and the next:

1) Standards support: tools which only support WCAG 1.0 vs. those which support WCAG 2.0
2) What they test: tools which test the document source as a string vs. tools which test the DOM
3) How they handle subjective guidelines: in other words, what they do with a guideline that takes human judgement
4) Report clarity: how clear the reports, including individual findings, are

Under criterion #1, what you'll typically find with "WCAG 1.0 / Section 508-only" tools is that they're old. Their testing rules are out of date and they're no longer under active development. Throw these away. Luckily, I don't think there are many "enterprise" automated testing tools these days that don't support WCAG 2.0.

The second criterion, however, is a big deal. Early in the days of automated testing, what you had were basically tools which tested the document source as it was sent by the server. I call this testing "as a string" because all they'd do is grab the source (just like "View Source" in your browser), parse it to create a multidimensional array of elements, loop through them, run a bunch of tests, and generate reports. This is (usually) just fine if what you're testing is a completely static document with no client-side interactivity or other scripted DOM manipulation. The big problem with such an approach is that in order to be an accurate test, an automated tool must test what the end user is getting. That means, in the case of pages with client-side scripting, the testing tool needs to test the modified version - the DOM after the scripting has changed it. So, depending upon what a tool tests, you could get significantly different results simply because the tools are testing different things altogether.

The next issue is how the tools handle guidelines which require subjective interpretation. In my own research, I've found that there are only 16 basic types of accessibility tests. For the sake of brevity I won't list them here, but they're things like "Element ____ contains attribute ______" or "Element _____ has child element ______". Using this type of structure (and a tool which can test the DOM), you can generate hundreds of tests. *Some* of those tests are rather absolute in nature. For instance, the classic test for whether an image has an alt attribute is one that any tool can perform and report on accurately and clearly. It is when you get into subjective interpretation that you begin to see wildly different results from automated tools. For instance, when testing against WCAG 2.0 Success Criterion 1.1.1 we're not just testing for the existence of an alt attribute. That success criterion covers many possible situations, each of which requires its own kind of alternative text - and some (most) of those simply cannot be tested with any degree of certainty by automated means. In such cases, the best a tool can do is generate a "warning" signifying that something *might* be wrong. In other cases, it might even be prudent not to test a given success criterion at all.
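To make this concrete, here is a rough sketch in TypeScript (my own illustration, not taken from any particular tool) of one absolute test and one subjective test of the "Element ____ contains attribute ____" variety. Because it queries the live DOM, it sees whatever client-side scripts have injected - something a source-as-a-string tool would miss. The rule names and the filename heuristic are invented for the example:

    type Severity = "error" | "warning";

    interface Finding {
      severity: Severity;
      message: string;
      element: Element;
    }

    function checkImages(doc: Document): Finding[] {
      const findings: Finding[] = [];

      for (const img of Array.from(doc.querySelectorAll("img"))) {
        // Absolute test: the alt attribute is either present or it isn't.
        if (!img.hasAttribute("alt")) {
          findings.push({
            severity: "error",
            message: "img element has no alt attribute",
            element: img,
          });
          continue;
        }

        // Subjective test: only a human can judge whether the text is a
        // suitable alternative, so the best the tool can do is warn when
        // the value looks suspicious (here, when it looks like a filename).
        const alt = (img.getAttribute("alt") ?? "").trim();
        if (/\.(jpe?g|png|gif)$/i.test(alt)) {
          findings.push({
            severity: "warning",
            message: "alt text looks like a filename; needs human review",
            element: img,
          });
        }
      }

      return findings;
    }

    // Run against the rendered page, i.e. after scripts have modified the DOM.
    const results = checkImages(document);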
You will see major differences in what each tool tests and how it tests it when it comes to these subjective tests. For instance, some may test img alt attributes against a specified string-length threshold and generate an error if the text is too short or too long. Clearly there are instances where a short alt attribute is appropriate, and only human judgement can determine whether it is really an issue, so some tools may call this a "warning" rather than an error.

Last is the issue of report clarity. I've already said that tools may test different things (string vs. DOM), may test more (or less) thoroughly, and may include issues that require subjective interpretation. On top of all that, the reports each tool gives you may vary significantly in how clear they are. Each tool may use different nomenclature from the next. Further, their explanations of issues may also differ - and the more subjective the test result, the more the issue descriptions can diverge. What this means for your research is that both tools might have found a particular issue but reported it differently. Be sure to read the reports carefully when doing your comparison.

As Chaals stated, you'll want to do a manual evaluation yourself. I'd recommend creating a web page (or, even better, a whole site) filled with errors, making a list of those errors your test criteria, and then testing with each tool to see how it performs.
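If it helps, here is roughly the sort of bookkeeping I have in mind for that comparison, again sketched in TypeScript. The issue codes and mappings below are invented for illustration - you would substitute whatever the tools you're comparing actually report:

    // Errors you deliberately seeded into the test page, in your own terms.
    const seededErrors = ["missing-alt", "empty-label", "low-contrast"];

    // Hypothetical mappings from each tool's own nomenclature to those terms.
    const toolAMap: Record<string, string> = {
      IMG_NO_ALT: "missing-alt",
      FORM_LABEL_EMPTY: "empty-label",
    };
    const toolBMap: Record<string, string> = {
      "1.1.1-image-alt": "missing-alt",
      "1.4.3-contrast": "low-contrast",
    };

    // Translate a tool's raw issue codes into the common vocabulary.
    function normalize(reported: string[], map: Record<string, string>): Set<string> {
      const found = new Set<string>();
      for (const code of reported) {
        const term = map[code];
        if (term) found.add(term);
      }
      return found;
    }

    // Example raw output from each tool for the seeded page.
    const toolAFound = normalize(["IMG_NO_ALT"], toolAMap);
    const toolBFound = normalize(["1.1.1-image-alt", "1.4.3-contrast"], toolBMap);

    for (const err of seededErrors) {
      console.log(
        `${err}: tool A ${toolAFound.has(err) ? "found" : "missed"}, ` +
        `tool B ${toolBFound.has(err) ? "found" : "missed"}`
      );
    }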
Best of luck to you,
Karl

On Mon, Oct 4, 2010 at 4:14 AM, Salinee Kuakiatwong <salinee20@gmail.com> wrote:
> Dear All,
> I'm writing a research paper to investigate the inter-reliability of
> automated evaluation tools. I used two automated web evaluation tools to
> scan the same web pages. The findings indicate that there are large
> discrepancies in the results between the two tools although they're based on
> the same standard (WCAG 2.0).
> I'm new to the field. Any explanation for such a case?
> Thanks!

Received on Monday, 4 October 2010 14:55:06 UTC