- From: Nils Ulltveit-Moe <nils@u-moe.no>
- Date: Tue, 19 Apr 2005 11:16:08 +0200
- To: Paul Walsh <paul.walsh@segalamtest.com>
- Cc: 'Charles McCathieNevile' <charles@sidar.org>, 'Giorgio Brajnik' <giorgio@dimi.uniud.it>, public-wai-ert@w3.org
Hi Paul,

On Mon, 18.04.2005 at 23:30 +0100, Paul Walsh wrote:
> Ok I've sat back and read most people's thoughts on this subject and
> would now like to ask a question of those who believe we should
> include a confidence level. I personally still think this is a bad
> idea for all the same reasons I stated in my original email. I feel
> priority and/or severity levels are the most widely used and
> understood mandatory fields in a defect tracking tool and even then
> they are almost always misused at least once on any given project when
> working with external parties outside of your control - especially by
> 'developers' who think they have the aptitude of a test analyst, but
> do not. Introducing a confidence level will simply make defect report
> writing and evaluation more time consuming. You can argue until the
> pigs come home, but we will not use a confidence level in our
> reporting.

I agree that EARL should be able to convey priority levels. Priority levels would indicate how important different checkpoints are, and for accessibility tests that would somehow be related to how big an impact a failed test would have on a disabled user.

For priority levels we need to write an RDF/OWL schema that defines the scale used for the priority in an unambiguous way, i.e. a W3C convention for priority scales that identifies which end is the high and which is the low priority. In a textual representation this would be something like: the WCAG priority scale consists of 3 levels, wcag-priority1, wcag-priority2 and wcag-priority3, where wcag-priority1 is the highest priority. If needed, number values may also be tied to the priorities; alternatively, the priorities may be defined directly as numbers (i.e. 1, 2 and 3, where 1 is the highest priority).
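Just to illustrate what I mean (all the ex: class and property names below are invented for this example only, not part of any existing W3C schema), such a scale could be sketched in N3/Turtle along these lines:

  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix ex:   <http://example.org/priority-scale#> .   # illustrative namespace

  ex:WCAGPriorityScale a ex:PriorityScale ;
      rdfs:comment "WCAG priority scale: three levels, where 1 is the highest priority." .

  ex:wcag-priority1 a ex:PriorityLevel ;
      ex:onScale ex:WCAGPriorityScale ;
      ex:rank 1 ;                          # tied number value, 1 = highest
      rdfs:label "WCAG priority 1 (highest)" .

  ex:wcag-priority2 a ex:PriorityLevel ;
      ex:onScale ex:WCAGPriorityScale ;
      ex:rank 2 ;
      rdfs:label "WCAG priority 2" .

  ex:wcag-priority3 a ex:PriorityLevel ;
      ex:onScale ex:WCAGPriorityScale ;
      ex:rank 3 ;
      rdfs:label "WCAG priority 3 (lowest)" .

The point is only that the direction of the scale (which end is "high") is stated explicitly in the schema, so tools cannot misread it.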
One challenge with priority levels is that different groups of disabled users have different views on which tests are important. For a blind user, checkpoint 1.1 on alternative text is important. Checkpoint 1.1 is not important for a deaf user, but missing captioning of a video clip may be a barrier that the deaf user would consider important. This means we should co-operate with disabled users' organisations on defining a set of priorities for each group. Since we should not discriminate between disabled users, the priority of a checkpoint would somehow be related to the priority given by the group that prioritises that checkpoint highest; i.e. if checkpoint 1.1 is priority 1 for a blind user and priority 3 for a sighted user, then the checkpoint should be measured as priority 1, so that the blind user is not discriminated against in favour of a sighted one.

Confidence values, however, are another thing entirely, unrelated to priorities. A confidence value shows how confident an automatic assessment tool is that a checkpoint can be categorised as Pass, Fail, CannotTell, NotApplicable etc. This is especially useful for knowledge-based systems that have learnt to categorise accessibility issues by example. If the system comes across an issue it has repeatedly been taught is a Fail, it will be quite confident that the issue is real when that pattern occurs again. If a similar but slightly different pattern occurs, the system may still say it is a Fail, but with less confidence.

You have the same problem with manual assessments. An inexperienced accessibility tester will perform tests with less confidence than an experienced tester, especially in cases where the tester is in doubt, or in cases the tester has not come across before.

Confidence values should be defined as an optional parameter, since I appreciate that not all vendors may want to make use of them. However, we plan to use them for automatic assessments, and may also experiment with using them for manual assessments, and I think it is important that EARL, as a machine-readable format, is able to convey that information.
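As a sketch of the kind of statement I have in mind (again, ex:confidence and the other ex: terms here are invented for illustration; this is not the current EARL vocabulary), an automatic tool could then assert something like:

  @prefix ex: <http://example.org/earl-extension#> .   # illustrative namespace

  <#assertion1> a ex:Assertion ;
      ex:subject    <http://example.org/page.html> ;   # the page under test
      ex:testcase   <http://example.org/tests#cp1.1> ; # the checkpoint tested
      ex:result     ex:Fail ;
      ex:confidence 0.85 .   # 85% confident; if the property is omitted,
                             # the default would be 1.0 (full confidence)

A purely manual workflow like yours could simply leave ex:confidence out and be read as fully confident.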
> Q:
>
> For companies who use disabled users; how do you suggest they measure
> the confidence level of their output? (I'm not assuming you can cover
> every disability across every project or even one project - but it's
> compulsory in my opinion to 'try', using the usual quality triangle to
> ensure testing is cost/time effective).

As I said above, the confidence level is not a priority describing the importance of a checkpoint. It is an indication of how certain or uncertain the auditor is in his/her/its decision ("its" in the case of automatic assessment).

> Situation: Dyslexic user is provided with high-level test case
> scenarios, where auditors drill down further with detailed documented
> test scripts using both manual and automated methods. The dyslexic
> user has a problem with the complexity of the copy in two areas of the
> website. This type of defect is not picked up by the auditor or the
> tool, nor is it appreciated by the auditor. How do you measure the
> confidence level of those two defects?

This would mean that the tool and the auditor had chosen #Pass for a checkpoint that is indeed a problem. If the auditor or machine is going to learn from this fault, they need feedback from the user who experienced the problem: they would have to learn the case the dyslexic person describes. If an automatic system or a manual assessor got such feedback, they would learn from it, lower their confidence in the decision they took, and eventually switch over to #Fail if the issue happened several times (i.e. the assessor became confident that the issue was a real problem).

> We have some of the most highly skilled and experienced test analysts
> and developers who have worked for companies such as AOL since 1994
> and were responsible for the entire test management and execution and
> International beta coordination of all new client software and
> technology for the UK and Sweden whilst providing ongoing support to
> Germany and France - trust me when I say they are more experienced
> than most when it comes to 'testing' Internet technologies.

I appreciate that. With such a profile your testers would most probably be quite confident in their decisions, and if you are 100% confident that an accessibility issue is real, then the extra confidence value is not needed (i.e. the default value for confidence, if it is left out, is 1).

> We use both manual and automated testing methods where the former
> outweighs the latter by a long way. If someone is less than certain
> about the output of their test they will always seek a second opinion
> from their colleagues. This is why it's absolutely necessary to have a
> team of auditors on any project. Each person's interpretation of an
> outcome is debated until they come to an agreement. The combined
> interpretation may not be 100% accurate if compared to that of a
> disabled user (or even someone outside the company), but at least they
> are 100% confident in the recorded defect. Anything less than this is
> not good enough.

Yes, and this describes why you do not need the confidence parameter, since it defaults to a probability of 1 (or 100%).

We are doing quite different measurements. We will be trying to do automatic assessments of a large number of sites (several thousand) regularly. We will need to do some manual testing as well, but will base our tests largely on automatic assessments. In our case we need to base ourselves on probability theory and best practice in statistics to reach numbers that approximate the perceived accessibility over a large number of assessments, in order to make it feasible.
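To give an entirely illustrative example of the kind of number I mean (my notation, nothing normative): if assessment i gives outcome x_i (1 for pass, 0 for fail) with confidence c_i, a confidence-weighted pass rate over N assessments could be estimated as

  \hat{p} = \frac{\sum_{i=1}^{N} c_i x_i}{\sum_{i=1}^{N} c_i}

so that low-confidence results pull the estimate less than high-confidence ones. Without the confidence values in EARL, a consumer of thousands of reports has no way to weight them like this.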
Regards,
-- 
Nils Ulltveit-Moe <nils@u-moe.no>

Received on Tuesday, 19 April 2005 09:11:51 UTC