- From: Detlev Fischer <fischer@dias.de>
- Date: Wed, 24 Aug 2011 13:30:01 +0200
- To: public-wai-evaltf@w3.org
On 24.08.2011 11:54, Shadi Abou-Zahra wrote:
> Hi Detlev,
>
> On 24.8.2011 10:43, Detlev Fischer wrote:
>> On 24.08.2011 02:11, Vivienne CONWAY wrote:
>>> Thanks Detlev.
>>>
>>> As I will be looking at these sites regularly (primarily with automated
>>> tools due to the number of sites), what is puzzling me most is how to
>>> actually score them. While I like the pass/fail/near approach for the
>>> site owner's use, to compare them I need percentages. Such as:
>>> P: 65%
>>> O: 80%
>>> U: 25%
>>> R: 90%
>>> Overall: 65%
>>>
>> Confronted with a large number of sites, one solution is of course to use
>> automated tools. The problem, as we all know, is that many serious
>> problems are not caught by them, and in turn the percentage values
>> suggest a kind of accuracy that is not really backed by the full
>> evidence but simply based on just those issues amenable to automatic
>> testing.
>>
>> We know many people are quite happy when they get a nice score or chart;
>> they don't understand or even want to know how shaky these may be. I
>> just think that *if* you can influence *how* some aggregate score is
>> computed, the guiding principle would be for it to reflect the actual
>> difficulty people have in accessing a site.
>>
>> Maybe it's a corny example, but let's compare a site to a car. You can
>> make all sorts of checks; engine and brakes and mirrors and body and
>> interior are all fine, but if one critical thing (steering,
>> transmission) fails, you can't use the car. (Think of visual CAPTCHAs for
>> blind people, or keyboard traps for keyboard users.) The risk is that
>> *any* aggregation of scores, even weighted scores, will 'drown' critical
>> failures.
>>
>> So I believe that if there is no time for detailed testing, testing for
>> critical failures is still more relevant than creating an automated test
>> score. If you have both, even better.
>>
>> If it were indeed possible to decide on a limited number of
>> evidence-based critical failures (say, 10), i.e. frequently observed
>> aspects without which a site would be unusable or very hard to use for
>> some populations, you could probably also compile a numeric outcome,
>> summing up 1 or 0 per issue covered. In this case, however, anything less
>> than 10 out of 10 means it's time for service... It's rough and simple,
>> but that reflects the coarseness of the approach and therefore seems
>> adequate.
>
> Interesting thought.
>
> Besides the difficulty of defining "critical failures", could this in
> the long run lead to developers only aiming to fulfill these minimum
> issues and leave out other important ones?

Yes, I think that is a risk. How much will developers adapt to an ecosystem where only critical failures are noted? However, if a full-scale evaluation must be ruled out due to time or manpower constraints, a meaningful 'quick check' is still better than one that misses critical issues. (BTW, developers may also adapt to an automatic-testing ecosystem: make sure everything validates, alt text is present even if meaningless, headings are nested without gaps even if counter-intuitive, etc.)
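To make the "summing up 1 or 0 per issue" idea a bit more concrete, here is a rough sketch, purely for illustration and not a worked-out proposal: the checklist labels simply paraphrase the candidate failures quoted further down in this mail, the function name `critical_score` is made up, and the pass/fail results in the example are invented.

```python
# Illustrative sketch only: score a site against a fixed checklist of
# critical failures, one point per issue that passes. The checklist labels
# paraphrase the candidate list quoted below; example results are invented.

CRITICAL_CHECKS = [
    "Keyboard accessibility (SC 2.1.1, 2.1.2)",
    "Text alternatives for important images/controls (SC 1.1.1)",
    "CAPTCHA alternative (SC 1.1.1)",
    "Captions for video (SC 1.2.2, 1.2.4)",
    "Sufficient text contrast (SC 1.4.3)",
    "Visible keyboard focus (SC 2.4.7)",
    "Text replacement for background-image controls (SC 1.1.1)",
    "Labels on important fields (SC 2.4.6)",
    "Meaningful heading structure (SC 1.3.1)",
    "No unstoppable movement or animation (SC 2.2.1, 2.2.2)",
]

def critical_score(results):
    """results maps a check name to True (pass) or False (fail).
    Returns (passed, total); anything below total means 'time for service'."""
    passed = sum(1 for check in CRITICAL_CHECKS if results.get(check, False))
    return passed, len(CRITICAL_CHECKS)

# A site failing keyboard access and focus visibility scores 8 out of 10.
example = {check: True for check in CRITICAL_CHECKS}
example["Keyboard accessibility (SC 2.1.1, 2.1.2)"] = False
example["Visible keyboard focus (SC 2.4.7)"] = False
print(critical_score(example))  # (8, 10)
```

The point of reporting "8 out of 10" rather than an average is exactly that a single critical failure stays visible instead of being drowned in an aggregate.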
> Can the set of "critical failures" be defined to be WCAG 2.0 Level A?

Well, it would be nice to use Level A, and on the whole it may work. It would also be better because it does not create additional schemes beyond or above WCAG. However, it appears to me that some Level A criteria, such as 1.4.1 Use of Color, are in practice usually not *that* critical, while some Level AA criteria are absolutely critical, such as 2.4.7 Focus Visible for sighted keyboard users. (Of course, as in other cases, workarounds often exist: users may choose another UA or an add-on that highlights focus regardless of CSS rules; CAPTCHAs can be solved with WebVisum, etc.)

>> On another note, I also wonder what people would do with the break-down
>> of percentages in the POUR schema. On some level, the results should be
>> actionable, and to be so, it would be nice to be able to point to the
>> most glaring problems. Maybe it is more useful to know which section of
>> the population with disabilities is badly served. That would suggest a
>> scheme where you group things by
>>
>> M: Motor impairment
>> B: Blindness
>> V: Visual impairment
>> H: Hearing impairment
>> C: Cognitive impairment
>>
>> Success criteria may then be allocated to M, B, V, H, C, and you would
>> have many double allocations, for example between M and B.
>>
>> Maybe critical failures could be allocated to populations (incl. double
>> counts). I just ran through the improvised list of 10 critical failures
>> I made up earlier and added them up:
>>
>> M: 4
>> B: 6
>> V: 4
>> H: 2
>> C: 2
>>
>> Whether that kind of result would be more meaningful I am not really
>> sure. At least if one of the groups is really badly served, their
>> associations and interest groups would have better evidence when they
>> campaign for improvements.
>
> I always feel uncomfortable using "categories of people" as it risks
> missing individuals with less frequent profiles (type of disability), or
> forcing individuals into pro forma categories.

Yes, I see what you mean. Maybe there are other ways to organise by type of impairment that don't equate an impairment with a particular group of people, like a filter metaphor that would also do justice to multiple impairments.

> It also risks user
> representations campaigning against each other rather than together.

Is there evidence for this in the past? I doubt that a differentiated presentation would invite groups to campaign against each other. The criteria are often a common interest, sometimes complementary, and rarely the stuff of conflict.
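For what it's worth, such an allocation with double counts is easy to tally. The sketch below is again purely illustrative: the names `POPULATIONS` and `failed_checks` are made up for the example, and the mapping of failures to groups is invented, not the allocation behind the numbers I gave above.

```python
# Illustrative sketch only: tally critical failures per affected population,
# with double counting where one issue affects several groups. The mapping
# below is invented for illustration, not the allocation used above.
from collections import Counter

POPULATIONS = {
    "M": "Motor impairment",
    "B": "Blindness",
    "V": "Visual impairment",
    "H": "Hearing impairment",
    "C": "Cognitive impairment",
}

# Hypothetical allocation: each failed check lists the groups it affects.
failed_checks = {
    "Keyboard accessibility (SC 2.1.1, 2.1.2)": ["M", "B"],
    "Visible keyboard focus (SC 2.4.7)": ["M", "V"],
    "Captions for video (SC 1.2.2, 1.2.4)": ["H"],
    "Sufficient text contrast (SC 1.4.3)": ["V"],
}

tally = Counter(group for groups in failed_checks.values() for group in groups)
for code, label in POPULATIONS.items():
    print(f"{code} ({label}): {tally.get(code, 0)}")
```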
> Best,
> Shadi

>> Regards, Detlev
>>
>>> Problem is, how to work out that percentage. I could use number of
>>> violations / number of pages checked. However, this does not weight the
>>> more critical errors - like the ones you cited. I could work out
>>> some kind of algorithm where violations of the critical issues were
>>> weighted, say, 1.5:1, and items such as non-critical validation errors
>>> 0.5:1, or something similar. Thoughts?
>>>
>>> Regards
>>>
>>> Vivienne L. Conway
>>> ________________________________________
>>> From: public-wai-evaltf-request@w3.org
>>> [public-wai-evaltf-request@w3.org] On Behalf Of fischer@dias.de
>>> [fischer@dias.de]
>>> Sent: Tuesday, 23 August 2011 11:04 PM
>>> To: public-wai-evaltf@w3.org
>>> Subject: RE: some initial questions from the previous thread
>>>
>>> Quoting Vivienne CONWAY <v.conway@ecu.edu.au>:
>>>
>>>> Hi all
>>>> Just thought I'd weigh in on this one as I'm currently puzzling over
>>>> the issue of how to score websites. I'm just about to start a
>>>> research project where I'll have over 100 websites assessed monthly
>>>> over a period of 2+ years.
>>>
>>> If you will be doing this on your own or without a team, this work
>>> programme translates to checking more than 4-5 sites per day! And if
>>> the compliance level is AA, you probably need to focus on some key
>>> requirements, especially those where a failure would make a site
>>> completely inaccessible to some population. Just looking at WCAG
>>> success criteria, these may be the ones which most often exclude
>>> people, ordered by importance from testing experience (feel free to
>>> disagree):
>>>
>>> * Lack of keyboard accessibility (SC 2.1.1, 2.1.2)
>>> * Important images like controls without alt text (SC 1.1.1)
>>> * CAPTCHAs w/o alternative (SC 1.1.1)
>>> * Lack of captions in videos (SC 1.2.2, 1.2.4)
>>> * Really low contrast of text (SC 1.4.3)
>>> * Bad or no visibility of focus (SC 2.4.7)
>>> * Important controls implemented as background images without text
>>> replacement (SC 1.1.1)
>>> * Important fields (such as search text input) w/o labels (SC 2.4.6)
>>> * Lack of structure (e.g. no or inconsistent headings) (SC 1.3.1)
>>> * Self-starting / unstoppable animation, carousels, etc. (SC 2.2.1,
>>> 2.2.2)
>>>
>>> Well, having written this, it may seem a bit arbitrary - but I believe
>>> the list has many or most of the grave errors that we encounter in
>>> testing.
>>>
>>> If there were a statistic on "show stoppers", things that make sites
>>> inaccessible or impede access severely, such an approach would have a
>>> better basis, of course...
>>>
>>> Just my 2 cents,
>>> Detlev
>>>
>>> ) that can be tested relatively quickly and without going into too
>>> much detail.
>>>
>>> I think as long as the method is transparent / documented and its
>>> limitations are clearly stated, the results can still be valuable.
>>>
>>>> I need to come up with a scoring method
>>>> (preferably a percentage) due to the need to compare a website
>>>> within those of its own classification (e.g. federal government,
>>>> corporate, etc.), and compare the different classifications. I am
>>>> thinking of a method where the website gets a percentage score for
>>>> each of the POUR principles, and then an overall score. What I'm
>>>> struggling with is what scoring method to use and how to put
>>>> different weights upon different aspects and at different levels.
>>>> I'll be assessing to WCAG 2.0 AA (as that's the Australian
>>>> standard). All input and suggestions are gratefully accepted and
>>>> may also be useful to our discussions here as it's a real-life
>>>> situation for me. It also relates to many of the questions raised in
>>>> this thread by Shadi. Looking forward to some interesting discussion.
>>>>
>>>> Regards
>>>>
>>>> Vivienne L. Conway
>>>> ________________________________________
>>>> From: public-wai-evaltf-request@w3.org
>>>> [public-wai-evaltf-request@w3.org] On Behalf Of Shadi Abou-Zahra
>>>> [shadi@w3.org]
>>>> Sent: Monday, 22 August 2011 7:34 PM
>>>> To: Eval TF
>>>> Subject: some initial questions from the previous thread
>>>>
>>>> Dear Eval TF,
>>>>
>>>> From the recent thread on the construction of WCAG 2.0 Techniques, here
>>>> are some questions to think about:
>>>>
>>>> * Is the "evaluation methodology" expected to be carried out by one
>>>> person or by a group of more than one person?
>>>>
>>>> * What is the expected level of expertise (in accessibility, in web
>>>> technologies, etc.) of persons carrying out an evaluation?
>>>>
>>>> * Is the involvement of people with disabilities a necessary part of
>>>> carrying out an evaluation versus an improvement of the quality?
>>>> * Are the individual test results binary (i.e. pass/fail) or a score
>>>> (discrete value, ratio, etc.)?
>>>>
>>>> * How are these test results aggregated into an overall score (plain
>>>> count, weighted count, heuristics, etc.)?
>>>>
>>>> * Is it useful to have a "confidence score" for the tests (for example
>>>> depending on the degree of subjectivity or "difficulty")?
>>>>
>>>> * Is it useful to have a "confidence score" for the aggregated result
>>>> (depending on how the evaluation is carried out)?
>>>>
>>>> Feel free to chime in if you have particular thoughts on any of these.
>>>>
>>>> Best,
>>>> Shadi
>>>>
>>>> --
>>>> Shadi Abou-Zahra - http://www.w3.org/People/shadi/
>>>> Activity Lead, W3C/WAI International Program Office
>>>> Evaluation and Repair Tools Working Group (ERT WG)
>>>> Research and Development Working Group (RDWG)

--
---------------------------------------------------------------
Detlev Fischer PhD
DIAS GmbH - Daten, Informationssysteme und Analysen im Sozialen
Management: Thomas Lilienthal, Michael Zapp

Phone: +49-40-43 18 75-25
Mobile: +49-157 7-170 73 84
Fax: +49-40-43 18 75-19
E-Mail: fischer@dias.de
Address: Schulterblatt 36, D-20357 Hamburg

Hamburg District Court HRB 58 167
Managing Directors: Thomas Lilienthal, Michael Zapp
---------------------------------------------------------------
Received on Wednesday, 24 August 2011 11:30:25 UTC