- From: John Foliot <john.foliot@deque.com>
- Date: Tue, 14 Jul 2020 11:04:31 -0500
- To: Rachael Bradley Montgomery <rachael@accessiblecommunity.org>
- Cc: Detlev Fischer <detlev.fischer@testkreis.de>, Silver TF <public-silver@w3.org>
- Message-ID: <CAKdCpxzxSFcftNVWFDZ3eVOdbN1oX=+DfSs=HtUKnPMUqA14ww@mail.gmail.com>
Rachael also writes:

> These would then be skipped for purposes of scoring.

Skipped, or given full credit? In your use-case, *IF* a page has accessible media, it could garner *more* points than if the page never had a video, and I don't see how that benefits anyone - it is a potential (if complicated) way of gaming the system: add an accessible media component to the screen to regain points lost because my screen content doesn't reflow...

I know we've discussed 'additive' scores previously, but have we fully evaluated 'subtractive' scoring as well? It would certainly address this use case (i.e. a screen with a fully accessible media experience has the same score as a screen with no media, but "loses" points if the media experience is less than 100%).

Thoughts?

JF
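A minimal sketch of that contrast, under assumed inputs: the test ordering, the view contents, and the fixed penalty are all hypothetical, not from the draft. Under the percentage approach, a passing captioned video raises a view's score despite an unrelated reflow failure; under a fixed-penalty subtractive approach, fully accessible media neither adds nor restores points.

    # Hypothetical per-view results: True = pass, False = fail, None = not applicable.

    def percentage_score(results):
        """Percent of applicable tests passed. Adding a passing test
        (e.g. captioned media) raises the ratio and can mask an
        unrelated failure such as reflow."""
        applicable = [r for r in results if r is not None]
        if not applicable:
            return 100.0
        return 100.0 * sum(applicable) / len(applicable)

    def subtractive_score(results, penalty=10):
        """Start at 100 and deduct a fixed (made-up) penalty per failing
        applicable test. Fully accessible media neither adds nor restores
        points, so it ties with having no media at all."""
        applicable = [r for r in results if r is not None]
        fails = sum(1 for r in applicable if not r)
        return max(0, 100 - penalty * fails)

    view_no_media   = [False, True, None]  # reflow fails, labels pass, captions N/A
    view_with_media = [False, True, True]  # same reflow failure, plus a captioned video

    print(percentage_score(view_no_media), percentage_score(view_with_media))    # 50.0 vs ~66.7
    print(subtractive_score(view_no_media), subtractive_score(view_with_media))  # 90 vs 90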
On Tue, Jul 14, 2020 at 10:54 AM John Foliot <john.foliot@deque.com> wrote:

> Rachael writes:
>
> > Is the unit a word, sentence, div, paragraph, portion of the screen, etc?
>
> Exactly! This is why scoping cannot be left to the author; it needs to be
> defined in our spec.
>
> I will assert that all tests need to be run on "the screen" (aka "a view")
> to address what Rachael called 'non-interference', and/but that higher-level
> tests (*Accessibility in Practice* -- I am not a fan of *adjectival* as a
> term-of-art, in part because it does not make sense when you look at the
> definition of that term in Merriam-Webster
> <https://www.merriam-webster.com/dictionary/adjectival>) will be run on a
> collection of screens/views that comprise a task or path.
>
> We need to define and evaluate both.
>
> JF
>
> On Tue, Jul 14, 2020 at 10:33 AM Rachael Bradley Montgomery <
> rachael@accessiblecommunity.org> wrote:
>
>> Hello,
>>
>> I've responded with my thoughts in line. If the answers are unclear,
>> please let me know. Some of these do clarify the limits of this approach,
>> so thank you for calling these out.
>>
>> I appreciate the ongoing dialog, as we have limited time in meetings. :)
>>
>> Rachael
>>
>> On Tue, Jul 14, 2020 at 11:03 AM Detlev Fischer <
>> detlev.fischer@testkreis.de> wrote:
>>
>>> Hi all,
>>> as there was not enough time to discuss the scoring process, I will
>>> raise some questions here which I hope will clarify what is intended
>>> in this draft version.
>>>
>>> Slide 9 of the presentation linked in the minutes:
>>> https://www.w3.org/2020/01/28-silver-minutes.html
>>>
>>> 1. Identify the components and associated views needed for users to
>>> complete the path
>>>
>>> DF: If I understand this correctly, this means that if I have a path
>>> that traverses 7 views (say, from 1-shopping cart to 2-specify billing
>>> address to 3-specify shipping address to 4-specify payment method to
>>> 5-enter CC details to 6-review purchase details and confirm to
>>> 7-confirmation of purchase), all these views that are part of the path
>>> are now lumped together and there is no fine-grained score on a
>>> particular view within the path?
>>
>> RB: Each view is scored individually, but all the scores are grouped
>> together for purposes of the conformance scores.
>>
>>> 2. Run all level 1 tests for all views within a path
>>>
>>> DF: This would mean a PASS/FAIL rating on each view of the path against
>>> each 2.X SC. What is unclear is how the percentage comes in for
>>> less-than-perfect views - say, when rating against 1.3.1, your payment
>>> details form has one field where the label is not correctly referenced
>>> (but some placeholder is there to make this less of a show-stopper) and
>>> the others are fine. Is that a subjective judgement? A quantitative
>>> judgement? How do you determine whether 1.3.1 (or whatever that becomes)
>>> is 90% met, 60% met (or any other figure)?
>>
>> RB: A clarification: I think we need to see how a page would look in
>> the current model and the new model. I used SC as example "tests" in the
>> template to let us cross-reference the two models conceptually, but they
>> are an imperfect representation because they should be tests and, right
>> now, most SC include multiple tests. In the template, I included the
>> current Pass, Fail, and Not Present so we could look at both approaches.
>>
>> I originally started this approach with each test being a pass/fail.
>> Having tried both testing approaches, testing pass/fail is much, much
>> easier.
>>
>> This does not allow for the % concept, though, unless we roll the
>> pass/fails into %. So I tried this using an approach where each test
>> would be scored individually by %. The percent is the number of passes
>> divided by the number of instances in the view. This is pretty easy with
>> links but hard to determine with tests like reflow or other content-based
>> tests. Is the unit a word, sentence, div, paragraph, portion of the
>> screen, etc?
>>
>>> 3. Note all failures on components needed to complete the path
>>>
>>> DF: Whether something counts as a failure is often not easy to
>>> determine. Note that 1.3.1, despite its huge scope, has only two
>>> Failures. So there is significant subjectivity in determining whether,
>>> say, a missing programmatic link of a label, while a placeholder
>>> provides a less-than-perfect hint at the content required for the
>>> field, should be registered as a FAIL of 1.3.1 (or whatever) - and that
>>> situation is pervasive in actual testing.
>>
>> RB: In my opinion, for this to work, the tests need to be as granular as
>> possible, preferably with clearly stated passing and failing criteria.
>> There should also be a clear relationship to only one functional outcome.
>> This will result in a large number of tests. Each test would need to
>> reference which technologies it applies to. Tests that do not apply are
>> not counted as part of the average.
>>
>>> 4. Note the % tests passed for each view (total passed/total in view)
>>>
>>> DF: So here we have some granularity for parts of the path? And an
>>> aggregate value? One issue tackled in complete processes is that
>>> aggregation can be misleading: if one part of a path fails completely,
>>> the rest can be accessible but user time is wasted just as much as (or
>>> worse than) if the entire thing were inaccessible.
>>
>> RB: The approach of addressing both the component-level path and the
>> views was trying to address this. Perhaps it doesn't?
>>
>>> 5. Note tests that are not applicable
>>>
>>> DF: I don't understand that.
>>
>> RB: Some tests won't apply, such as captions when no media is present.
>> Testing would note that these are not applicable within this path. These
>> would then be skipped for purposes of scoring.
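A small sketch of the per-test percentage Rachael describes (passes divided by instances in the view), with not-applicable tests skipped rather than scored. The test names and instance counts are hypothetical, and the unit-of-counting problem she raises for reflow-style tests is simply assumed away here.

    # Hypothetical instance counts; None marks a test as not applicable in this view.
    view_results = {
        "link-purpose":    {"passed": 18, "total": 20},
        "label-reference": {"passed": 9,  "total": 10},
        "captions":        None,  # no media present: skipped, not failed
    }

    def test_percentages(results):
        """Score each test individually: passes / instances in the view.
        Not-applicable tests are excluded rather than counted as 0 or 100."""
        return {name: 100.0 * r["passed"] / r["total"]
                for name, r in results.items() if r is not None}

    print(test_percentages(view_results))  # {'link-purpose': 90.0, 'label-reference': 90.0}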
>>> 6. Average all the tests for a guideline for an overall %
>>>
>>> DF: I take it that this is the average across all component views of a
>>> path? See caveat above...
>>
>> RB: Yes.
>>
>>> 7. Score each guideline based on % of tests passed
>>> 100% - 3
>>> 75-99% - 2
>>> 50-74% - 1
>>> 0-50% - 0
>>>
>>> 8. Average the score of all guidelines to a single decimal point
>>> If average score = 3, run level 2a and/or 2b tests
>>>
>>> DF: So you would only proceed with running the 'softer' tests if the
>>> 'harder' level 1 tests are perfect (100%)? I don't think this is
>>> intended...
>>
>> RB: For day-to-day testing, all three types of tests should be addressed,
>> but for a conformance claim, that is what is intended. Since there is
>> some rounding, there is some room for imperfections, but not a lot. I
>> will add a note that this needs to be explored more. There are other
>> ways to balance this, but the risk of running the higher-level tests is
>> that it would add bias towards one disability over another. For example,
>> if usability tests within a guideline that supports cognitive
>> disabilities bump that guideline up to a 4 or 5 but a guideline that
>> supports visual disabilities is still at a 1, the overall score would
>> look more accessible while the content remains inaccessible for screen
>> reader users.
>>
>>> If 90% or greater of level 2a or 2b tests pass, increase the guideline
>>> score to a 4
>>> If 90% or greater of both 2a and 2b tests pass, increase the guideline
>>> score to a 5
>>>
>>> DF: Depending on the answer above (does this only happen when 100% = 3,
>>> which will be a rare outcome), the question is whether any of the
>>> failures will prevent further tests on level 2a/2b?
>>>
>>> Calculate overall and functional category scores
>>>
>>> DF: Not clear to me at the moment...
>>
>> RB: Overall = average of all guideline scores.
>> Each functional category = average of related guideline scores.
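Pulling steps 6-9 together, a sketch under two stated assumptions: the overlapping 50% boundary in the table above is read as >= 50 scoring a 1, and the level 2a/2b bump is applied per guideline only when that guideline's level 1 tests were perfect (the reading DF is questioning). Guideline names, percentages, and category groupings are hypothetical.

    from statistics import mean

    def guideline_score(pct):
        """Step 7: map a guideline's averaged test percentage to 0-3.
        The draft's '0-50%' and '50-74%' bands overlap at 50; >= 50 scores 1 here."""
        if pct >= 100: return 3
        if pct >= 75:  return 2
        if pct >= 50:  return 1
        return 0

    def bump(score, pct_2a, pct_2b):
        """Steps 8-9, read per guideline: only a perfect level 1 result (3)
        unlocks the higher-level tests; >= 90% on one of 2a/2b raises the
        score to 4, >= 90% on both raises it to 5."""
        if score < 3:
            return score
        if pct_2a >= 90 and pct_2b >= 90:
            return 5
        if pct_2a >= 90 or pct_2b >= 90:
            return 4
        return score

    # Hypothetical guideline percentages and functional-category groupings.
    guideline_pcts = {"G1": 100.0, "G2": 82.5, "G3": 60.0}
    scores = {g: guideline_score(p) for g, p in guideline_pcts.items()}
    scores["G1"] = bump(scores["G1"], pct_2a=95.0, pct_2b=70.0)  # G1: 3 -> 4

    overall = round(mean(scores.values()), 1)  # average to a single decimal point
    categories = {"vision": ["G1", "G3"], "cognitive": ["G2"]}
    category_scores = {c: round(mean(scores[g] for g in gs), 1)
                       for c, gs in categories.items()}
    print(scores, overall, category_scores)
    # {'G1': 4, 'G2': 2, 'G3': 1} 2.3 {'vision': 2.5, 'cognitive': 2}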
>>> --
>>> Detlev Fischer
>>> DIAS GmbH
>>> (Testkreis is now part of DIAS GmbH)
>>>
>>> Mobile +49 (0)157 57 57 57 45
>>>
>>> http://www.dias.de
>>> Consulting, testing and training for accessible websites
>>
>> --
>> Rachael Montgomery, PhD
>> Director, Accessible Community
>> rachael@accessiblecommunity.org
>>
>> "I will paint this day with laughter;
>> I will frame this night in song."
>> - Og Mandino
>
> --
> *John Foliot* | Principal Accessibility Strategist | W3C AC Representative
> Deque Systems - Accessibility for Good
> deque.com
> "I made this so long because I did not have time to make it shorter." - Pascal
> "links go places, buttons do things"

--
*John Foliot* | Principal Accessibility Strategist | W3C AC Representative
Deque Systems - Accessibility for Good
deque.com
"I made this so long because I did not have time to make it shorter." - Pascal
"links go places, buttons do things"

Received on Tuesday, 14 July 2020 16:05:22 UTC