Re: Questions about the Silver scoring process

Rachael writes:

> Is the unit a word, sentence, div, paragraph, portion of the screen, etc?

Exactly! This is why scoping cannot be left to the author; it needs to be
defined in our spec.

I will assert that all tests need to be run on "the screen" (aka "a view")
to address what Rachael called 'non-interference', and/but that higher-level
tests (*Accessibility in Practice* -- I am not a fan of *adjectival* as a
term-of-art, in part because it does not make sense when you look at
the definition of that term in Merriam-Webster
<https://www.merriam-webster.com/dictionary/adjectival>) will be run against a
collection of screens/views that comprise a task or path.

We need to define and evaluate both.

JF

On Tue, Jul 14, 2020 at 10:33 AM Rachael Bradley Montgomery <
rachael@accessiblecommunity.org> wrote:

> Hello,
>
> I've responded with my thoughts inline. If the answers are unclear,
> please let me know. Some of these do clarify the limits of this approach, so
> thank you for calling these out.
>
> I appreciate the ongoing dialog as we have limited time in meetings. :)
>
> Rachael
>
> On Tue, Jul 14, 2020 at 11:03 AM Detlev Fischer <
> detlev.fischer@testkreis.de> wrote:
>
>> Hi all,
>> as there was not enough time to discuss the scoring process, I will
>> raise some questions here which I hope will clarify what is intended in
>> this draft version.
>>
>> Slide 9 of the presentation linked to in the minutes:
>> https://www.w3.org/2020/01/28-silver-minutes.html
>>
>> 1. Identify the components and associated views needed for users to
>> complete the path
>>
>> DF: If I understand this correctly, this means that if I have a path
>> that traverses 7 views (say, from 1-shopping cart to 2-specify billing
>> address to 3-specify shipping address to 4-specify payment method to
>> 5-enter CC details to 6-review purchase details and confirm - to
>> 7-confirmation of purchase) - all these views that are part of the path
>> are now lumped together and there is no fine-grained score on a
>> particular view within the path?
>>
> RB: Each view is scored individually, but all the scores are grouped
> together for purposes of the conformance score.
>
>
>> 2. Run all level 1 tests for all views within a path
>>
>> DF: This would mean a PASS/FAIL rating of each view of the path against
>> each 2.X SC - what is unclear is how the percentage comes in for
>> less-than-perfect views - say, when rating against 1.3.1, your payment
>> details form has one field where the label is not correctly referenced
>> (but some placeholder is there to make this less of a show stopper) and the
>> others are fine - is that a subjective judgement? A quantitative
>> judgement? How do you determine whether 1.3.1 (or whatever that becomes)
>> is 90% met, 60% met (or any other figure)?
>>
> RB: A clarification: I think we need to see how a page would look in the
> current model and the new model. I used SCs as example "tests" in the
> template to let us cross-reference the two models conceptually, but they are
> an imperfect representation because they should be tests and, right now,
> most SCs include multiple tests. In the template, I included the current
> Pass, Fail, and Not Present so we could look at both approaches.
>
> I originally started this approach with each test being a pass/fail.
> Having tried both testing approaches, testing pass/fail is much, much
> easier.
>
> This does not allow for the % concept, though, unless we roll the pass/fails
> into a %. So I tried this using an approach where each test would be scored
> individually by %. The percent is the number of passes divided by the number
> of instances in the view. This is pretty easy with links but hard to determine
> with tests like reflow or other content-based tests. Is the unit a word,
> sentence, div, paragraph, portion of the screen, etc.?
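>
> A minimal sketch in Python of that per-test percent (illustrative only;
> the names are made up, and the open question of what counts as an
> "instance" is exactly the input this leaves to the tester):
>
>     from typing import Optional
>
>     def test_percent(passed: int, instances: int) -> Optional[float]:
>         """% of passing instances for one test within one view (0-100)."""
>         if instances == 0:
>             return None  # nothing in the view this test applies to
>         return 100.0 * passed / instances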
>
>
>> 3. Note all failures on components needed to complete the path
>>
>> DF: Whether something counts as a failure is often not easy to
>> determine. Note that 1.3.1, despite its huge scope, has only two
>> documented Failures. So there is significant subjectivity in determining
>> whether, say, a missing programmatic association of a label (while a
>> placeholder provides a less-than-perfect hint at the content required
>> for the field) should be registered as a FAIL of 1.3.1 (or whatever) -
>> and that situation is pervasive in actual testing.
>>
>
> RB: In my opinion, for this to work, the tests need to be as granular as
> possible, preferably with clearly stated passing and failing criteria.
> There should also be a clear relationship to only one functional outcome.
> This will result in a large number of tests. Each test would need to
> reference which technologies it applies to. Tests that do not apply are
> not counted as part of the average.
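>
> A small sketch of how skipping non-applicable tests could work when
> averaging a view (hypothetical names; None marks a test that does not
> apply, e.g. captions when no media is present):
>
>     from typing import List, Optional
>
>     def view_average(test_percents: List[Optional[float]]) -> Optional[float]:
>         """Average the per-test percents for one view, ignoring N/A tests."""
>         applicable = [p for p in test_percents if p is not None]
>         if not applicable:
>             return None  # nothing applicable in this view
>         return sum(applicable) / len(applicable)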
>
>
>>
>> 4. Note the % tests passed for each view (total passed/total in view)
>>
>> DF: So here we have some granularity for parts of the path? And an
>> aggregate value? One issue tackled in complete processes is that
>> aggregation can be misleading: if one part of a path fails completely,
>> the rest can be accessible, but user time is wasted just as much as (or
>> worse than) if the entire thing were inaccessible.
>>
>
> RB: The approach of addressing both the component-level path and the views
> was trying to address this. Perhaps it doesn't?
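>
> A purely illustrative calculation of that caveat (made-up numbers):
>
>     # Six fully passing views plus one completely failed view on a 7-view path:
>     view_scores = [100.0, 100.0, 100.0, 100.0, 100.0, 0.0, 100.0]
>     path_average = sum(view_scores) / len(view_scores)  # ~85.7
>     # The average still looks "mostly accessible", yet the failed view
>     # blocks the user from completing the purchase at all.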
>
>>
>> 5. Note tests that are not applicable
>>
>> DF: I don't understand that.
>>
>
> RB: Some tests won't apply, such as captions when no media is present.
> Testing would note that these are not applicable within this path. These
> would then be skipped for purposes of scoring.
>
>
>>
>> 6. Average all the tests for a guideline for an overall %
>>
>> DF: I take it that this is the average across all component views of a
>> path? See caveat above...
>>
> RB: Yes.
>
>
>> 7. Score each guideline based on % of tests passed
>> 100% - 3
>> 75-99% - 2
>> 50-74% - 1
>> 0-50% - 0
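>>
>> (A minimal sketch of that mapping, assuming percentages on a 0-100 scale;
>> note the bands "0-50%" and "50-74%" overlap at exactly 50%, so this sketch
>> reads 50% as a 1:)
>>
>>     def guideline_score(percent_passed: float) -> int:
>>         """Map a guideline's % of tests passed to the 0-3 score above."""
>>         if percent_passed >= 100:
>>             return 3
>>         if percent_passed >= 75:
>>             return 2
>>         if percent_passed >= 50:
>>             return 1
>>         return 0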
>>
>> 8. Average the score of all guidelines to a single decimal point
>> If average score = 3, run level 2a and/or 2b tests
>>
>> DF: So you would only proceed with running the 'softer' tests if the
>> 'harder' level 1 tests are perfect (100%)? I don't think this is
>> intended...
>>
> RB: For day-to-day testing, all three types of tests should be addressed,
> but for a conformance claim, that is what is intended. Since there is some
> rounding, there is some room for imperfections, but not a lot. I will add a
> note that this needs to be explored more. There are other ways to balance
> this, but the risk of running the higher-level tests is that it would add
> bias towards one disability over another. For example, if usability tests
> within a guideline that supports cognitive disabilities brought that
> guideline up to a 4 or 5, but a guideline that supports visual disabilities
> was still at a 1, the overall score would look more accessible while the
> content remained inaccessible for screen reader users.
>
>
>> If 90% or greater of level 2a or 2b tests pass, increase the guideline
>> score to a 4
>> If 90% or greater of both 2a and 2b tests pass, increase the guideline
>> score to a 5
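>>
>> (A sketch of how that increase might be computed, assuming pct_2a and
>> pct_2b are the percentages of level 2a and 2b tests passed -- hypothetical
>> names, not from the draft:)
>>
>>     def bump_with_level_2(score: int, pct_2a: float, pct_2b: float) -> int:
>>         """Raise a guideline score using the level 2a/2b results."""
>>         if pct_2a >= 90 and pct_2b >= 90:
>>             return 5
>>         if pct_2a >= 90 or pct_2b >= 90:
>>             return 4
>>         return score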
>>
>> DF: Depending on the answer above (does this only happen when the
>> average is 100%, i.e. a 3, which will be a rare outcome), the question is
>> whether any of the failures will prevent further tests on level 2a / 2b?
>>
>> Calculate overall and functional category scores
>>
>> DF: Not clear to me at the moment..
>>
>> Overall = average of all guideline scores
>> Each functional category = average of related guideline scores
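>>
>> (A short sketch of those two averages, assuming guideline scores keyed by
>> guideline name -- illustrative names only:)
>>
>>     from typing import Dict, List
>>
>>     def overall_score(guideline_scores: Dict[str, float]) -> float:
>>         """Overall = average of all guideline scores, to one decimal."""
>>         return round(sum(guideline_scores.values()) / len(guideline_scores), 1)
>>
>>     def category_score(guideline_scores: Dict[str, float],
>>                        related: List[str]) -> float:
>>         """Functional category = average of its related guideline scores."""
>>         picked = [guideline_scores[g] for g in related]
>>         return round(sum(picked) / len(picked), 1)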
>>
>
>
>> --
>> Detlev Fischer
>> DIAS GmbH
>> (Testkreis is now part of DIAS GmbH)
>>
>> Mobil +49 (0)157 57 57 57 45
>>
>> http://www.dias.de
>> Beratung, Tests und Schulungen für barrierefreie Websites
>> (consulting, testing and training for accessible websites)
>>
>>
>>
>
> --
> Rachael Montgomery, PhD
> Director, Accessible Community
> rachael@accessiblecommunity.org
>
> "I will paint this day with laughter;
> I will frame this night in song."
>  - Og Mandino
>
>

-- 
*John Foliot* | Principal Accessibility Strategist | W3C AC Representative
Deque Systems - Accessibility for Good
deque.com
"I made this so long because I did not have time to make it shorter." -
Pascal "links go places, buttons do things"

Received on Tuesday, 14 July 2020 15:55:40 UTC