
Questions about the Silver scoring process

From: Detlev Fischer <detlev.fischer@testkreis.de>
Date: Tue, 14 Jul 2020 17:02:16 +0200
To: public-silver@w3.org
Message-ID: <4bfaae1a-8336-0fe7-4920-3547e3b42d32@testkreis.de>
Hi all,
as there was not enough time to discuss the scoring process, I will 
raise some questions here which I hope will clarify what is intended in 
this draft version.

Slide 9 of presentation linked to in Minutes
https://www.w3.org/2020/01/28-silver-minutes.html

1. Identify the components and associated views needed for users to 
complete the path

DF: If I understand this correctly, this means that if I have a path 
that traverses 7 views (say, from 1-shopping cart to 2-specify billing 
address to 3-specify shipping address to 4-specify payment method to 
5-enter CC details to 6-review purchase details and confirm - to 
7-confirmation of purchase) - all these views that are part of the path 
are now lumped together, and there is no fine-grained score for a 
particular view within the path?

2. Run all level 1 tests for all views within a path

DF: This would mean a PASS/FAIL rating of each view of the path against 
each 2.X SC. What is unclear is how the percentage comes in for less 
than perfect views - say, when rating against 1.3.1, your payment 
details form has one field where the label is not correctly referenced 
(but some placeholder is there to make this less of a show stopper), and 
the others are fine. Is that a subjective judgement? A quantitative 
judgement? How do you determine whether 1.3.1 (or whatever that becomes) 
is 90% met, 60% met (or any other figure)?

3. Note all failures on components needed to complete the path

DF: Whether something counts as a failure is often not easy to 
determine. Note that 1.3.1, despite its huge scope, has only two 
documented Failures. So there is significant subjectivity in determining 
whether, say, a missing programmatic link between a label and its field, 
while a placeholder provides a less-than-perfect hint at the content 
required for the field, should be registered as a FAIL of 1.3.1 (or 
whatever that becomes) - and that situation is pervasive in actual testing.

4. Note the % tests passed for each view (total passed/total in view)

DF: So here we have some granularity for parts of the path? And an 
aggregate value? One issue tackled in complete processes is that 
aggregation can be misleading: if one part of a path fails completely, 
the rest can be accessible, but user time is wasted just as much as (or 
more than) if the entire thing were inaccessible.
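To make the caveat concrete, here is a minimal numeric sketch (the pass rates are hypothetical, and the mapping to a score of 2 assumes the slide's 75-99% band):

```python
# Hypothetical per-view pass rates for one guideline along a 4-view path.
# The last view (e.g. the purchase confirmation) fails completely,
# so the path as a whole cannot be completed.
view_pass_rates = [1.0, 1.0, 1.0, 0.0]

# A plain average hides the blocking failure:
average = sum(view_pass_rates) / len(view_pass_rates)
print(average)  # 0.75 -> 75%, which would still map to a score of 2
```

The averaged 75% looks respectable even though the path is unusable - which is exactly the aggregation problem raised above.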

5. Note tests that are not applicable

DF: I don't understand that.

6. Average all the tests for a guideline for an overall %

DF: I take it that this is the average across all component views of a 
path? See caveat above...

7. Score each guideline based on % of tests passed
100% - 3
75-99% - 2
50-74% - 1
0-50% - 0
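Reading the table above literally, the mapping could be sketched as follows (my own function name; note that the slide's bands "50-74%" and "0-50%" overlap at 50%, so this sketch assumes 50% falls into the band scoring 1):

```python
def guideline_score(pct_passed: float) -> int:
    """Map a percentage of passed tests to a 0-3 guideline score,
    per the table on slide 9. The bands '50-74%' and '0-50%' overlap
    at exactly 50%; this assumes 50% scores 1 (an assumption)."""
    if pct_passed == 100:
        return 3
    if pct_passed >= 75:
        return 2
    if pct_passed >= 50:
        return 1
    return 0

print(guideline_score(100))  # 3
print(guideline_score(82))   # 2
print(guideline_score(50))   # 1
```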

8. Average the score of all guidelines to a single decimal point
If average score = 3, run level 2a and/or 2b tests

DF: So you would only proceed with running the 'softer' tests if the 
'harder' level 1 tests are perfect (100%)? I don't think this is intended...

If 90% or greater of level 2a or 2b tests pass, increase the guideline 
score to a 4
If 90% or greater of both 2a and 2b tests pass, increase the guideline 
score to a 5

DF: Depending on the answer above (does this only happen when the 
average score is exactly 3, i.e. 100%, which will be a rare outcome?), 
the question is whether any of the failures will prevent further tests 
at level 2a / 2b?

Calculate overall and functional category scores

DF: Not clear to me at the moment...

Overall = average of all guideline scores
Each functional category = average of related guideline scores
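Taking the slide's steps 8 and onward at face value, the aggregation could be sketched like this (function names and the exact-3 trigger condition are my own reading of the slide, which is part of what I am asking about):

```python
def bumped_score(level1_avg, pct_2a, pct_2b):
    """Apply the 2a/2b bonus as literally stated on the slide:
    only an average level 1 score of exactly 3 triggers the
    level 2a/2b tests at all (the reading questioned above)."""
    score = level1_avg
    if score == 3:
        if pct_2a >= 90 and pct_2b >= 90:
            score = 5  # both 2a and 2b at 90% or better
        elif pct_2a >= 90 or pct_2b >= 90:
            score = 4  # one of 2a / 2b at 90% or better
    return score

def overall(guideline_scores):
    # Overall = average of all guideline scores, to one decimal point.
    return round(sum(guideline_scores) / len(guideline_scores), 1)

print(bumped_score(3, 95, 92))  # 5
print(bumped_score(3, 95, 80))  # 4
print(bumped_score(2.9, 100, 100))  # 2.9 - no bonus under the literal reading
print(overall([3, 2, 2, 1]))  # 2.0
```

Under this literal reading, even a 2.9 average locks out the 2a/2b bonus entirely - which is the outcome that seems unintended.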


-- 
Detlev Fischer
DIAS GmbH
(Testkreis is now part of DIAS GmbH)

Mobile +49 (0)157 57 57 57 45

http://www.dias.de
Consulting, testing and training for accessible websites
Received on Tuesday, 14 July 2020 15:02:29 UTC
