Re: Proposal: severity axis on test result from Ian Hickson on 2002-07-01 (w3c-wai-er-ig@w3.org from July 2002)

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 01 Jul 2002 11:45:01 +0100
To: Charles McCathieNevile <charles@w3.org>
Cc: w3c-wai-er-ig <w3c-wai-er-ig@w3.org>
Message-ID: <3D2032AD.802@hixie.ch>

Charles McCathieNevile wrote:
> Hmm. This was similar to something the UAAG group wanted.
> 
> I would propose that we have qualitative rather than quantitative ratings.

Why? I personally would much rather both "confidence" and "severity" were 
changed to use percentages (or some other scale, e.g. 0.0 .. 1.0). Having an 
enumerated set of values artificially limits applications. For example, I need 
four severities (100%, 90%, 50%, 0%) and two confidence levels (100%, 0%), 
whereas the current spec only allows for 1 severity (100%) and three confidence 
levels (~100%, ~66%, ~33%).

Using a numeric scale doesn't reduce interoperability, as applications can 
simply "pigeon hole" values into their internal enumerated types. (The problem 
of round tripping through systems that use enumerated sets is already present, 
since it is most likely that applications will not have exactly matching sets of 
severities and confidences.)

For example, I intend to map any Pass values with severity 80%-99% into the 
"pass with unrelated errors (Yb)" category when summarising results.
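
A rough sketch of that kind of pigeon-holing, in Python. Only the 80%-99% band 
and the "Yb" label come from the paragraph above; the function name and the two 
other bucket labels are invented for illustration, and none of this is part of 
EARL.

```python
def summarise_pass(severity_percent):
    """Pigeon-hole the numeric severity of a Pass result into a summary bucket.

    Hypothetical helper: only the 80%-99% band is taken from the message;
    the "pass" and "other pass" labels are placeholders.
    """
    if not 0 <= severity_percent <= 100:
        raise ValueError("severity must be between 0 and 100")
    if severity_percent == 100:
        return "pass"  # assumed label for a full-severity Pass
    if 80 <= severity_percent < 100:
        return "pass with unrelated errors (Yb)"
    return "other pass"  # placeholder for everything below 80%


if __name__ == "__main__":
    for value in (100, 90, 50, 0):
        print(value, "->", summarise_pass(value))
```

An application with a different internal enumeration would use different 
boundaries; the numeric value in the report stays the same either way.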

(Note: Internally, I store severities and confidences as integers from 0 to 255. 
That is the natural computer equivalent of percentages. I would be quite happy 
if EARL used the integer scale 0..255. I would also be fine with EARL using a 
floating point scale between two arbitrary values, such as 0.0 and 1.0.)
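
To illustrate how interchangeable those scales are, here is a minimal Python 
sketch that converts between a percentage, an integer in 0..255, and a float in 
0.0..1.0. The function names are hypothetical, and rounding to the nearest 
integer is an assumption; the message does not say how the conversion is done.

```python
def percent_to_byte(percent):
    """Convert a percentage (0..100) to an integer on the 0..255 scale."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Rounding to the nearest integer is an assumption of this sketch.
    return round(percent * 255 / 100)


def byte_to_unit(value):
    """Convert an integer on the 0..255 scale to a float in 0.0..1.0."""
    if not 0 <= value <= 255:
        raise ValueError("value must be between 0 and 255")
    return value / 255


if __name__ == "__main__":
    for pct in (0, 100):
        stored = percent_to_byte(pct)
        print(pct, "->", stored, "->", byte_to_unit(stored))
```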


> For some use cases your Yb - pass with unrelated errors would count as a pass,
> and for some cases it would score as a fail. So we would need to know what
> they are.

Assuming "they" refers to the "unrelated errors" then yes, you would; that's 
what the "comments" field is for, presumably.


> The question also arises as to how many kinds of result we should include in
> earl and at what point we should leave people to subclass them for their own
> more detailed uses.

I think you only need:

    Pass
    Fail
    Not Applicable
    Not Tested

All the other values I can think of are simply variants of those four result 
types with various values for "Severity" and "Confidence".

You don't need "Can't Tell" as that should just be "Pass" or "Fail" with 
"Confidence: 0%". (Whether you pick "pass" or "fail" depends on which is the 
"default" -- e.g. in a test where supporting the feature correctly and not even 
trying to support the feature are indistinguishable, you would use "Pass", while 
in a test where trying to support the feature but doing so _incorrectly_ is 
indistinguishable from not supporting the feature at all you would use "Fail".)
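
As a sketch of how this could look in an application (not something EARL 
defines), the following Python models a result as one of the four types plus 
numeric severity and confidence, and expresses "Can't Tell" as the default 
result with zero confidence. The class name, field names, the 0.0..1.0 scale 
and the default values are all assumptions made for the example.

```python
from dataclasses import dataclass

# The four result types proposed above.
RESULT_TYPES = ("Pass", "Fail", "Not Applicable", "Not Tested")


@dataclass
class Outcome:
    """Hypothetical result record; field names and defaults are assumptions."""
    result: str
    severity: float = 1.0    # assumed scale: 0.0 .. 1.0
    confidence: float = 1.0  # assumed scale: 0.0 .. 1.0
    comments: str = ""

    def __post_init__(self):
        if self.result not in RESULT_TYPES:
            raise ValueError("unknown result type: " + repr(self.result))
        for name in ("severity", "confidence"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                raise ValueError(name + " must be between 0.0 and 1.0")


def cannot_tell(default_result):
    """Express "Can't Tell" as the test's default result with zero confidence."""
    if default_result not in ("Pass", "Fail"):
        raise ValueError("default must be 'Pass' or 'Fail'")
    return Outcome(result=default_result, confidence=0.0)


if __name__ == "__main__":
    print(cannot_tell("Pass"))
    print(cannot_tell("Fail"))
```

Which default to pass to cannot_tell() is a property of the individual test, as 
described above.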

Note that "Not Tested" is present only for completeness, as I expect most 
applications would simply not include the result in that case.

"Not Applicable" is important for tests that neither pass nor fail, such as a 
test for making sure all images have alternate text, when applied to a document 
with no images, or a test to make sure that 'red' is rendered differently from 
'green', on a monochromatic device.

-- 
Ian Hickson                                      )\._.,--....,'``.    fL
"meow"                                          /,   _.. \   _\  ;`._ ,.
http://index.hixie.ch/                         `._.-(,_..'--(,_..'`-.;.'
Received on Monday, 1 July 2002 06:45:04 UTC