- From: Ian Hickson <ian@hixie.ch>
- Date: Wed, 03 Jul 2002 13:30:51 +0100
- To: Charles McCathieNevile <charles@w3.org>
- Cc: w3c-wai-er-ig <w3c-wai-er-ig@w3.org>
Charles McCathieNevile wrote:
> [numeric value vs enumerated value for confidence and severities]
>
> You will have an internal mapping of numeric results that is
> specific to your needs, and the next person will too, but theirs
> will be slightly different. If you map both of those to the same
> EARL scheme by numeric value, you lose the ability to compare
> between them without doing complicated searching based on who said
> what and then doing the mapping.

I'm not convinced you have that ability now. If Alice has two
confidence levels ('confident' and 'unsure') and Bob has two
confidence levels ('1' and '2'), and Alice maps 'confident' to EARL's
'high' confidence and 'unsure' to EARL's 'low' confidence, while Bob
maps '1' to 'high' and '2' to 'medium', then your data is just as
useless as if they had mapped their values to four different numbers.
I don't think EARL having enumerated types will make it any easier to
compare merged results from people who did not agree on a convention
beforehand.

Similarly, if David has a numeric confidence scale, then when he maps
it to EARL's three levels he is actually going to lose detail. I think
it is important that you not _lose_ data when exporting to EARL. IMHO,
you should be able to round-trip data from a system, to EARL, and back
again, without losing information. (The opposite is not true; there is
no guarantee that arbitrary EARL can be round-tripped through an
arbitrary EARL-aware system, because that depends entirely on that
system's capabilities.)

> you don't get essentially meaningless EARL data like "this is 66%
> conformant to a checkpoint that may or may not have a definition of
> what that means"

"this is 'low' conformant to a checkpoint that may or may not have a
definition of what that means"

I don't see how enumerated types solve that problem in any way.

>>> For some use cases your Yb - pass with unrelated errors - would
>>> count as a pass, and for some cases it would score as a fail. So
>>> we would need to know what they are.
>
> We need to know what the use case is, and what the errors are, to
> categorise whether a result means "only failed because of
> dependencies" or "passed the essential bit".

Oh, right. Ok. This test:

   http://www.hixie.ch/tests/adhoc/css/inheritance/content/002.xml

...would get a Y if it read:

   (Control: PASSED)
   Test 1 has: PASSED
   Test 2 has: PASSED

However, if it said that but in addition some of the lines overlapped
each other, then the test would have passed with unrelated errors, and
would get a Yb (for "Yes with bugs"): (PASS, Confidence: 100%,
Severity: 90%). However, if it read:

   (Control: PASSED)
   Test 1 has:
   Test 2 has:

...as it does in recent Mozilla builds, then the test would get a B
(for "Buggy"): (FAIL, Confidence: 100%, Severity: 50%).

Does that answer your question?
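As a minimal sketch of what that nomenclature carries, here are the
grades above as (outcome, confidence, severity) triples in Python. The
Yb and B numbers are the ones given in this message; the Y entry is a
guess at what a clean pass would carry, and none of this is anything
EARL itself defines:

```python
# Purely illustrative: the grades above as (outcome, confidence,
# severity) triples. Yb and B use the numbers given in this message;
# the Y entry is a guessed "clean pass".
GRADES = {
    "Y":  ("PASS", 1.00, 1.00),  # Yes (guessed numbers)
    "Yb": ("PASS", 1.00, 0.90),  # Yes, with unrelated bugs
    "B":  ("FAIL", 1.00, 0.50),  # Buggy
}

def to_triple(grade):
    """Return the (outcome, confidence, severity) triple for a grade."""
    return GRADES[grade]

assert to_triple("Yb") == ("PASS", 1.00, 0.90)
```

Exporting triples like these is lossless, so the grade survives a
round trip; collapsing them to a bare enumerated pass/fail is not.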
> But maybe this isn't necessary if we just say "use a more atomic
> test". I don't know if that works in the real world or not.

Atomic testing is not the only kind of testing required. It is
absolutely imperative that browsers be tested with very complicated
tests, as many bugs only become apparent when two or more features
interact. If EARL is to be useful in this scenario, it has to deal
with non-atomic tests. For example, the Box Acid Test:

   http://www.w3.org/Style/CSS/Test/CSS1/current/test5526c.htm

If this test looks like the screenshot, then in my nomenclature the UA
gets a Y ("Yes") for this test. If it looks like the screenshot but
the radio buttons don't work (e.g. think they are part of the same
group) then it gets a Yb ("Yes, with minor unrelated bugs"). If the
page is mostly correct but the <body> isn't sized correctly, it gets a
B ("Buggy"), and if the page doesn't look right, it gets a D
("Destroys this feature", since if the page looks wrong it means the
UA doesn't support the block box model, a critical part of CSS1).

This is not an atomic test; it is one of the most complicated tests in
the W3C CSS1 test suite, and it is far from the most complicated CSS
test around.

>> You don't need "Can't Tell" as that should just be "Pass" or "Fail"
>> with "Confidence: 0%". (Whether you pick "pass" or "fail" depends
>> on which is the "default" -- e.g. in a test where supporting the
>> feature correctly and not even trying to support the feature are
>> indistinguishable, you would use "Pass", while in a test where
>> trying to support the feature but doing so _incorrectly_ is
>> indistinguishable from not supporting the feature at all you would
>> use "Fail".)
>
> Not sure that I agree here. Having a pass and a fail be equivalent
> under circumstances that are outside the RDF model we are using
> seems likely to lead to problems in processing.

They aren't equivalent. Pass with 0% confidence is a good result (it
means the UA could in fact have completely passed the test), whereas
Fail with 0% confidence is a bad result (the test definitely failed;
it's just unclear whether that was because of lack of support or
because of a bug).

> If there is a cannotTell then I think it should be a single defined
> property. More to the point, what is the use case for it in the
> first place?

This test:

   http://www.hixie.ch/tests/evil/css/import/extra/linklanguage.html

...looks exactly the same in UAs that support CSS completely and in
those that have zero CSS support. Therefore you can't tell whether or
not it passed, only whether it failed. So if the test failed, it gets
a B (FAIL, Confidence: 100%, Severity: 50%), but if it passes then it
gets an M (for "Maybe"; in EARL terms that would be PASS, Confidence:
0%, Severity: 100%).

If you were looking for whether the UA passed a checkpoint which had
that test as a prerequisite, then you would say the checkpoint had
passed if that test was marked PASS, even with Confidence: 0%. On the
other hand, if you had a test with FAIL, Confidence: 0%, then you
would know that the checkpoint had not been passed; you just wouldn't
know for sure whether it had failed because of intentional lack of
support or because of a bug. (This logic is sketched at the end of
this message.)

>> "Not Applicable" is important for tests that neither pass nor fail,
>> such as a test for making sure all images have alternate text, when
>> applied to a document with no images, or a test to make sure that
>> 'red' is rendered differently from 'green', on a monochromatic
>> device.
>
> (CMN I agree with your conclusion, but testing whether 'green' and
> 'red' are rendered differently on a monochromatic device is in fact
> important...)

...or a test for whether the 'color' property is correctly parsed, in
a UA that doesn't support CSS, then. :-)
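Here is that sketch: purely illustrative Python (the function name and
result strings are mine, not anything EARL defines), assuming each
prerequisite test result is an (outcome, confidence) pair:

```python
# Illustrative checkpoint logic over (outcome, confidence) pairs for
# the checkpoint's prerequisite tests. A PASS counts even at
# confidence 0; a FAIL at confidence 0 still sinks the checkpoint,
# you just can't tell whether a bug or lack of support is to blame.
def checkpoint_status(results):
    for outcome, confidence in results:
        if outcome == "FAIL":
            if confidence == 0.0:
                return "not passed (bug, or lack of support?)"
            return "failed"
    return "passed"  # every prerequisite PASSed, possibly at confidence 0

print(checkpoint_status([("PASS", 0.0)]))  # passed
print(checkpoint_status([("FAIL", 0.0)]))  # not passed (bug, or lack of support?)
print(checkpoint_status([("FAIL", 1.0)]))  # failed
```

Collapsing both 0%-confidence cases into a single "cannotTell" value
would erase exactly the distinction this logic relies on.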
-- 
Ian Hickson                                      )\._.,--....,'``.    fL
"meow"                                          /,   _.. \   _\  ;`._ ,.
http://index.hixie.ch/                         `._.-(,_..'--(,_..'`-.;.'

Received on Wednesday, 3 July 2002 08:40:08 UTC