- From: Gérard Talbot <css21testsuite@gtalbot.org>
- Date: Tue, 21 Sep 2010 13:32:34 -0700
- To: "public-css-testsuite@w3.org" <public-css-testsuite@w3.org>
- Cc: "John Jansen" <John.Jansen@microsoft.com>
Hello John,

> Please note, that I ran the entire suite for the first time last summer

Last summer? Do you mean this summer (2010) or summer 2009? There is a difference of several thousand testcases between summer 2009 and summer 2010.

> and it took me three days of interrupted time (NOT non-interrupted time).

3 days to run how many testcases? How many seconds (on average) per testcase?

> I just now ran 20 tests from the HTML suite and it took me 24
> seconds.

I have great difficulty understanding how you can run 20 testcases manually in 24 seconds. To run, say, 20 testcases, you need at minimum 39 (2n - 1) mouse clicks. The testcases are not chained together with <link rel="next">, so you have to click the back button to return to the list of testcases and then click the link to the next testcase. You also have to read the pass/fail conditions of each testcase. No testcase is HTTP-prefetchable (coded with <link rel="prefetch">). Many testcases also require comparison with a reference test, which means at least 2 extra clicks each.

> I am not saying this is typical, necessarily, and when you hit a failure
> it certainly adds time, but I think that looking at an 11 second average
> seems very high in practice.
> -John

I took the test harness in January 2010 (511 tests) and mentioned this in
http://lists.w3.org/Archives/Public/public-css-testsuite/2010Jan/0043.html
My results (I was using Konqueror 4.x) are still available here:
http://www.w3.org/2008/07/test-harness-css/results.php?s=htm4&o=0
It took me 4 hours to run the 511 tests.
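To make the timing figures above easier to compare, here is a minimal sketch of the arithmetic. It uses only the numbers quoted in this message (20 tests in 24 seconds, 511 tests in 4 hours, 2n - 1 clicks); the function names are hypothetical and chosen just for illustration.

```python
# Back-of-the-envelope comparison of the timings quoted in this thread.
# Function names are illustrative assumptions, not part of any harness.

def seconds_per_test(total_seconds: float, tests: int) -> float:
    """Average wall-clock seconds spent per testcase."""
    return total_seconds / tests


def min_clicks(tests: int) -> int:
    """One click to open each testcase plus one 'back' click between
    consecutive testcases: n + (n - 1) = 2n - 1."""
    return 2 * tests - 1


john_avg = seconds_per_test(24, 20)           # John's 20 tests in 24 seconds
gerard_avg = seconds_per_test(4 * 3600, 511)  # 511 tests in 4 hours
clicks = min_clicks(20)                       # minimum clicks for 20 tests
seconds_per_click = 24 / clicks               # time budget per click at John's pace

print(f"John:   {john_avg:.1f} s/test")
print(f"Gerard: {gerard_avg:.1f} s/test")
print(f"Clicks for 20 tests: {clicks}")
print(f"Seconds available per click: {seconds_per_click:.2f}")
```

At John's pace that leaves well under a second per click, before any time is spent reading the pass/fail conditions, whereas the January 2010 run averaged roughly 28 seconds per testcase.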
There are other situations that will slow testers down:
- a bunch of testcases require downloading and installing a custom font and then uninstalling it
- a bunch of testcases require downloading and installing a user style sheet
- a bunch of testcases require reading more than 1 sentence
- a bunch of testcases have small or very small lines or squares as pass/fail conditions
- a bunch of testcases have awkwardly worded pass/fail conditions or an inappropriate shape description of the expected result (causing confusion, hesitation)
- a good bunch of testcases require comparing the width or height of 2 squares.

If quality of testing matters more than speed, and if testers hold themselves to more than an "It's good enough" QA policy, then they may report a few more FAILED tests after stopping and spending a few more seconds. E.g.
http://test.csswg.org/suites/css2.1/20100917/html4/html-attribute-019.htm
"there is no space between the green and blue boxes" is not the same as the green square partially *overlapping* the blue square.

==============

"I do not think it is worth it to try rushing to REC while the test suite is in the state it is in."
Anne van Kesteren

I very much agree with Anne van Kesteren's opinion here.

-------

There are wrong testcases in the test suite; not many... hopefully. E.g.:
http://test.csswg.org/suites/css2.1/20100917/html4/position-relative-nested-001.htm

-------

There are false positive testcases:
http://test.csswg.org/suites/css2.1/20100917/html4/padding-right-applies-to-013.htm
is a wrong testcase which all testers (regardless of the browser actually being tested) would report as a PASSED test.

-------

Some are false negative testcases:
http://test.csswg.org/suites/css2.1/20100917/html4/vertical-align-115.htm
http://test.csswg.org/suites/css2.1/20100917/html4/vertical-align-116.htm

-------

Some are inaccurately coded testcases. E.g. a few (many?) *-applies-to-010 (involving 'display: list-item').
If the tester does not see a bullet list-marker, then the testcase should be marked as FAILED. The thing is that there are still testcases which do not say that a bullet list marker should be visible and which are still inappropriately coded, so that the marker is hidden (outside the viewport). E.g.
http://test.csswg.org/suites/css2.1/20100917/html4/padding-top-applies-to-010.htm

------

Some testcases are not robust or stringent testcases, e.g.
http://test.csswg.org/suites/css2.1/20100917/html4/right-offset-percentage-001.htm
If you change (or remove) 'position: absolute' and make it static, the testcase still passes. If you change 'right: 50%' to 'right: auto', the testcase still passes anyway.

------

A good bunch of Microsoft-submitted testcases have unnecessary (or unjustified or unneeded) declarations (e.g. height: 0; border-collapse: collapse; dir: rtl; position: absolute;) or extraneous div containers. That is not a reason to reject them but... it is a reason to believe that such testcases are not the best possible and that the test suite could be improved.

------

Some are not very relevant testcases: if a testcase passes when CSS support is disabled, then its relevance is rather limited, if not questionable. Ideally, you would want all testcases to fail when using Lynx 2.8.5 or NS3 or any browser without CSS support.

------

Some sections are under-tested (e.g. several sub-sections of section 10.3) while some others are IMO over-tested. Did you know that there are over 600 testcases testing the zero value (+/- signed and unsigned; for the 9 different units; for many properties)?

---------

My conclusion is that automatable testing, while definitely preferable to manual testing, will not do much if the testcases are not reviewed, checked, and corrected or adjusted to begin with. You first want reliable, trustworthy, accurately designed testcases before creating reftests or labelling corresponding screenshots.

regards, Gérard
--
Contributions to the CSS 2.1 test suite:
http://www.gtalbot.org/BrowserBugsSection/css21testsuite/

CSS 2.1 test suite (RC1; September 17th 2010):
http://test.csswg.org/suites/css2.1/20100917/html4/toc.html

CSS 2.1 test suite contributors:
http://test.csswg.org/source/contributors/

Received on Tuesday, 21 September 2010 20:33:11 UTC