- From: Gérard Talbot <css21testsuite@gtalbot.org>
- Date: Wed, 22 Sep 2010 08:24:53 -0700
- To: "John Jansen" <John.Jansen@microsoft.com>
- Cc: "public-css-testsuite@w3.org" <public-css-testsuite@w3.org>
>> -----Original Message-----
>> From: "Gérard Talbot" [mailto:css21testsuite@gtalbot.org]
>> Sent: Tuesday, September 21, 2010 1:33 PM
>> To: public-css-testsuite@w3.org
>> Cc: John Jansen
>> Subject: RE: Conversion of MS CSS 2.1 tests to reftests
>>
>> Hello John,
>>
>> > Please note, that I ran the entire suite for the first time last
>> > summer
>>
>> Last summer? You mean this summer 2010.. or summer 2009? There is a
>> difference of several thousand testcases if we're talking about
>> summer 2009 versus summer 2010 here.
>
> Summer 2010, using an IE9 build.
>
>> > and it took me three days of interrupted time (NOT non-interrupted
>> > time).
>>
>> 3 days to run how many testcases? How many seconds (avg) per
>> testcase?
>
> Well, ~9500. The tests that are there. I think that works out to just
> under 8 seconds a test overall if we say I worked 7 hours a day (seems
> about right: Thursday, Friday, Saturday).
>
>> > I just now ran 20 tests from the HTML suite and it took me 24
>> > seconds.
>>
>> I have huge difficulties understanding how you can run 20 testcases
>> manually in 24 seconds.
>
> I'm not responding to this comment, as I am not lying.
>
>> In order to run, say, 20 testcases, you need to do at minimum 39
>> (2n - 1) mouseclicks: one click into each testcase, plus one click on
>> the back button for all but the last. The testcases are not arranged,
>> not coded with <link rel="next">, so you have to click the back
>> button to get back to the list of testcases and then click the link
>> of the next testcase. And you have to read the pass/fail conditions
>> of each testcase too.
>
> I have no idea why you need to browse back to the test case when you
> can easily download the zipped files and run them, or build a simple
> harness in script to help you. You could write an htm file that is
> generated by script writing a for-each file in the unzipped folder. I
> personally just had two monitors up and loaded the tests from the
> folder on one monitor into the browser on the other.
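(For what it's worth, the sort of script-generated harness page John
describes could be sketched in a few lines of Python. This is only an
illustration, not the script anyone in this thread actually used, and
the folder and file names are made up:)

    # harness_gen.py - illustrative sketch only: build a single .htm
    # index page linking every testcase found in an unzipped suite.
    import os
    import html

    suite_dir = "css2.1-20100917-html4"   # made-up local folder name

    with open("harness.htm", "w", encoding="utf-8") as out:
        out.write("<!DOCTYPE html>\n")
        out.write("<title>Local CSS 2.1 harness</title>\n<ol>\n")
        for name in sorted(os.listdir(suite_dir)):
            if name.endswith(".htm"):
                out.write('<li><a href="%s/%s">%s</a></li>\n'
                          % (suite_dir, html.escape(name),
                             html.escape(name)))
        out.write("</ol>\n")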
>> No testcase is http-prefetchable (coded with <link rel="prefetch">).
>> There are many testcases which require comparing with a reference
>> test. So, at least, 2 extra clicks.
>
> None of the 20 I ran needed me to do any additional clicks.
>
>> > I am not saying this is typical, necessarily, and when you hit a
>> > failure it certainly adds time, but I think that looking at an 11
>> > second average seems very high in practice.
>> > -John
>>
>> I took the test harness in January 2010 (511 tests) and I mentioned
>> this in
>> http://lists.w3.org/Archives/Public/public-css-testsuite/2010Jan/0043.html
>> and my results (I was using Konqueror 4.x) are still available,
>> accessible, viewable here:
>> http://www.w3.org/2008/07/test-harness-css/results.php?s=htm4&o=0
>> and it took me 4 hours to run the 511 tests.
>
> That is very surprising. I suspect you were evaluating each test for
> accuracy as you ran them,

Hello John,

I must have checked/examined a few, yes. I know for a fact that I could
not say either pass or fail in 30 testcases. Those 30 testcases are
each/all identified and identifiable in
http://www.w3.org/2008/07/test-harness-css/results.php?s=htm4&o=0
(as I was using Konqueror) and I listed the font ones in
http://lists.w3.org/Archives/Public/public-css-testsuite/2010Jan/0043.html

> rather than simply logging a pass/fail.

The logging of a pass/fail result was done with the buttons at the
bottom of the test harness page. The pass/fail results were logged and
are in that aforementioned page:
http://www.w3.org/2008/07/test-harness-css/results.php?s=htm4&o=0

> You are saying on average 28 seconds per test; I have huge
> difficulties understanding that number.
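(Neither disputed average is mysterious arithmetic-wise; both follow
from straight division, taking John's own 7-hours-a-day estimate at
face value:)

    # Per-test averages implied by the numbers quoted in this thread.
    full_suite_avg = 3 * 7 * 3600 / 9500.0  # 3 days x 7 h/day, ~9500 tests
    harness_avg = 4 * 3600 / 511.0          # 4 hours for 511 harness tests
    print(round(full_suite_avg, 1))         # -> 8.0 seconds per test
    print(round(harness_avg, 1))            # -> 28.2 seconds per test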
>> There are other testcase situations which will slow down testers:
>>
>> - a bunch of testcases require downloading and installing a custom
>> font and then uninstalling it
>
> Yep, did it.
>
>> - a bunch of testcases require downloading and installing a user
>> style sheet
>> - a bunch of testcases require reading more than 1 sentence
>> - a bunch of testcases have small or very small lines and squares as
>> pass/fail conditions
>
> Yes, they do.
>
>> - a bunch of testcases have awkward wording of pass/fail conditions
>> or an inappropriate shape description of the expected result (causing
>> confusion, hesitation)
>> - a good bunch of testcases require comparing the width or height of
>> 2 squares.
>>
>> If quality (over speed) of testing is more important, if testers have
>> more than an "It's good enough" sense of quality/QA policy, then they
>> may report a few more FAILED tests after stopping and spending a few
>> more seconds. E.g. in
>> http://test.csswg.org/suites/css2.1/20100917/html4/html-attribute-019.htm
>> "there is no space between the green and blue boxes" is not the same
>> as the green square partially *overlapping* the blue square.
>
> Yep, I actually had a tri-state approach: Pass/Fail/???. I went
> through the whole suite. After I was done, I went back to the harder
> ones to evaluate, took my time with them, and for any final questions,
> I met with Arron to discuss.
>
>> ==============
>>
>> "I do not think it is worth it to try rushing to REC while the test
>> suite is in the state it is in."
>> Anne van Kesteren
>>
>> I very much agree with Anne van Kesteren's opinion here.
>
> I'm not a fan of rushing to REC either. I have no idea why it's
> September 21st and it seems like very few people have been running the
> tests that have been up there for months if not years.

The first inclusion of testcases from Microsoft was, according to the
IE blog, on March 6th 2008 and it was a batch of 700 testcases. The
biggest batch was 3784 testcases, on January 27th 2009, according to
the IE blog. So, many months: yes. Many years: I would not say so.

> We have had a plan in place, we discussed in January, at the spring
> F2F, and then got concrete agreement in Oslo. October 15th was the
> agreed upon date.

<shrug> I cannot speak about the spring F2F or the Oslo meeting
agreement.

>> -------
>> There are wrong testcases in the test suite; not many... hopefully.
>> E.g.:
>> http://test.csswg.org/suites/css2.1/20100917/html4/position-relative-nested-001.htm
>
> Yep, those issues will be revealed as people continue to review the
> suite, and should be raised as issues. Like any process for locking
> down, you evaluate the incoming feedback as it comes. Locking down
> means reducing churn. I think we all want to lock down 2.1, and doing
> so requires Implementation Reports against the test suite.

The false Faileds will be detected, discussed and addressed/fixed
rather soon IMO: an excellent example of this is David Baron's first 6
emails wrt specific testcases. His opinion was that the 6 testcases
mentioned in his emails were false Faileds. The false Passeds and the
wrong testcases are quite different.

If I may use such a comparison/analogy: it will be rather easy and/or
fast to point out the oranges in this big bag of apples, but it will be
considerably more difficult to corner/isolate/identify the bad apples,
the apples with a worm inside, the partially rotten apples. You'll need
to taste them a bit or dissect them a bit.

>> -------
>> There are false positive testcases:
>> http://test.csswg.org/suites/css2.1/20100917/html4/padding-right-applies-to-013.htm
>> is a wrong testcase which all testers (regardless of the browser
>> actually being tested) would/will report as a PASSED test.
>>
>> -------
>> Some are false negative testcases:
>> http://test.csswg.org/suites/css2.1/20100917/html4/vertical-align-115.htm
>> http://test.csswg.org/suites/css2.1/20100917/html4/vertical-align-116.htm
>>
>> -------
>> Some are inaccurately coded testcases. E.g. a few (many?)
>> *-applies-to-010 testcases (involving 'display: list-item'). If the
>> tester does not see a bullet list-marker, then the testcase should be
>> marked as FAILED. The thing is that there are still testcases which
>> do not say that a bullet list marker should be visible and are still
>> inappropriately coded, which makes the marker hidden (outside the
>> viewport). E.g.
>> http://test.csswg.org/suites/css2.1/20100917/html4/padding-top-applies-to-010.htm
>>
>> ------
>> Some testcases are not robust or stringent testcases: e.g.
>> http://test.csswg.org/suites/css2.1/20100917/html4/right-offset-percentage-001.htm
>> If you change (or remove) 'position: absolute' and make it static,
>> the testcase still passes. If you change 'right: 50%' to
>> 'right: auto', the testcase still passes anyway.
>>
>> ------
>> A good bunch of Microsoft-submitted testcases have unnecessary (or
>> unjustified or unneeded) declarations (e.g. height: 0;
>> border-collapse: collapse; dir: rtl; position: absolute;) or
>> extraneous div containers. It is not a reason to reject them but...
>> it is a reason to believe that such testcases are not the best and
>> that the test suite could be improved.
>
> Do any of the above comments mean you cannot submit an implementation
> report?

No. None of the above comments mean I cannot submit an implementation
report. But those comments justify not having complete blind faith in
the Implementation Report results, blind confidence (trust) in all of
the testcases that are passed. Submitting an implementation report does
not improve in any way the quality, the trustworthiness, the
reliability of any testcase. You can take a car for an impromptu road
test; it does not mean the car mechanic/technician thinks it's a good
idea.. he may be pulling his hair out with anxiety. At some point, you
will have to review the testcases and control their intrinsic quality,
robustness, accuracy, reliability, etc. Ideally/preferably, you would
want to review all of the testcases before submitting an implementation
report.., wouldn't you?

>> ------
>> Some are not very relevant testcases: if a testcase is passed when
>> CSS support is disabled, then such a testcase's relevance is rather
>> limited, otherwise questionable. Ideally, you would want all
>> testcases to fail when using Lynx 2.8.5 or NS3 or a non-CSS-capable
>> browser.
>>
>> ------
>> Some sections are under-tested (e.g. several sub-sections of section
>> 10.3) while some others are IMO over-tested. Did you know that there
>> are over 600 testcases testing the zero value (+/- signed and
>> unsigned; for the 9 different units; for many properties)?
>
> A lot of times browsers have implemented different rounding algorithms
> for different properties, using floor for one and ceiling for another.
> Regardless, though, those tests are superfast to run.
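(A toy illustration of the divergence John describes; the numbers are
made up and this is not any particular browser's algorithm. Two engines
snapping the same fractional computed length to whole device pixels can
legitimately disagree by a pixel:)

    import math
    # Illustration only: floor-based vs ceiling-based pixel snapping.
    computed_px = 6.5                 # e.g. 0.5em at a 13px font-size
    print(math.floor(computed_px))    # 6 with floor-based snapping
    print(math.ceil(computed_px))     # 7 with ceiling-based snapping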
>> ---------
>> My conclusion is that automatable testing, while definitely
>> preferable over manual testing, will not do much if the testcases are
>> not reviewed, checked, corrected or adjusted accordingly to begin
>> with. You first want to have reliable, trustworthy, accurately
>> designed testcases before creating reftests or labelling the
>> corresponding screenshots.
>
> The ask from the w3c here is to submit an Implementation Report
> against the current test suite. If there are issues with the tests
> that need to be submitted, then they should be submitted

There are issues with some tests. I have submitted the issues I found.
In some cases, twice in the mailing list and before RC1.

> and the working group should evaluate them on their value-add to the
> suite and then the appropriate action should be taken.

I really do not mind or oppose my reports being evaluated and
scrutinized as well. I have no problem with such protocols or
reciprocity.
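(P.S. On the reftest conversion named in the subject line: a reftest
replaces "read the pass/fail sentence" with "the test page must render
identically to a separate reference page", which is what makes a run
automatable. A minimal pairing sketch, assuming a hypothetical
"foo-001.htm" / "foo-001-ref.htm" naming convention rather than
whatever convention the working group actually settles on:)

    import os
    # Sketch: list (test, reference) pairs under an assumed "-ref"
    # naming convention, so an automated runner can screenshot both
    # pages and compare the pixels.
    suite_dir = "css2.1-20100917-html4"   # made-up local folder name
    for name in sorted(os.listdir(suite_dir)):
        if not name.endswith(".htm") or name.endswith("-ref.htm"):
            continue
        ref = name[:-len(".htm")] + "-ref.htm"
        if os.path.exists(os.path.join(suite_dir, ref)):
            print(name, "==", ref)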
regards, Gérard
--
Contributions to the CSS 2.1 test suite:
http://www.gtalbot.org/BrowserBugsSection/css21testsuite/
CSS 2.1 test suite (RC1; September 17th 2010):
http://test.csswg.org/suites/css2.1/20100917/html4/toc.html
CSS 2.1 test suite contributors:
http://test.csswg.org/source/contributors/

Received on Wednesday, 22 September 2010 15:26:02 UTC