RE: Conversion of MS CSS 2.1 tests to reftests

Hello John,

> Please note, that I ran the entire suite for the first time last
> summer

Last summer? Do you mean this summer (2010) or summer 2009? There is a
difference of several thousand testcases between the summer 2009 and
the summer 2010 versions of the suite.

> and it took me three days of interrupted time (NOT non-interrupted time).

Three days to run how many testcases? How many seconds per testcase, on
average?

> I just now ran 20 tests from the HTML suite and it took me 24
> seconds.

I have great difficulty understanding how you can run 20 testcases
manually in 24 seconds.

In order to run, say, 20 testcases, you need at least 39 (2n - 1)
mouse clicks. The testcases are not linked in sequence (not coded with
<link rel="next">), so you have to click the back button to return to
the list of testcases and then click the link to the next testcase. You
also have to read the pass/fail conditions of each testcase.

No testcase is http-prefetchable (coded with <link rel="prefetch">).
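A minimal sketch of what sequential navigation plus prefetching could look like if each testcase linked to its successor. It only generates the <head> link elements; the helper name navigation_links and the sample filenames are hypothetical, not part of the test suite.

```python
from html import escape


def navigation_links(files: list[str], index: int) -> list[str]:
    """Return <link> elements for the testcase at files[index].

    Hypothetical helper: links the page to the next testcase in the list
    (rel="next") and hints the browser to fetch it early (rel="prefetch").
    The last testcase gets no links.
    """
    if index + 1 >= len(files):
        return []
    href = escape(files[index + 1], quote=True)
    return [
        f'<link rel="next" href="{href}">',
        f'<link rel="prefetch" href="{href}">',
    ]


# Hypothetical, ordered list of testcase files.
tests = ["test-001.htm", "test-002.htm", "test-003.htm"]
for i, name in enumerate(tests):
    print(name, navigation_links(tests, i))
```

With such links, a tester could move to the next testcase with one click (or a keyboard shortcut, in browsers that support rel="next") instead of going back to the list.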

Many testcases require comparison with a reference test, which means at
least 2 extra clicks each.
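The click count above can be written as a small formula. A sketch in Python, assuming exactly the model described here: one click per testcase link, one back-button click per testcase except the last, and 2 more clicks (open the reference, go back) per testcase with a reference. The count of reference tests in the usage line is made up.

```python
def min_clicks(n_tests: int, n_with_reference: int = 0) -> int:
    """Lower bound on mouse clicks for a manual run under the model above.

    2n - 1 clicks to open n testcases from the list and come back between
    them, plus at least 2 clicks per testcase compared with a reference.
    """
    if n_tests <= 0:
        return 0
    return (2 * n_tests - 1) + 2 * n_with_reference


print(min_clicks(20))     # 39
print(min_clicks(20, 5))  # 49 (assuming 5 of the 20 need a reference)
```

This counts clicks only; reading the pass/fail conditions adds time on top.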

> I am not saying this is typical, necessarily, and when you hit a failure
> it certainly adds time, but I think that looking at an 11 second average
> seems very high in practice.
> -John


I went through the test harness in January 2010 (511 tests), as I
mentioned in
http://lists.w3.org/Archives/Public/public-css-testsuite/2010Jan/0043.html
My results (I was using Konqueror 4.x) are still available here:
http://www.w3.org/2008/07/test-harness-css/results.php?s=htm4&o=0
It took me 4 hours to run the 511 tests.
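The per-testcase averages behind these figures are simple division. The sketch below only divides the totals quoted in this thread; it does not account for breaks or interruptions, which are not quantified.

```python
def avg_seconds_per_test(total_seconds: float, n_tests: int) -> float:
    """Average wall-clock seconds spent per testcase."""
    return total_seconds / n_tests


# Figures quoted in this thread.
harness_run = avg_seconds_per_test(4 * 60 * 60, 511)  # 4 hours, 511 tests
quick_run = avg_seconds_per_test(24, 20)              # 24 seconds, 20 tests

print(f"{harness_run:.1f} s/test")  # 28.2 s/test
print(f"{quick_run:.1f} s/test")    # 1.2 s/test
```

On these numbers, the January 2010 harness run averaged about 28 seconds per testcase, well above the 11-second average John considers very high, while 20 tests in 24 seconds works out to 1.2 seconds per testcase.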


Other testcase situations will slow testers down:

- a bunch of testcases require downloading and installing a custom font
and then uninstalling it
- a bunch of testcases require downloading and installing a user style
sheet
- a bunch of testcases require reading more than one sentence
- a bunch of testcases use small or very small lines or squares as
pass/fail conditions
- a bunch of testcases have awkwardly worded pass/fail conditions or an
inappropriate description of the shape of the expected result (causing
confusion and hesitation)
- a good bunch of testcases require comparing the width or height of 2
squares. If quality of testing matters more than speed, and if testers
hold a higher standard than "It's good enough" in their quality/QA
policy, then they may report a few more FAILED tests after stopping and
spending a few more seconds. E.g. in
http://test.csswg.org/suites/css2.1/20100917/html4/html-attribute-019.htm
"there is no space between the green and blue boxes" is not the same as
the green square partially *overlapping* the blue square.
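The difference between "no space between" and "overlapping" can be stated precisely by comparing box edges. A minimal Python sketch, assuming the two boxes sit side by side horizontally and are reduced to (left, width) pairs in px; the coordinates are made up, since the actual geometry of html-attribute-019 is not given here.

```python
def horizontal_relation(a: tuple[float, float], b: tuple[float, float]) -> str:
    """Classify two boxes given as (left, width) on the x-axis, in px.

    Returns "gap" if there is space between them, "adjacent" if they touch
    with no space, and "overlap" if one box partially covers the other.
    """
    first, second = sorted([a, b])  # order by left edge
    first_right = first[0] + first[1]
    if first_right < second[0]:
        return "gap"
    if first_right == second[0]:
        return "adjacent"
    return "overlap"


# Hypothetical geometries for a green box and a blue box, 100px wide each.
print(horizontal_relation((0, 100), (100, 100)))  # adjacent: the only PASS
print(horizontal_relation((0, 100), (80, 100)))   # overlap: should be FAILED
print(horizontal_relation((0, 100), (110, 100)))  # gap: should be FAILED
```

A pass condition worded as "no space between the boxes" is satisfied by both "adjacent" and "overlap"; only a stricter wording separates them.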


==============

"I do not think it is worth it to try rushing to REC
while the test suite is in the state it is in."
Anne van Kesteren

I very much agree with Anne van Kesteren's opinion here.

-------

There are wrong testcases in the test suite (not many, hopefully). E.g.:
http://test.csswg.org/suites/css2.1/20100917/html4/position-relative-nested-001.htm


-------

There are false positive testcases:

http://test.csswg.org/suites/css2.1/20100917/html4/padding-right-applies-to-013.htm

is a wrong testcase which every tester (regardless of the browser being
tested) will report as PASSED.

-------

Some are false negative testcases:

http://test.csswg.org/suites/css2.1/20100917/html4/vertical-align-115.htm

http://test.csswg.org/suites/css2.1/20100917/html4/vertical-align-116.htm

-------

Some testcases are inaccurately coded, e.g. a few (many?) of the
*-applies-to-010 testcases (involving 'display: list-item'). If the
tester does not see a bullet list marker, then the testcase should be
marked as FAILED. The problem is that some of these testcases still do
not say that a bullet list marker should be visible, and are still
coded in a way that hides the marker (outside the viewport).
E.g.

http://test.csswg.org/suites/css2.1/20100917/html4/padding-top-applies-to-010.htm

------

Some testcases are not robust or stringent, e.g.
http://test.csswg.org/suites/css2.1/20100917/html4/right-offset-percentage-001.htm
If you change 'position: absolute' to 'position: static' (or remove it),
the testcase still passes. If you change 'right: 50%' to 'right: auto',
the testcase still passes as well.
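This kind of weakness is what mutation testing looks for: change one declaration the test depends on and check that the test now fails. A minimal Python sketch that only generates the mutated stylesheets (rendering and judging them is left to the tester or a harness); the stylesheet text is hypothetical, since the testcase's exact CSS is not quoted here.

```python
def mutants(css: str, substitutions: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (description, mutated_css) for each substitution that applies.

    Each mutant changes exactly one declaration. A stringent testcase
    should FAIL when rendered with any of these mutants; a mutant that
    still "passes" points to a weak test.
    """
    result = []
    for old, new in substitutions:
        if old in css:
            result.append((f"{old} -> {new}", css.replace(old, new, 1)))
    return result


# Hypothetical stylesheet; the real testcase's CSS may differ.
css = "div { position: absolute; right: 50%; width: 100px; }"
for label, mutated in mutants(css, [
    ("position: absolute", "position: static"),
    ("right: 50%", "right: auto"),
]):
    print(label, "=>", mutated)
```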

------

A good bunch of Microsoft-submitted testcases have unnecessary (or
unjustified) declarations (e.g. height: 0; border-collapse: collapse;
direction: rtl; position: absolute;) or extraneous div containers.
That is not a reason to reject them, but it is a reason to believe that
such testcases are not the best they could be and that the test suite
could be improved.

------


Some testcases are not very relevant: if a testcase passes when CSS
support is disabled, then its relevance is limited, if not
questionable. Ideally, you would want all testcases to fail when using
Lynx 2.8.5, NS3 or any other browser without CSS support.
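One cheap way to check this is to build a CSS-free copy of a testcase and look at it: if it still shows the PASS condition, the testcase tells us little about CSS support. A rough Python sketch using regular expressions (a simplification made for brevity; a real tool should use a proper HTML parser). It only removes <style> elements, style attributes and <link rel="stylesheet"> elements.

```python
import re

# Rough patterns, assumed good enough for simple testcases only.
_STYLE_ELEMENT = re.compile(r"<style\b[^>]*>.*?</style\s*>", re.IGNORECASE | re.DOTALL)
_STYLE_ATTR = re.compile(r"""\sstyle\s*=\s*("[^"]*"|'[^']*')""", re.IGNORECASE)
_STYLESHEET_LINK = re.compile(
    r"""<link\b[^>]*\brel\s*=\s*["']?stylesheet["']?[^>]*>""", re.IGNORECASE
)


def strip_css(html: str) -> str:
    """Return a copy of html without <style> elements, style attributes
    and <link rel="stylesheet"> elements, approximating "CSS disabled"."""
    html = _STYLE_ELEMENT.sub("", html)
    html = _STYLESHEET_LINK.sub("", html)
    return _STYLE_ATTR.sub("", html)


sample = '<style>p { color: green }</style><p style="color: red">Test passes if...</p>'
print(strip_css(sample))  # <p>Test passes if...</p>
```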

------

Some sections are under-tested (e.g. several sub-sections of section
10.3) while some others are IMO over-tested. Did you know that there are
over 600 testcases testing the zero value (signed with + or -, and
unsigned; for the 9 different units; for many properties)?
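The combinatorics show how such a number builds up: 3 sign variants times 9 units gives 27 zero values per property, so 600 testcases correspond to roughly 22 properties if each property gets one test per combination (an assumption; the actual distribution is not given). The nine units are not listed in this message either; the sketch assumes, for illustration only, the 8 CSS 2.1 length units plus percentages.

```python
from itertools import product

# Assumption for illustration: the 8 CSS 2.1 length units plus "%".
UNITS = ["em", "ex", "in", "cm", "mm", "pt", "pc", "px", "%"]


def zero_values(units: list[str]) -> list[str]:
    """Every signed (+/-) and unsigned zero value for each unit."""
    return [f"{sign}0{unit}" for sign, unit in product(("+", "-", ""), units)]


per_property = len(zero_values(UNITS))
print(per_property)                  # 27
print(round(600 / per_property, 1))  # 22.2 properties' worth of tests
```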

---------


My conclusion is that automatable testing, while definitely preferable
to manual testing, will not achieve much if the testcases are not first
reviewed, checked, and corrected or adjusted as needed. You first want
reliable, trustworthy, accurately designed testcases before creating
reftests or labelling the corresponding screenshots.

regards, Gérard
-- 
Contributions to the CSS 2.1 test suite:
http://www.gtalbot.org/BrowserBugsSection/css21testsuite/

CSS 2.1 test suite (RC1; September 17th 2010):
http://test.csswg.org/suites/css2.1/20100917/html4/toc.html

CSS 2.1 test suite contributors:
http://test.csswg.org/source/contributors/

Received on Tuesday, 21 September 2010 20:33:11 UTC