Re: Request for Feedback On Test Harness

Thanks for writing this. I'll comment mostly on stuff I don't like, so this may come across as overly negative. Apologies in advance.

I'm also sorry I didn't show up earlier to say that the harness should mostly clone the programming model of Mochitest. (I thought it was so obvious that this would happen that I didn't need to be around to say it.)

> == One or Multiple Tests per File ==
> 
> Although it is often possible to have just a single test per file, in
> some cases this is not efficient e.g. if generating many tests from
> some relatively small amount of data. Nevertheless it should be
> possible to regard the tests as independent from the point of view of
> collecting results i.e. it should not be necessary to collapse many
> tests down into a single result just to keep the test harness
> happy. Obviously people using this ability have to be careful not to
> make one test depend on state created by another test in the same file
> regardless of what happens in that test.
> 
> For this reason the harness separates the concept of "test" from the
> concept of "assertion". One may have multiple tests per file and, for
> readability (see below) each may have multiple assertions. It also
> strengthens the requirement (below) to catch all errors in each test
> so they don't affect other tests.

I think the separate concept of a test isn't particularly helpful. The html5lib tree builder test suite was integrated into Mozilla's mochitest suite as a single mochitest that runs one assertion per html5lib tree builder test. The only reason it is now split into two "tests" in Mozilla's suite is to keep it from running so long that the harness kills it.

I think people should err on the side of putting more stuff into a single test file, because each file has some loading overhead, the test suite needs to scale up, and the suite should be runnable automatically whenever someone pushes code to a browser's version control system, such as mozilla-central.

When more stuff is tested in one test file, I think each small thing should be an assertion on its own. That is, one should do
ok(cond1, "msg1");
ok(cond2, "msg2");
instead of
ok(cond1 && cond2, "msg");

However, I don't see value in grouping assertions into logical "tests" within a file. Mochitest doesn't have such a grouping mechanism, and I've never found the lack of one to be a problem.

I think assertion grouping is in the YAGNI department.

> == Suitable for writing both synchronous and asynchronous tests ==
> 
> Many DOM APIs are asynchronous and testing these APIs must be well
> supported by the test harness. It is also a useful optimization to be
> able to write simple tests in a sync. fashion because e.g. checking
> that some DOM attribute has a given value for some input markup is a
> common sort of problem and tests should be correspondingly easy to
> write.
> 
> The harness has explicit support for both sync and async tests through
> the sync and async methods.

In my experience with Mochitest, most tests I write are asynchronous. However, Mochitest defaults to synchronous tests with an implicit finish on the 'load' event. If I had a chance to go back in time and redesign Mochitest, I'd make asynchronous the default mode and require the test writer to always call an explicit finish() method. People wanting to finish on 'load' would simply write <body onload='finish()'>.

I'd like to encourage the HTML WG to adopt a model where the act of evaluating the harness script starts the test. Assertion methods would then be called one or more times (zero calls would be considered a failure, to catch silly mistakes), and an explicit call to a finish() method would end the test and dump the results.
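Concretely, an asynchronous test under that model might look like this (a sketch only; it assumes the harness exposes Mochitest-style ok() and finish(), and resource.txt is a placeholder):

<!DOCTYPE html>
<title>Async test under the proposed model</title>
<script src=/harness.js></script>
<script>
// Evaluating this script starts the test; no wrapper boilerplate.
var req = new XMLHttpRequest();
req.onreadystatechange = function() {
  if (req.readyState == 4) {
    ok(req.status == 200, "Resource should have loaded.");
    finish(); // explicit end of the test
  }
};
req.open("GET", "resource.txt", true);
req.send(null);
</script>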

I think the .step() model is onerous compared to calling an explicit finish() function when all asynchronous parts of a test are done. Occasionally one will write tests where asynchronous things finish in an unpredictable order. In that case, I think it's OK to require the test writer to keep count of those things and call finish() when the count reaches zero. At least it's better than requiring everyone to use a lot of boilerplate in the simpler cases.
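Something like this sketch, for example (again assuming ok() and finish() from the harness; the two timeouts stand in for asynchronous operations that may complete in either order):

<!DOCTYPE html>
<title>Counting pending asynchronous operations</title>
<script src=/harness.js></script>
<script>
var pending = 2;
function done() {
  // Call finish() once every pending operation has completed.
  if (--pending == 0) {
    finish();
  }
}
setTimeout(function() {
  ok(true, "First operation should have completed.");
  done();
}, 10);
setTimeout(function() {
  ok(true, "Second operation should have completed.");
  done();
}, 1);
</script>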

> == Minimal Dependence on Correct HTML Implementation ==
> 
> If the test harness itself depends on HTML features being correctly
> implemented, it is rather hard to use it to test those
> features. As far as possible it has been designed to only use
> ECMAScript and DOM Core features.

In practice, the tests will also depend on parser-inserted non-async, non-defer scripts executing in the order the </script> end tags appear in the source. I think it's OK to rely on that.

I think it's also OK for the harness to install a handler for onerror. In fact, I think it would be quite OK to require browsers to implement onerror in order to be able to run the test harness in a useful way. I think it's less onerous to require those browsers that don't have onerror to implement it than to require everyone to use a lot of boilerplate in *every* *single* *test*.
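A sketch of what I mean (hypothetical harness internals; reportFailure() stands in for whatever the harness does to record a failed assertion):

// Inside harness.js: rely on window.onerror instead of requiring
// try/catch boilerplate in every test.
window.onerror = function(message, url, line) {
  reportFailure("Uncaught error: " + message + " at " + url + ":" + line);
  finish(); // treat an uncaught error as ending the test in failure
  return true; // suppress the browser's default error reporting
};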

> == Robust Against Unexpected Failure ==
> 
> Tests may not fail just because of the particular assertions that are
> being tested, but because of implementation bugs affecting the test,
> or because of some unexpected brokenness caused by the test
> environment. In general it is not a safe assumption that the test
> author has verified the test only fails in the expected way in all
> implementations that may be of interest, or that they have written the
> test to be defensive against unexpected errors. As far as possible,
> such errors should affect the minimum number of tests i.e. on a page
> containing multiple tests a single unexpected failure should not stop
> all other tests from executing.
> 
> To deal with this problem tests are run in a try / catch block. Any
> error, caused by an assertion or caused by an unexpected bug in the
> implementation, is caught and causes the test to fail. Other tests on
> the same page remain unaffected by the error.

I think this design decision optimizes the wrong thing. It seems to me that this is optimizing the scores browsers get on an "official" test suite for marketing purposes. That is, when a browser doesn't score 100%, it comes as close to 100% as possible.

I think we should optimize for the ease of writing tests and for finding regressions in a given implementation after that implementation has once fully passed a given test file and the file has been added to the implementation's automated test suite. If Firefox passes test_foo.js today and test_foo.js is added to mochitest, it's interesting to keep the file at zero failures. If someone checks in something that makes the test fail, it's not really important to limit how many of the assertions in the file fail. It's enough that some assertions fail in order to detect that the checkin or the test was bad.

If we want to optimize for supporting the goal of getting correct implementations, we should optimize for ease of test writing and for integrating the tests into the automated tests different implementations run for each code push. If this means that failures show up as larger percentage drops, I think the right response isn't to complicate the test harness. The right response is not to publish percentages like this: http://www.w3.org/QA/2010/09/how_do_we_test_a_web_browser_o.html

Since I want to optimize for easy test writing and for detecting "no assertion failed in test file" vs. "at least one assertion failed in test file", I feel very strongly that the minimal passing test should look like this:

<!DOCTYPE html>
<title>Minimal test</title>
<script src=/harness.js></script>
<script>
ok(true, "Should have passed!");
finish();
</script>

That is, there should be no boilerplate JavaScript around the call to the ok() assertion function. It should be possible to write a test like this:

<!DOCTYPE html>
<title>Test inline and external scripts</title>
<script src=/harness.js></script>
<body onload="finish();">
<script>var externalRan = false;</script>
<script src="data:,">ok(false, "textContent of an external script should not be evaluated.");</script>
<script src="date:text/javascript,externalRan = true;"></script>
<script>ok(externalRan, "External script should have run.");</script>

I was *very* annoyed when I discovered that the above test can't be written as easily using the HTML WG harness.

> == Consistent, easy to read assertions ==
> 
> In order to make it clear what a test is aiming to check, a rich,
> descriptive assertion API is helpful. In particular, avoiding a style
> where test authors are tempted to do passed = condition1 && condition2
> && condition3; assert(passed) is desirable since this can make tests
> complex to follow. Such a rich API also allows common, complex,
> operations to be factored out into the harness rather than
> reimplemented in different ways by each individual author. A good
> example of this is testing that a property is "readonly". This can be
> done more or less comprehensively and, depending on WebIDL, may change
> its exact meaning (this happened recently for example). By factoring
> out a test for readonly into a specific assertion, all tests for
> readonly attributes can be made in the same way and get updated
> together if necessary. This also helps to make tests written by a
> diverse range of authors easier to compare since it follows the
> pythonic principle that "there should be one (and preferably only one)
> obvious way to do it".
> 
> To this end, the harness has a rich set of assertions that can be
> invoked using assert_* functions (currently fewer
> than I would like, but that is a quality of implementation issue that
> can be fixed). Assertions like assert_readonly

I don't have a problem with this approach in general, but I find the assertion function names annoyingly verbose compared to those in Mochitest. I think we should use ok(cond, msg) instead of assert_true(cond, msg) and is(val1, val2, msg) instead of assert_equals(val1, val2, msg).
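If renaming the harness functions is off the table, even thin aliases shipped in the harness would help (a sketch; assert_true and assert_equals are the existing harness assertions):

// Mochitest-style shorthands layered over the existing assertions:
function ok(cond, msg) {
  assert_true(cond, msg);
}
function is(val1, val2, msg) {
  assert_equals(val1, val2, msg);
}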

> == Easy to Write Tests ==
> 
> Tests should be as easy as possible to write, so that people mostly
> write tests that use the harness well and are easy to follow, and so
> that it is not too burdensome to write the tests.
> 
> This is aided by the rich assertion API since one does not have to
> repeat the code to correctly check for various things again and
> again. There is some overhead in the harness due to the need to
> structure async tests into steps and the use of callback functions to
> wrap individual steps. However given the other requirements it is
> difficult to see how to avoid this; a great fraction of the overhead
> is purely javascript syntax ("function() {}"), and, I think, the need
> to structure tests in a consistent way is a boon to readability.

I think the harness is currently failing at ease of test writing pretty badly.

I think the test harness should have at least the level of ease that Mochitest has. I think it follows that there shouldn't be a need to wrap test assertions in any kind of boilerplate.

That is, 
test(function() { assert_true(true, "Message") }, "Dunno what to write here.");
is a severe ease of test writing failure compared to
ok(true, "Message");

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
