Re: Request for Feedback On Test Harness from Maciej Stachowiak on 2010-11-30 (public-html-testsuite@w3.org from November 2010)

From: Maciej Stachowiak <mjs@apple.com>
Date: Tue, 30 Nov 2010 10:57:26 -0800
To: James Graham <jgraham@opera.com>
Cc: "'public-html-testsuite@w3.org'" <public-html-testsuite@w3.org>
Message-id: <0821B1B4-8683-480B-8939-E816BF34A68B@apple.com>

On Nov 30, 2010, at 1:45 AM, James Graham wrote:

> I am looking for some feedback on the test harness script
> testharness.js (note that this would better have been called a
> "framework", but I will continue to use the term harness
> throughout). In particular if there are requirements that people have
> for writing tests that have not been considered or there are rough
> edges in the harness that should be fixed, it would be good to
> know about them now so that the problems can be addressed.
> 
> Primarily I am interested in feedback about the design and API since
> those are harder to fix later. However comments on the implementation
> are also welcome; I know of a few problems already that I intend to
> address.
> 
> To frame the discussion in context, I think it will be useful for me
> to elaborate on the design goals of the current harness, and provide
> some details of how it tries to meet them.
> 
> == One or Multiple Tests per File ==
> 
> Although it is often possible to have just a single test per file, in
> some cases this is not efficient e.g. if generating many tests from
> some relatively small amount of data. Nevertheless it should be
> possible to regard the tests as independent from the point of view of
> collecting results i.e. it should not be necessary to collapse many
> tests down into a single result just to keep the test harness
> happy. Obviously people using this ability have to be careful not to
> make one test depend on state created by another test in the same file
> regardless of what happens in that test.
> 
> For this reason the harness separates the concept of "test" from the
> concept of "assertion". One may have multiple tests per file and, for
> readability (see below) each may have multiple assertions. It also
> strengthens the requirement (below) to catch all errors in each test
> so they don't affect other tests.

I wrote a script test and found myself stymied by this design choice. The test has a couple dozen assertions; you can find it in hg at html/tests/submission/Apple/global-attributes/id-attribute.html. I wanted my test output to report each assertion that passed or failed individually. It seems like the only way to do this is to use a separate test() and assert_equals() for each assertion, for example:

    test(function() {
        assert_equals(document.getElementById("abcd"), document.getElementsByTagName("i")[0]);
    }, "User agents must associate the element with an id value for purposes of getElementById.");

    test(function() {
        assert_equals(document.getElementById("ABCD"), document.getElementsByTagName("i")[1]);
    }, "Association is exact and therefore case-sensitive for getElementById.");

In some cases, I had test assertions that depended on DOM changes made in previous tests.

It seems like the intended way to use your framework would be to wrap this all in a single test(), which would result in only a single line in the output table. It seems to me this would result in a much less useful test. The current test gives a lot of detail about what it tested in the output when it passes, and reports exactly what went wrong when it fails.

I made a monolithic version as an experiment, where everything is wrapped in one test(), and this has three unfortunate side effects:

1) When the test passes, you only get a single vague message that "The id attribute" is ok, rather than my detailed descriptions of everything tested.
2) When an assertion fails, only the first failing assertion is reported; it seems the others are not run at all. So until you pass 100%, you can't actually tell how well you are doing.
3) When an assertion fails, you don't get a plain English description of the condition that failed.
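
The first two side effects can be reproduced with a minimal sketch. It does not use testharness.js itself: results, test() and assert_equals() below are simplified stand-ins, assumed to match the real harness only in that a failed assertion throws and test() records one result per call. The arithmetic checks are placeholders for the real DOM checks.

```javascript
// Simplified stand-ins, NOT the real testharness.js implementation.
const results = [];

function assert_equals(actual, expected) {
    if (actual !== expected) {
        throw new Error("assert_equals: expected " + expected + " got " + actual);
    }
}

function test(func, name) {
    try {
        func();
        results.push({ name: name, status: "PASS", message: null });
    } catch (e) {
        results.push({ name: name, status: "FAIL", message: e.message });
    }
}

// Monolithic style: one test() wrapping several assertions.
let assertionsReached = 0;
test(function() {
    assertionsReached++;
    assert_equals(1 + 1, 2);
    assertionsReached++;
    assert_equals(2 + 2, 5);   // fails; the exception ends this test()
    assertionsReached++;
    assert_equals(3 + 3, 6);   // never reached
}, "The id attribute");

// One test() per assertion: every check is run and reported on its own line.
test(function() { assert_equals(1 + 1, 2); }, "first check");
test(function() { assert_equals(2 + 2, 5); }, "second check");
test(function() { assert_equals(3 + 3, 6); }, "third check");
```

The monolithic call contributes a single FAIL entry named "The id attribute" and stops after the second assertion, while the three separate calls contribute PASS, FAIL, PASS.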

Relatedly, whether or not you use a test() per assert_equals(), assert_equals() gives a lame error message:

    assert_equals:
    expected
    4
    got
    1

This doesn't include the expression that gave the wrong result. This is unfortunate when using test-per-assert but damn near useless if you combine multiple assert_equals() calls in one test().


Therefore, I would really appreciate one of the following changes:

A) Add a test_assert_equals() combo call (perhaps the shorthand should even take an expression to eval rather than a function, but it doesn't really matter). That way, you can easily have a list of assertions each of which is reported independently, with a useful error message that reports the bad value.

B) Report each assertion, not just each test, independently in the output, and by default do not stop at a failed assertion. (If needed, we could have variant functions that automatically stop at a bad result, if it makes no sense to continue; and the test should report when not all assertions were run).
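
As a rough illustration of what (B) might look like, here is a sketch under stated assumptions. All of the names in it (assertionLog, soft_assert_equals, hard_assert_equals, run_test) and the planned-count parameter are hypothetical and are not part of testharness.js.

```javascript
// Hypothetical sketch of option (B); these names are illustrative only.
const assertionLog = [];

// Non-fatal by default: record the outcome and keep going.
function soft_assert_equals(actual, expected, description) {
    const passed = actual === expected;
    assertionLog.push({
        description: description,
        status: passed ? "PASS" : "FAIL",
        message: passed ? null : "expected " + expected + " got " + actual
    });
    return passed;
}

// Variant that stops the test when it makes no sense to continue.
function hard_assert_equals(actual, expected, description) {
    if (!soft_assert_equals(actual, expected, description)) {
        throw new Error("stopped after: " + description);
    }
}

// planned is the number of assertions the test author expects to run
// (an assumption of this sketch, so that a shortfall can be reported).
function run_test(func, name, planned) {
    const before = assertionLog.length;
    try {
        func();
    } catch (e) {
        // Swallow the stop signal or any other exception;
        // the shortfall shows up in the summary below.
    }
    const ran = assertionLog.length - before;
    return { name: name, ran: ran, planned: planned, complete: ran === planned };
}

const summary = run_test(function() {
    soft_assert_equals(1 + 1, 2, "addition works");
    soft_assert_equals("a".toUpperCase(), "a", "toUpperCase is a no-op");  // fails, continues
    hard_assert_equals(typeof undefined, "object", "typeof undefined");    // fails, stops
    soft_assert_equals(2 * 3, 6, "multiplication works");                  // not run
}, "arithmetic and strings", 4);
```

Here the log holds three individually reported assertions, and the summary shows that only three of the four planned assertions ran. An uncaught exception thrown by the test code itself would still cut the test short.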

I think I would actually prefer (A); even though it is less syntactically convenient, it can continue running further test assertions even after exceptions, so you can always see a list of all assertion passes and failures in the output, even if an earlier assertion failed catastrophically.
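
A minimal sketch of how such a test_assert_equals() could behave follows. Only the function name comes from the proposal in (A); the signature (a function producing the actual value, the expected value, and a description), the outcomes array, and the message format are assumptions made for illustration, and nothing here is testharness.js code.

```javascript
// Hypothetical helper sketching option (A); not part of testharness.js.
const outcomes = [];

function test_assert_equals(getActual, expected, description) {
    let actual;
    try {
        actual = getActual();
    } catch (e) {
        // An exception fails this one assertion only; later calls still run.
        outcomes.push({ description: description, status: "FAIL",
                        message: "threw " + e });
        return;
    }
    if (actual === expected) {
        outcomes.push({ description: description, status: "PASS", message: null });
    } else {
        // Report the bad value alongside the plain English description.
        outcomes.push({ description: description, status: "FAIL",
                        message: "expected " + expected + " got " + actual });
    }
}

test_assert_equals(function() { return "abcd".length; }, 4,
                   "The string has four characters.");
test_assert_equals(function() { return "abcd".toUpperCase(); }, "abcd",
                   "Upper-casing leaves the string unchanged.");
test_assert_equals(function() { throw new Error("setup failed"); }, 1,
                   "A check whose expression throws.");
test_assert_equals(function() { return [1, 2, 3].indexOf(3); }, 2,
                   "The last element is at index 2.");
```

Each call produces its own entry, so the output lists PASS, FAIL, FAIL, PASS with a description for every line, and the exception in the third call does not prevent the fourth from running.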


Regards,
Maciej
Received on Tuesday, 30 November 2010 18:58:05 UTC