Request for Feedback On Test Harness from James Graham on 2010-11-30 (public-html-testsuite@w3.org from November 2010)

From: James Graham <jgraham@opera.com>
Date: Tue, 30 Nov 2010 10:45:00 +0100
To: "'public-html-testsuite@w3.org'" <public-html-testsuite@w3.org>
Message-ID: <4CF4C79C.3050209@opera.com>
I am looking for some feedback on the test harness script
testharness.js (note that this would better have been called a
"framework", but I will continue to use the term "harness"
throughout). In particular, if there are requirements that people have
for writing tests that have not been considered, or rough edges
in the harness that should be fixed, it would be good to
know about them now so that the problems can be addressed.

Primarily I am interested in feedback about the design and API, since
those are harder to fix later. However, comments on the implementation
are also welcome; I already know of a few problems that I intend to
address.

To put the discussion in context, I think it will be useful for me
to elaborate on the design goals of the current harness and give
some details of how it tries to meet them.

== One or Multiple Tests per File ==

Although it is often possible to have just a single test per file, in
some cases this is not efficient, e.g. when generating many tests from
a relatively small amount of data. Nevertheless it should be
possible to regard the tests as independent from the point of view of
collecting results, i.e. it should not be necessary to collapse many
tests down into a single result just to keep the test harness
happy. Obviously people using this ability have to be careful not to
make one test depend on state created by another test in the same file,
regardless of what happens in that test.

For this reason the harness separates the concept of "test" from the
concept of "assertion". One may have multiple tests per file and, for
readability (see below), each test may have multiple assertions. This
separation also strengthens the requirement (below) to catch all errors
in each test so that they don't affect other tests.
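As a rough illustration of many independent tests generated from a small data table, here is a toy sketch in JavaScript. It is not testharness.js itself: `test`, `assert_equals` and the `results` array are hypothetical stand-ins. Each generated test records its own result, so one failing row does not collapse the others into a single verdict.

```javascript
// Hypothetical toy harness: each call to test() records its own result,
// so many tests generated from one file stay separate.
const results = [];

function assert_equals(actual, expected, description) {
  if (actual !== expected) {
    throw new Error(description + ": expected " + expected + " but got " + actual);
  }
}

function test(func, name) {
  try {
    func();
    results.push({ name: name, status: "PASS" });
  } catch (e) {
    results.push({ name: name, status: "FAIL", message: e.message });
  }
}

// A small data table; one test is generated per row.
const cases = [
  ["abc", 3],
  ["", 0],
  ["hello", 4] // deliberately wrong expectation, to show an isolated failure
];

cases.forEach(function (row) {
  const input = row[0];
  const expectedLength = row[1];
  test(function () {
    assert_equals(input.length, expectedLength, "length of '" + input + "'");
  }, "length of '" + input + "'");
});

results.forEach(function (r) {
  console.log(r.status + " " + r.name + (r.message ? " (" + r.message + ")" : ""));
});
```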

== Suitable for writing both synchronous and asynchronous tests ==

Many DOM APIs are asynchronous, and testing these APIs must be well
supported by the test harness. It is also a useful optimization to be
able to write simple tests synchronously: checking that some DOM
attribute has a given value for some input markup, for example, is a
common sort of problem, and such tests should be correspondingly easy to
write.

The harness has explicit support for both sync and async tests through
the sync and async methods.
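A rough sketch of how an asynchronous test can be structured: each callback body is wrapped in a step so its errors are caught, and the test signals completion explicitly. This is a hypothetical toy in JavaScript, not the harness's actual code. The names `async_test`, `step`, `done` and `assert_true` are illustrative, and `setTimeout` stands in for a real asynchronous DOM API.

```javascript
// Hypothetical toy async harness. An async test is a stateful object:
// callbacks are wrapped in step() so errors are caught, and done()
// records the final result once the asynchronous work has finished.
const results = [];

function async_test(name) {
  const t = {
    name: name,
    status: null, // null until the test finishes
    message: null,
    step: function (func) {
      if (t.status !== null) return; // ignore steps after the test has ended
      try {
        func();
      } catch (e) {
        t.status = "FAIL";
        t.message = e.message;
        results.push(t);
      }
    },
    done: function () {
      if (t.status !== null) return; // already failed in an earlier step
      t.status = "PASS";
      results.push(t);
    }
  };
  return t;
}

function assert_true(value, description) {
  if (value !== true) {
    throw new Error(description + ": expected true but got " + value);
  }
}

// setTimeout simulates an asynchronous API delivering data later.
const t = async_test("callback receives data");
setTimeout(function () {
  t.step(function () {
    const data = { loaded: true };
    assert_true(data.loaded, "data.loaded");
  });
  t.done();
}, 10);
```

The explicit `done()` is needed because the harness cannot otherwise tell when an asynchronous test has no more callbacks pending.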

== Minimal Dependence on Correct HTML Implementation ==

If the test harness itself depends on HTML features being correctly
implemented, it is rather hard to use it to test those
features. As far as possible, it has been designed to use only
ECMAScript and DOM Core features.

== Robust Against Unexpected Failure ==

Tests may fail not just because of the particular assertions that are
being tested, but also because of implementation bugs affecting the test,
or because of some unexpected brokenness caused by the test
environment. In general it is not safe to assume that the test
author has verified that the test fails only in the expected way in all
implementations that may be of interest, or that they have written the
test to be defensive against unexpected errors. As far as possible,
such errors should affect the minimum number of tests; i.e. on a page
containing multiple tests, a single unexpected failure should not stop
all other tests from executing.

To deal with this problem, each test is run in a try / catch block. Any
error, whether caused by a failed assertion or by an unexpected bug in the
implementation, is caught and causes that test to fail. Other tests on
the same page remain unaffected by the error.
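A minimal sketch of this isolation, assuming a hypothetical toy `test` function rather than the real harness code. The first test hits an unexpected `TypeError` (a lookup that returned `null`), which is recorded as a failure. The second test still runs and passes.

```javascript
// Hypothetical toy harness: each test body runs inside its own
// try/catch, so an unexpected error fails only that test.
const results = [];

function test(func, name) {
  try {
    func();
    results.push({ name: name, status: "PASS" });
  } catch (e) {
    // Catches failed assertions and unexpected implementation bugs alike.
    results.push({ name: name, status: "FAIL", message: String(e) });
  }
}

test(function () {
  const node = null; // e.g. a DOM lookup that unexpectedly returned null
  node.appendChild({}); // throws a TypeError
}, "unexpected error");

test(function () {
  if ([1, 2, 3].length !== 3) throw new Error("wrong length");
}, "later test still runs");

results.forEach(function (r) {
  console.log(r.status + " " + r.name);
});
```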

== Consistent, easy to read assertions ==

In order to make it clear what a test is aiming to check, a rich,
descriptive assertion API is helpful. In particular, it is desirable to
avoid a style where test authors are tempted to write
passed = condition1 && condition2 && condition3; assert(passed),
since this can make tests complex to follow. Such a rich API also allows
common, complex operations to be factored out into the harness rather
than reimplemented in different ways by each individual author. A good
example of this is testing that a property is "readonly". This can be
done more or less comprehensively, and its exact meaning depends on
WebIDL and may change (this happened recently, for example). By factoring
out a test for readonly into a specific assertion, all tests for
readonly attributes can be made in the same way and get updated
together if necessary. This also makes tests written by a
diverse range of authors easier to compare, since it follows the
Pythonic principle that "there should be one (and preferably only one)
obvious way to do it".

To this end, the harness has a rich set of assertions that are
invoked through assert_* functions (currently fewer
than I would like, but that is a quality-of-implementation issue that
can be fixed). Assertions like assert_readonly encapsulate such common
checks so that they are performed the same way in every test.
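To illustrate why factoring such a check out helps, here is a hypothetical JavaScript sketch of a shared readonly assertion. It is not the real testharness.js implementation. The definition of "readonly" it encodes (an assignment of a different value must leave the property unchanged) is an assumption. Because every test calls the same helper, a change in that definition would be made in one place.

```javascript
// Hypothetical shared "readonly" assertion. All tests use this one
// definition, so if the meaning of readonly changes, only this changes.
function assert_readonly(object, property, description) {
  const initial = object[property];
  try {
    // Try to overwrite with a different value. Depending on strict mode,
    // assigning to a non-writable property throws or is silently ignored.
    object[property] = initial + "_changed";
  } catch (e) {
    // A TypeError here is also consistent with a readonly property.
  }
  if (object[property] !== initial) {
    // Restore the original value (the property was writable) before failing.
    object[property] = initial;
    throw new Error(description + ": property '" + property + "' is not readonly");
  }
}

// A non-writable property passes; a plain data property fails.
const node = {};
Object.defineProperty(node, "nodeName", { value: "DIV", writable: false });
const other = { tagName: "DIV" };

assert_readonly(node, "nodeName", "node.nodeName");

let message = null;
try {
  assert_readonly(other, "tagName", "other.tagName");
} catch (e) {
  message = e.message;
}
console.log(message);
```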

== Good Error Reporting ==

As far as possible, the harness should make it clear what failed and
why. In general it is not possible to get a stack trace out of an
exception in a generic way, but since there are high-level assertion
functions, the harness can report exactly what was expected and what
occurred instead. Individual assertions can also be labelled to further
improve error reporting. In the case of unexpected errors, the error
message from the error object is displayed.
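A hypothetical sketch of this kind of reporting in JavaScript. The `AssertionError` class and the `run` helper are illustrative inventions, not part of the harness. Assertion failures carry the label, the expected value and the actual value. For an unexpected error, only the error object's own message is available.

```javascript
// Hypothetical sketch: assertions throw a dedicated error type whose
// message states the label, the expected value and the actual value.
class AssertionError extends Error {
  constructor(message) {
    super(message);
    this.name = "AssertionError";
  }
}

function assert_equals(actual, expected, description) {
  if (actual !== expected) {
    throw new AssertionError(
      "assert_equals: " + description + ": expected " +
      JSON.stringify(expected) + " but got " + JSON.stringify(actual));
  }
}

function run(func, name) {
  try {
    func();
    return name + ": PASS";
  } catch (e) {
    // Assertion failures explain themselves; for anything else the
    // best available information is the error object's message.
    const detail = e instanceof AssertionError
      ? e.message
      : "Unexpected error: " + e.message;
    return name + ": FAIL (" + detail + ")";
  }
}

const report = [
  run(function () { assert_equals("span".toUpperCase(), "SPAN", "tag name"); }, "case"),
  run(function () { assert_equals(1 + 1, 3, "sum"); }, "arithmetic"),
  run(function () { undefinedFunction(); }, "typo") // ReferenceError at runtime
];
report.forEach(function (line) { console.log(line); });
```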

== Easy to Write Tests ==

Tests should be as easy as possible to write, so that people mostly
write tests that use the harness well and are easy to follow, and so
that writing tests is not too burdensome.

This is aided by the rich assertion API, since one does not have to
write the code that correctly checks for various things again and
again. There is some overhead in the harness due to the need to
structure async tests into steps and the use of callback functions to
wrap individual steps. However, given the other requirements, it is
difficult to see how to avoid this; a large fraction of the overhead
is purely JavaScript syntax ("function() {}"), and, I think, the need
to structure tests in a consistent way is a boon to readability.
Received on Tuesday, 30 November 2010 09:45:40 UTC