Conformance Test Development FAQ

What is conformance testing, and how does it differ from other kinds of testing?

Conformance testing focuses on testing only what is formally specified, in order to verify whether an implementation conforms to its specifications.

Conformance tests do not test implementation-specific details (what is not covered by the spec). They do not focus on performance, usability, the capability of an implementation to stand up under stress, or interoperability, except in so far as these characteristics of an implementation are formally specified.

This document focuses primarily on conformance testing, although some of the recommendations are also applicable to other kinds of testing.

Why is conformance testing important?

It's the key to interoperability. (It's necessary, but not sufficient - implementations that conform perfectly to imperfect specifications may fail to interoperate.) Since Working Groups (WGs) are typically required to demonstrate two interoperable implementations of each feature of their specification, conformance tests are an important part of that demonstration.

This type of testing is perhaps less likely to be performed by implementors (who are more likely to focus on performance, usability, features unique to their implementation, etc.).

It improves the quality of the specifications (particularly if performed while the spec is still under development).

When should test development start?

Planning should start very early (ideally, at the same time as spec development begins). Defining a testing approach (what kinds of tests to develop, and how they should operate), and considering 'testability' can be helpful even in the early stages of spec development.

During the planning phase, define the specifications to be tested. (This may seem obvious, but often specifications make reference to or depend on other specifications. It's important to understand the scope of what is to be tested.)

Unless you explicitly want to use the test development process as a way of exploring issues and problems in the specification, it's best to wait until the spec is reasonably stable before starting test development. Otherwise, lots of tests will have to be rewritten as the spec is modified.

Who will develop the tests?

Typically you will have to beg members of your WG to contribute resources to develop tests. If you're very lucky, you might also be able to persuade other interested parties to contribute.

Either way, you will have to solicit and manage contributions from others. This can require a significant amount of organization and effort on your part if you are to get quality tests that cover the full range of the specification. Take the time to create a high-quality and informative 'appeal for contributions'.

Specify the format in which tests should be developed (for example, how they should be invoked and how they should report their results), and any metadata that should be supplied with them (for example a definition of the purpose of the test, a pointer to the portion of the specification that is tested, the expected results, etc.).

Define a process to manage contributions. Review submissions to ensure that they are appropriate and correct. Keep track of who submitted what, and of the 'state' that a particular test is in (submitted, reviewed, accepted, returned for revision, rejected, etc.).
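
As a rough illustration of the metadata and state tracking described above, here is a minimal Python sketch. The field names (test_id, submitter, purpose, spec_ref, expected) and the allowed transitions between states are assumptions made for this example; your WG will need to choose its own.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    SUBMITTED = "submitted"
    REVIEWED = "reviewed"
    ACCEPTED = "accepted"
    RETURNED = "returned for revision"
    REJECTED = "rejected"


# Hypothetical workflow: which states may follow which.
ALLOWED = {
    State.SUBMITTED: {State.REVIEWED},
    State.REVIEWED: {State.ACCEPTED, State.RETURNED, State.REJECTED},
    State.RETURNED: {State.SUBMITTED},  # resubmitted after revision
    State.ACCEPTED: set(),
    State.REJECTED: set(),
}


@dataclass
class Contribution:
    test_id: str    # hypothetical identifier for the test
    submitter: str  # who submitted it
    purpose: str    # what the test is meant to verify
    spec_ref: str   # pointer to the portion of the spec being tested
    expected: str   # the expected result
    state: State = State.SUBMITTED
    history: list = field(default_factory=list)

    def move_to(self, new_state: State) -> None:
        """Change state, refusing transitions the workflow does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.state.value} -> {new_state.value} is not allowed"
            )
        self.history.append(self.state)
        self.state = new_state
```

Recording the history of each test makes it possible to answer "who submitted what, and what happened to it?" long after the review took place.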

How do we decide what tests to develop?

Try to focus development efforts where they will be most effective and useful:

where there's a greater chance that implementations will be non-conformant (for example, where implementors are more likely to 'cut corners', or where they are least likely to test during the development process)
where the consequences of non-conformance would be greatest (for example, breaking interoperability)

Tell people what you need (what areas of the spec should be covered) - don't just leave it up to them to develop whatever they want.

How many tests are enough?

There's no simple answer to this question; it depends on the goals that you set yourself and on the resources you have available.

What is most important is that you get the 'best' coverage for the resources you are able to apply. Coverage goals and results can be specified in terms of the number of tests that are developed for areas of the specification (logical sections, testable assertions, or even paragraphs or pages). A useful coverage metric is 'assertion-breadth coverage' or simply 'assertion coverage', which is defined as the percentage of testable assertions for which at least one test has been developed.
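
Assertion coverage as defined above is simple to compute. The following Python sketch assumes you keep a list of testable assertion identifiers and a mapping from each assertion to the tests written for it; both data structures and the function name are hypothetical.

```python
def assertion_coverage(assertions, tests_by_assertion):
    """Percentage of testable assertions that have at least one test.

    assertions: list of assertion identifiers (hypothetical naming).
    tests_by_assertion: dict mapping an assertion id to a list of test ids.
    """
    if not assertions:
        return 0.0
    covered = sum(1 for a in assertions if tests_by_assertion.get(a))
    return 100.0 * covered / len(assertions)


assertions = ["A1", "A2", "A3", "A4"]
tests_by_assertion = {"A1": ["t1", "t2"], "A3": ["t3"], "A4": []}
print(assertion_coverage(assertions, tests_by_assertion))  # 50.0
```

Note that A1 counts only once even though it has two tests: the metric measures breadth, not depth.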

Note that it may be appropriate to define different coverage goals for different areas of the specification.

Whether or not you define coverage goals in advance, it is always helpful to provide some kind of coverage report with your test suite. This could be as simple as a mapping of tests to areas of the specification, or a more detailed report providing counts and averages of the number of tests associated with different areas. Such reports can help the users of your test suite understand its strengths and weaknesses.
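
A simple coverage report of the kind described above might be produced as follows. This is only a sketch: it assumes each test is mapped to exactly one area of the specification, and the section numbers and function name are invented for the example.

```python
from collections import Counter


def coverage_report(test_to_area, areas):
    """Count tests per spec area, plus the average and the untested areas.

    test_to_area: dict mapping a test id to the spec area it covers.
    areas: list of all areas of the specification (hypothetical labels).
    """
    counts = Counter(test_to_area.values())
    per_area = {area: counts.get(area, 0) for area in areas}
    average = sum(per_area.values()) / len(per_area) if per_area else 0.0
    untested = [area for area in areas if per_area[area] == 0]
    return per_area, average, untested


per_area, average, untested = coverage_report(
    {"t1": "3.1", "t2": "3.1", "t3": "3.2"},
    ["3.1", "3.2", "3.3"],
)
print(per_area)   # {'3.1': 2, '3.2': 1, '3.3': 0}
print(average)    # 1.0
print(untested)   # ['3.3']
```

Listing the untested areas explicitly is often the most useful part of the report, since it tells users where the test suite says nothing about an implementation.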

How should tests report their results?

All tests in the test suite should report their results in a consistent manner, making it easy for humans or computer programs to understand and to process them. The following test-result states, defined by EARL (the Evaluation And Report Language), have proved useful:

cannotTell
fail
notApplicable
notTested
pass
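
One way to keep reporting consistent is to allow only these five values in result files and to reject anything else. The Python sketch below uses the state names from the list above; the Outcome class and summarize function are hypothetical names, not part of EARL or of any existing tool.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    CANNOT_TELL = "cannotTell"
    FAIL = "fail"
    NOT_APPLICABLE = "notApplicable"
    NOT_TESTED = "notTested"
    PASS = "pass"


def summarize(results):
    """Count results per state; raises ValueError on an unknown state.

    results: dict mapping a test id to one of the five state names.
    """
    counts = Counter(Outcome(value) for value in results.values())
    return {outcome.value: counts.get(outcome, 0) for outcome in Outcome}


print(summarize({"t1": "pass", "t2": "fail", "t3": "pass", "t4": "notTested"}))
# {'cannotTell': 0, 'fail': 1, 'notApplicable': 0, 'notTested': 1, 'pass': 2}
```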

Some WGs have defined RDF formats for collecting and processing test results, and there are a number of XSLT style sheets that can be used to format results in an attractive way.

Do I really have to worry about all that legal stuff?

Unfortunately, yes. Copyright, patent, and license issues can upset the best-organized test development efforts. Your test suite will need to be distributed under a W3C-approved license (you will need to decide which), and this means that contributions to the test suite will have to be provided under contribution licenses that do not contradict or inhibit the distribution license.

How should I package and publish my tests?

Conformance tests are useful - a conformance test suite is much more useful. What's the difference?

Test runs should be deterministic (that is, for a particular implementation on a particular configuration different testers should obtain the same results). If you simply publish a random collection of tests - for example, a directory containing lots of files - it will be difficult for testers to understand:

what tests apply to their implementation (some may apply only to a particular optional feature)
how the tests should be executed
how to interpret the results (did the test run fail and if so, what is it about the implementation that is incorrect?)
whether or not their test run was successful (whether they can claim conformance)

Package the tests up into a test suite. Provide documentation explaining how to determine which tests to run, how to run the tests, how to interpret the results, and how to make a conformance claim.

If possible, automate the test execution, or provide metadata and documentation sufficient to allow others to do so.
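
Metadata makes it possible to select the applicable tests automatically. The sketch below assumes a manifest in which each test lists the optional features it requires (the "requires" key and the feature names are invented for this example), and it sorts the selected tests by identifier so that every tester runs them in the same order.

```python
def applicable_tests(manifest, supported_features):
    """Return the tests whose required features are all supported.

    manifest: list of dicts with an "id" and an optional "requires" list.
    supported_features: features the implementation under test claims.
    """
    supported = set(supported_features)
    selected = [
        test for test in manifest
        if set(test.get("requires", [])) <= supported
    ]
    # Sort by id so the run order does not depend on manifest order.
    return sorted(selected, key=lambda test: test["id"])


manifest = [
    {"id": "t3", "requires": ["feature-x"]},
    {"id": "t1"},
    {"id": "t2", "requires": ["feature-y"]},
]
print([t["id"] for t in applicable_tests(manifest, ["feature-x"])])
# ['t1', 't3']
```

A fixed run order does not by itself make a test run deterministic, but it removes one common source of differences between testers.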

Once I publish my tests, I'm done, right?

Sorry, no. Test suites must evolve over time:

to meet the needs of changing specs (revisions)
to improve quality and/or coverage
to fix bugs found during development, testing, or use of the test suite

This implies that you should plan for multiple releases of the test suite. Use version numbers so people know what version they're using. State which version of the specification your test suite addresses.

How should I handle bugs in my test suite?

First, test your test suite before you publish it. If it's really buggy, people won't trust it and won't use it.

No matter how thoroughly you test, bugs will still slip through. Define a process to accept and respond to bug reports. In response to bugs it might be necessary to:

exclude broken tests from the test suite
create and distribute alternate tests
update the documentation, harness, or framework

(Note: some bug reports might require modifications to the specification rather than the test suite.)

Unless you want to define a 'patch process' to allow partial updates to the test suite (this is probably more trouble than it's worth), the simplest way to handle bugs might be to publish a list of known issues with workarounds where appropriate, together with a list of tests known to be incorrect (and which therefore need not be run, or whose failure can be ignored). Periodically you should issue revisions of the test suite in which the problems are corrected.
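
A published list of tests known to be incorrect can be applied mechanically when results are evaluated. The following Python sketch assumes results are stored as a mapping from test identifier to result state, and that the list of known-incorrect tests is a simple set of identifiers; both are assumptions made for this example.

```python
def conformance_failures(results, known_incorrect):
    """Return the failed tests that still count against conformance.

    results: dict mapping a test id to its result state (for example "fail").
    known_incorrect: set of test ids whose failure can be ignored.
    """
    return sorted(
        test_id
        for test_id, outcome in results.items()
        if outcome == "fail" and test_id not in known_incorrect
    )


results = {"t1": "pass", "t2": "fail", "t3": "fail"}
print(conformance_failures(results, {"t3"}))  # ['t2']
```

Keeping the list separate from the tests themselves means it can be updated between releases of the test suite without redistributing the tests.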

Should test results be published?

While it's not required by W3C processes, providing a means for people to publish their test results can be beneficial. Publicity and competition provide strong incentives for developers to improve their implementations. A simple web-based submission and publication process would be ideal.