Conformance Test Development FAQ

What kinds of testing are important in the W3C?

The W3C is interested in promoting the development of "interoperable technologies (specifications, guidelines, software, and tools)". Two types of testing are particularly helpful in promoting this goal:

Conformance testing focuses on testing what is formally specified, in order to verify whether an implementation conforms to its specifications. This form of testing does not focus on implementation-specific details (what is not covered by the spec), nor on performance, usability, the capability of an implementation to stand up under stress, or interoperability, except in so far as these characteristics of an implementation are formally required by the specification.

Interoperability testing focuses on finding interoperability issues between implementations of a given specification.

Both forms of testing can help to detect defects (ambiguities, inaccuracies, etc.) in specifications, and are therefore particularly useful when conducted in parallel with specification development.

Since the W3C's entrance criteria for Proposed Recommendation include the requirement to demonstrate two interoperable implementations, Working Groups (WGs) are increasingly interested in interoperability testing and in conformance test development, which is a key prerequisite for interoperability. (Conformance is a necessary but not sufficient condition: implementations that conform perfectly to imperfect specifications may still fail to interoperate.)

This document focuses primarily on these kinds of testing, although some of the recommendations are also applicable to other kinds of testing.

When should test development start?

Planning should start very early (ideally, at the same time as spec development begins). Defining a testing approach (what kinds of tests to develop, and how they should operate) and thinking about 'testability' can be helpful even in the early stages of spec development.

During the planning phase, define the specifications to be tested. This may seem obvious, but often specifications make reference to or depend on other specifications. It's important to understand and to limit the scope of what is to be tested; focus on what you really need to test rather than on ancillary technologies that may be utilized indirectly by implementations.

Unless you explicitly want to use the test development process as a way of exploring issues and problems in the specification (a valid and interesting approach that has been adopted by a number of Working Groups [@@ example @@]), it's best to wait until the spec is reasonably stable before starting test development. Otherwise, many tests will have to be rewritten as the spec is modified.

Who will develop the tests?

Typically you will have to beg members of your WG to contribute resources to develop tests. If you're very lucky, you might also be able to persuade other interested parties to contribute.

Either way, you will have to solicit and manage contributions from others. This can require a significant amount of organization and effort on your part if you are to get quality tests that cover the full range of the specification. Take the time to create a high-quality and informative 'appeal for contributions'.

Specify the format in which tests should be developed (for example, how they should be invoked and how they should report their results), and any metadata that should be supplied with them (for example, a definition of the purpose of the test, a pointer to the portion of the specification that is tested, the expected results, etc.). For an example of such guidelines, see the CSS Test Authoring Guidelines.
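As an illustration, the metadata described above could be captured as a small record that the review process checks for completeness before accepting a submission. This is a minimal sketch. The field names (`test_id`, `purpose`, `spec_ref`, `expected_result`) and the sample values are hypothetical and are not taken from the CSS Test Authoring Guidelines or any other W3C format.

```python
from dataclasses import dataclass, fields


@dataclass
class TestMetadata:
    """Hypothetical metadata record supplied with each contributed test."""
    test_id: str          # unique identifier within the suite
    purpose: str          # what the test is meant to check
    spec_ref: str         # pointer to the portion of the spec that is tested
    expected_result: str  # what a conforming implementation should do


def missing_fields(meta: TestMetadata) -> list[str]:
    """Return the names of metadata fields that were left blank."""
    return [f.name for f in fields(meta) if not getattr(meta, f.name).strip()]


# A submission that forgot to say which part of the spec it tests.
submission = TestMetadata(
    test_id="attr-001",
    purpose="Attribute values are compared case-sensitively",
    spec_ref="",
    expected_result="The element is not matched",
)
print(missing_fields(submission))  # ['spec_ref']
```

A submission with missing metadata can then be returned to its contributor before any human review time is spent on it.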

Define a process to manage contributions. Review submissions to ensure that they are appropriate and correct. Keep track of who submitted what, and of the 'state' that a particular test is in (submitted, reviewed, accepted, returned for revision, rejected, etc.). A test-case management system [@@ example @@] can help with this task.
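The states listed above can be modelled as a simple state machine, so that a test cannot, for example, be accepted without first being reviewed. This is a minimal sketch under an assumed workflow. The allowed transitions in `TRANSITIONS` are a hypothetical choice, and a real Working Group may define a different set.

```python
from enum import Enum


class State(Enum):
    SUBMITTED = "submitted"
    REVIEWED = "reviewed"
    ACCEPTED = "accepted"
    RETURNED = "returned for revision"
    REJECTED = "rejected"


# Assumed workflow: the state changes a contribution is allowed to make.
TRANSITIONS = {
    State.SUBMITTED: {State.REVIEWED},
    State.REVIEWED: {State.ACCEPTED, State.RETURNED, State.REJECTED},
    State.RETURNED: {State.SUBMITTED},  # contributor resubmits a revision
    State.ACCEPTED: set(),
    State.REJECTED: set(),
}


class Contribution:
    def __init__(self, test_id: str, submitter: str):
        self.test_id = test_id
        self.submitter = submitter        # keep track of who submitted what
        self.state = State.SUBMITTED
        self.history = [State.SUBMITTED]  # audit trail of every state entered

    def move_to(self, new_state: State) -> None:
        """Change state, refusing transitions the workflow does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state
        self.history.append(new_state)


c = Contribution("attr-001", "contributor@example.org")
for step in (State.REVIEWED, State.RETURNED, State.SUBMITTED, State.REVIEWED, State.ACCEPTED):
    c.move_to(step)
print(c.state.value)  # accepted
```

The recorded history also shows how often a test was returned for revision, which can help identify areas of the specification that contributors find hard to test.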

How do we decide what tests to develop?

Try to focus development efforts where they will be most effective and useful:

where there's a greater chance that implementations will be non-conformant (for example, where implementors are more likely to 'cut corners', or where they are less likely to test during the development process)
where the consequences of non-conformance would be greatest (for example, where it would break interoperability or jeopardize security)

Tell people what you need (what areas of the spec should be covered); don't just leave it up to them to develop whatever they want. Note that this implies the creation and maintenance of some kind of 'coverage map' (see the next question for more on this).

How many tests are enough?

There's no simple answer to this question; it depends on the goals that you set yourself and on the resources you have available.

What is most important is that you get the 'best' coverage for the resources you are able to apply. Coverage goals and results can be specified in terms of the number of tests that are developed for areas of the specification (features, logical sections, testable assertions, or even paragraphs or pages). A particularly useful coverage metric is 'assertion-breadth coverage' or simply 'assertion coverage', which is defined as the percentage of testable assertions for which at least one test has been developed.
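Assertion coverage is easy to compute once each test records which testable assertions it exercises. The following is a minimal sketch that assumes a hypothetical mapping from test IDs to assertion IDs. All identifiers are invented for illustration.

```python
def assertion_coverage(assertions: set[str], tests: dict[str, set[str]]) -> float:
    """Percentage of testable assertions exercised by at least one test."""
    if not assertions:
        return 0.0
    # Union of everything any test touches, restricted to known assertions.
    covered = set().union(*tests.values()) & assertions
    return 100.0 * len(covered) / len(assertions)


assertions = {"A1", "A2", "A3", "A4"}
tests = {
    "t1": {"A1"},
    "t2": {"A1", "A2"},
    "t3": {"A4"},
}
print(assertion_coverage(assertions, tests))  # 75.0
```

Note that the second test of `A1` does not raise the figure. This metric measures breadth (whether each assertion is tested at all), not depth.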

Note that it may be appropriate to define different coverage goals for different areas of the specification.

Whether or not you define coverage goals in advance, it is always helpful to provide some kind of coverage report with your test suite. This could be as simple as a mapping of tests to areas of the specification, or a more detailed report providing counts and averages of the number of tests associated with different areas. Such reports can help the users of your test suite understand its strengths and weaknesses.
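A simple coverage report of this kind might be generated from a mapping of tests to areas of the specification. This is a minimal sketch that assumes each test maps to exactly one area. The section names are hypothetical. Listing every area explicitly, rather than only those that have tests, makes gaps visible as zero counts.

```python
from collections import defaultdict


def coverage_report(test_areas: dict[str, str], areas: list[str]) -> str:
    """Return a plain-text table of test counts per specification area."""
    counts = defaultdict(int)
    for area in test_areas.values():
        counts[area] += 1
    lines = [f"{'Area':<20}{'Tests':>6}"]
    for area in areas:  # every area appears, even with no tests
        lines.append(f"{area:<20}{counts[area]:>6}")
    average = sum(counts[a] for a in areas) / len(areas) if areas else 0.0
    lines.append(f"{'Average per area':<20}{average:>6.1f}")
    return "\n".join(lines)


test_areas = {"t1": "2.1 Syntax", "t2": "2.1 Syntax", "t3": "3.4 Errors"}
areas = ["2.1 Syntax", "3.4 Errors", "5 Security"]
report = coverage_report(test_areas, areas)
print(report)
```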

How should tests report their results?

All tests in the test suite should report their results in a consistent manner, making it easy for humans or computer programs to understand and to process them. The following test-result states, defined by EARL (the Evaluation And Report Language), have proved useful:

cannotTell
fail
notApplicable
notTested
pass
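One way to keep result reporting consistent is to restrict every test to a fixed set of states and summarize runs in a fixed shape. This is a minimal sketch. The string values simply mirror the list above, so check them against the version of EARL you actually target. The helper name `summarize` is hypothetical.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    """Test-result states, named as in the list above."""
    CANNOT_TELL = "cannotTell"
    FAIL = "fail"
    NOT_APPLICABLE = "notApplicable"
    NOT_TESTED = "notTested"
    PASS = "pass"


def summarize(results: dict[str, Outcome]) -> dict[str, int]:
    """Count outcomes so that every state appears, even with a zero count."""
    counts = Counter(results.values())
    return {o.value: counts[o] for o in Outcome}


results = {"t1": Outcome.PASS, "t2": Outcome.FAIL,
           "t3": Outcome.PASS, "t4": Outcome.NOT_APPLICABLE}
print(summarize(results))
```

Because every summary has the same keys, summaries from different testers can be compared or aggregated by a program without special cases.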

Some WGs have defined RDF formats for collecting and processing test results, and there are a number of XSLT style sheets that can be used to format results in an attractive way [@@ provide links to examples @@].

The more information that failing tests report (within reason), the more useful they are. If your users know that one test out of one thousand fails, but they don't know what it was testing or why it failed, that isn't very helpful. If they know what the test was testing, what behaviour it was expecting from the implementation under test, and how the implementation failed to conform to these expectations, this will make it much easier for them to find and fix the problem. The more useful your test suite is, the more it will be used.
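The three pieces of information mentioned above (what was tested, what was expected, and what actually happened) can be made a required part of every failure report. This is a minimal sketch. The `FailureReport` record, the test ID, and the section number are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureReport:
    """Hypothetical record of what a failing test tells its user."""
    test_id: str
    tested: str    # what the test was testing, with a pointer into the spec
    expected: str  # behaviour expected from the implementation under test
    actual: str    # what the implementation actually did

    def message(self) -> str:
        return (f"FAIL {self.test_id}: {self.tested}\n"
                f"  expected: {self.expected}\n"
                f"  actual:   {self.actual}")


failure = FailureReport(
    test_id="attr-001",
    tested="attribute values are case-sensitive (hypothetical section 4.2)",
    expected="selector [lang=EN] does not match lang='en'",
    actual="selector matched",
)
print(failure.message())
```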

Do I really have to worry about all that legal stuff?

Unfortunately, yes. Copyright, patent, and license issues can upset the best-organized test development efforts. Your test suite will need to be distributed under a W3C-approved license (you will need to decide which), and this means that contributions to the test suite will have to be provided under contribution licenses that do not contradict or inhibit the distribution license.

How should I package and publish my tests?

Conformance tests are useful; a conformance test suite is much more useful. What's the difference?

Test runs should be deterministic (that is, for a particular implementation on a particular configuration, different testers should obtain the same results). If you simply publish an unorganized collection of tests (for example, a directory containing lots of files), it will be difficult for testers to understand:

what tests apply to their implementation (some may apply only to a particular optional feature)
how the tests should be executed
how to interpret the results (did the test run fail and if so, what is it about the implementation that is incorrect?)
whether or not their test run was successful (whether they can claim conformance)
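To help testers determine which tests apply, each test could declare the optional features it depends on, and a tester could then select tests based on the features their implementation supports. This is a minimal sketch. The feature names and test IDs are hypothetical. Sorting the output keeps the selected set and its order identical for every tester, which supports deterministic runs.

```python
def applicable_tests(tests: dict[str, set[str]], supported: set[str]) -> list[str]:
    """Return IDs of tests whose required optional features are all supported.

    A test with an empty requirement set checks mandatory behaviour and
    therefore always applies.
    """
    return sorted(tid for tid, required in tests.items() if required <= supported)


tests = {
    "core-001": set(),
    "core-002": set(),
    "xpath-001": {"xpath"},
    "xpath-xinclude-001": {"xpath", "xinclude"},
}
print(applicable_tests(tests, {"xpath"}))
# ['core-001', 'core-002', 'xpath-001']
```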

Package the tests up into a test suite. Explain how to determine which tests to run, how to run the tests, how to interpret the results, and how to make a conformance claim. A test suite will typically contain some or all of the following:

test harness
tests
documentation
licensing and copyright information

What should the test documentation cover?

As discussed above, a high-quality test suite will contain documentation that explains to users how to execute the tests and how to interpret the results. More specifically, the documentation should explain:

the specification(s) covered by the tests
the objectives and scope of the test suite
what areas of the specification(s) are covered, and how thoroughly
how to determine what tests to run
how to execute tests (explaining the use of the test harness, if supplied)
how to interpret test results
how to publish test results
how to make a conformance claim
how to challenge the validity of a test or submit a bug report

Should I automate test execution?

If at all possible, yes. Automated test runs are less prone to 'operator error' and more likely to be 'deterministic' (to report the same results when run on similar configurations at different times). If automation is impractical because it would require the construction of a test harness and/or framework code on a variety of different platforms, provide metadata and documentation sufficient to enable others to do so.
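At its simplest, an automated harness runs every test in a fixed order and records one of the result states for each. This is a minimal sketch, not a recommended harness design. `implementation_parse_bool` is a hypothetical stand-in for the implementation under test, and the test IDs are invented.

```python
from typing import Callable


def run_suite(tests: dict[str, Callable[[], bool]]) -> dict[str, str]:
    """Run each test in a fixed order and record a result state.

    A test returns True for pass and False for fail. An unexpected
    exception is recorded as cannotTell instead of aborting the whole run.
    """
    results = {}
    for test_id in sorted(tests):  # fixed order keeps runs repeatable
        try:
            results[test_id] = "pass" if tests[test_id]() else "fail"
        except Exception:
            results[test_id] = "cannotTell"
    return results


def implementation_parse_bool(s: str) -> bool:
    """Hypothetical stand-in for the implementation under test."""
    return s.strip().lower() == "true"


tests = {
    "bool-001": lambda: implementation_parse_bool("true") is True,
    "bool-002": lambda: implementation_parse_bool("yes") is True,  # stub does not meet this
    "bool-003": lambda: implementation_parse_bool(None) is False,  # buggy test: None raises
}
print(run_suite(tests))
```

Whether an exception should count as `fail` or `cannotTell` is a judgment call. This sketch uses `cannotTell` so that a bug in a test or in the harness is not reported as a defect in the implementation.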

Some types of tests are inherently difficult or impossible to automate (for example, tests that require human visual confirmation). In these circumstances the process of running the tests should still be 'routinized' as much as possible. (Provide a standard set of prompts for the tester to respond to, together with clear descriptions of what to expect and how to judge whether the implementation is correct.)
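A manual test can still be routinized by presenting every tester with the same prompt and mapping their answer to a result state. This is a minimal sketch. The `ManualTest` record and its fields are hypothetical. The `ask` parameter lets the same code run interactively (pass the built-in `input`) or with scripted answers.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ManualTest:
    test_id: str
    instructions: str  # what the tester should do
    expectation: str   # clear description of correct behaviour


def run_manual(test: ManualTest, ask: Callable[[str], str]) -> str:
    """Show a standard prompt and map the tester's answer to a result state."""
    prompt = (f"[{test.test_id}] {test.instructions}\n"
              f"Expected: {test.expectation}\n"
              "Does the implementation behave as expected? (y/n/?) ")
    answer = ask(prompt).strip().lower()
    # Anything other than a clear yes or no is recorded as cannotTell.
    return {"y": "pass", "n": "fail"}.get(answer, "cannotTell")


t = ManualTest(
    test_id="render-007",
    instructions="Load the test page in the implementation under test.",
    expectation="A green square and no red anywhere on the page.",
)
# Interactively: run_manual(t, input)
print(run_manual(t, lambda prompt: "y"))  # pass
```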

Once I publish my tests, I'm done, right?

Sorry, no. Test suites must evolve over time:

to meet the needs of changing specs (revisions)
to improve quality and/or coverage
to fix bugs found during development, testing, or use of the test suite

This implies that you should plan for multiple releases of the test suite. Use version numbers so people know which version they're using, and state which version or versions of the specification your test suite addresses.

How should I handle bugs in my test suite?

First, test your test suite before you publish it. If it's really buggy, people won't trust it and won't use it.

No matter how thoroughly you test, bugs will still slip through. Define a process to accept and respond to bug reports. In response to bugs it might be necessary to:

exclude broken tests from the test suite
create and distribute alternate tests
update the documentation, harness, or framework
modify the specification to correct an ambiguity or contradiction

An issue-management system such as Bugzilla [@@ example @@] can help with this task.

Unless you want to define a 'patch process' to allow partial updates to the test suite (this is probably more trouble than it's worth), the simplest way to handle bugs might be to publish a list of known issues with workarounds where appropriate, together with a list of tests known to be incorrect (and which therefore need not be run, or whose failure can be ignored). Periodically you should issue revisions of the test suite in which the problems are corrected.
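Testers (or the harness) could apply the published list of known-incorrect tests to their results before judging a run. This is a minimal sketch. The function name and test IDs are hypothetical. Marking excluded tests as `notTested` rather than silently dropping them keeps the report honest about what was skipped.

```python
def apply_known_issues(results: dict[str, str], known_bad: set[str]) -> dict[str, str]:
    """Mark results of tests on the known-issues list as notTested.

    Failures of these tests can be ignored, so they should not count
    against the implementation.
    """
    return {tid: ("notTested" if tid in known_bad else r) for tid, r in results.items()}


results = {"t1": "pass", "t2": "fail", "t3": "pass"}
known_bad = {"t2"}  # taken from the published list of known issues
filtered = apply_known_issues(results, known_bad)
print(filtered)                         # {'t1': 'pass', 't2': 'notTested', 't3': 'pass'}
print("fail" not in filtered.values())  # True
```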

Should test results be published?

While it's not required by W3C processes, providing a means for people to publish their test results can be beneficial. Publicity and competition provide strong incentives for developers to improve their implementations. A simple web-based submission and publication process would be ideal.

Should we implement a branding or certification program?

While you may not want to define and implement a fully-fledged program with all of the legal and administrative overhead that this implies, a simple logo or icon that can be displayed on a web page ("compatible with xxx") may be useful. Note that whatever program you implement should probably involve "self-certification" (you do not want to be in the business of "certifying" implementations as conformant, since this is legally risky).