Test Development FAQ

Working Draft March 6, 2005

Introduction

TBD...

1. What kinds of testing are important in the W3C?

The W3C is interested in promoting the development of "interoperable technologies (specifications, guidelines, software, and tools)". Two types of testing are particularly helpful in promoting these goals:

Conformance testing focuses on testing what is formally specified, in order to verify whether an implementation conforms to its specifications. This form of testing does not focus on implementation-specific details (what is not covered by the spec), nor on performance, usability, the capability of an implementation to stand up under stress, or interoperability, except in so far as these characteristics of an implementation are formally required by the specification.

Interoperability testing focuses on finding interoperability issues between implementations of a given specification.

Both forms of testing can help to detect defects (ambiguities, inaccuracies, etc.) in specifications, and are therefore particularly useful when conducted in parallel with specification development.

Since the W3C's Proposed Recommendation entrance criteria include the requirement to demonstrate two interoperable implementations, WGs are increasingly interested in interoperability testing and in conformance test development, since conformance is the key to interoperability. (Conformance is a necessary but not sufficient condition; implementations that conform perfectly to imperfect specifications may still fail to interoperate.)

This document focuses primarily on conformance and interoperability testing, although some of the recommendations are also applicable to other kinds of testing.

2. When should test development start?

Planning should start very early (ideally, at the same time as spec development begins). Defining a testing approach (what kinds of tests to develop, and how they should operate), and thinking about 'testability' can be helpful even in the early stages of spec development.

During the planning phase, identify the specifications to be tested. This may seem obvious, but often specifications make reference to or depend on other specifications. It's important to understand and to limit the scope of what is to be tested; focus on what you really need to test rather than on ancillary technologies that may be utilized indirectly by implementations.

Typically, Working Groups develop their test suites when the specifications they're developing have reached a certain level of stability. Another interesting approach - often referred to as Test Driven Development - is to develop tests specifically to explore issues and problems in the specification. (The OWL Working Group found this approach helpful.) Note that this implies significantly more work to keep the specification and the tests synchronized.

3. Who will develop the tests?

Typically you will need to persuade members of your WG to contribute resources to develop tests. It may also be worthwhile to approach third parties - for example, organizations that have an interest in the effective deployment of the technology - that might be interested in contributing to this work without having the resources or competences to participate directly in the group.

Either way, you will have to solicit and manage contributions from others. This can require a significant amount of organization and effort on your part if you are to get quality tests that cover the full range of the specification. Take the time to create a high-quality and informative 'appeal for contributions'.

Specify the format in which tests should be developed (for example, how they should be invoked and how they should report their results), and any metadata that should be supplied with them (for example a definition of the purpose of the test, a pointer to the portion of the specification that is tested, the expected results, etc.). For examples of such guidelines, see the CSS Test Authoring Guidelines and the Submission Procedure for XSLT/XPath Test Suites.
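As a minimal sketch of the kind of metadata described above, a test record might be modelled as follows. The class name, the field names (test_id, purpose, spec_reference, expected_result), and the sample values are all hypothetical, chosen for illustration; they are not taken from any W3C guideline.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CaseMetadata:
    """Metadata a contributor supplies with each test (hypothetical field names)."""
    test_id: str          # unique identifier within the suite
    purpose: str          # what the test is meant to verify
    spec_reference: str   # pointer to the portion of the spec being tested
    expected_result: str  # what a conforming implementation should produce


def missing_fields(meta: CaseMetadata) -> list[str]:
    """Return the names of metadata fields that were left empty."""
    return [f.name for f in fields(meta) if not getattr(meta, f.name).strip()]


# Illustrative submission with the spec pointer left blank.
submission = CaseMetadata(
    test_id="example-001",
    purpose="Check that an empty input is rejected",
    spec_reference="",
    expected_result="error reported",
)
print(missing_fields(submission))  # prints ['spec_reference']
```

A check of this kind could be run on each submission before human review, so that reviewers only see tests whose metadata is complete.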

Other examples of test-authoring guidelines or submission procedures?

Define a process to manage contributions. Review submissions to ensure that they are appropriate and correct. Keep track of who submitted what, and of the 'state' that a particular test is in (submitted, reviewed, accepted, returned for revision, rejected, etc.). A test-case management system [@@ example @@] can help with this task.
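The state tracking described above could be sketched roughly as follows. The state names come from the list in the text; the allowed transitions between them, the class names, and the sample submitter address are assumptions made for illustration, and a real process may well differ.

```python
from enum import Enum


class State(Enum):
    SUBMITTED = "submitted"
    REVIEWED = "reviewed"
    ACCEPTED = "accepted"
    RETURNED = "returned for revision"
    REJECTED = "rejected"


# Assumed workflow: which states may follow which. Adjust to your own process.
ALLOWED = {
    State.SUBMITTED: {State.REVIEWED},
    State.REVIEWED: {State.ACCEPTED, State.RETURNED, State.REJECTED},
    State.RETURNED: {State.SUBMITTED},  # contributor resubmits after revising
    State.ACCEPTED: set(),
    State.REJECTED: set(),
}


class TrackedTest:
    """Records who submitted a test and every state it has passed through."""

    def __init__(self, test_id: str, submitter: str) -> None:
        self.test_id = test_id
        self.submitter = submitter
        self.history = [State.SUBMITTED]

    @property
    def state(self) -> State:
        return self.history[-1]

    def move_to(self, new_state: State) -> None:
        """Advance to new_state, refusing transitions the workflow forbids."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} is not allowed")
        self.history.append(new_state)


t = TrackedTest("example-001", "contributor@example.org")
t.move_to(State.REVIEWED)
t.move_to(State.ACCEPTED)
print(t.state.value)  # prints accepted
```

Keeping the full history, rather than only the current state, makes it possible to answer later questions about how a test came to be accepted or rejected.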

Examples of test review processes or test-case management systems?

4. How do we decide what tests to develop?

Try to focus development efforts where they will be most effective and useful:

Ask test developers to give priority to the areas of the spec where coverage is most needed - don't just leave it up to them to develop whatever they want. (This can also help to avoid duplication of effort.) Note that this implies the creation and maintenance of some kind of 'coverage map' (see the next question for more on this).

Example of a WG that guides test contributors, telling them where tests are most needed?

5. How many tests are enough?

There's no simple answer to this question; it depends on the goals that you set yourself and on the resources you have available.

What is most important is that you get the 'best' coverage for the resources you are able to apply. Coverage goals and results can be specified in terms of the number of tests that are developed for areas of the specification (features, logical sections, testable assertions, or even paragraphs or pages). A particularly useful coverage metric is 'assertion-breadth coverage' or simply 'assertion coverage', which is defined as the percentage of testable assertions for which at least one test has been developed.
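The assertion-coverage metric defined above can be computed directly from a mapping of assertions to tests. The following is a small sketch; the function name, the assertion identifiers, and the test identifiers are hypothetical.

```python
def assertion_coverage(assertions, tests_by_assertion):
    """Percentage of testable assertions that have at least one test."""
    if not assertions:
        raise ValueError("no testable assertions listed")
    # An assertion counts as covered only if its list of tests is non-empty.
    covered = sum(1 for a in assertions if tests_by_assertion.get(a))
    return 100.0 * covered / len(assertions)


# Four testable assertions; A2 has no entry and A4 has an empty test list.
assertions = ["A1", "A2", "A3", "A4"]
tests_by_assertion = {"A1": ["t1", "t2"], "A3": ["t3"], "A4": []}
print(assertion_coverage(assertions, tests_by_assertion))  # prints 50.0
```

Note that A1 contributes no more to the figure than A3 does, even though it has two tests: the metric measures breadth, not depth.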

Note that it may be appropriate to define different coverage goals for different areas of the specification.

Whether or not you define coverage goals in advance, it is always helpful to provide some kind of coverage report with your test suite. This could be as simple as a mapping of tests to areas of the specification, or a more detailed report providing counts and averages of the number of tests associated with different areas. Such reports can help the users of your test suite understand its strengths and weaknesses.
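A simple coverage report of the kind described above might be produced along these lines. This is only a sketch: the section numbers, test identifiers, and function name are invented for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical mapping of test identifiers to the spec section each one targets.
test_to_section = {
    "t1": "3.1", "t2": "3.1", "t3": "3.2", "t4": "4.1", "t5": "3.1",
}
# Every section to be reported, including ones that have no tests yet.
sections = ["3.1", "3.2", "4.1", "4.2"]


def coverage_report(test_to_section, sections):
    """Count tests per section; sections with no tests are reported as 0."""
    counts = Counter(test_to_section.values())
    return {s: counts.get(s, 0) for s in sections}


report = coverage_report(test_to_section, sections)
for section, n in report.items():
    print(f"section {section}: {n} test(s)")
print(f"average per section: {mean(list(report.values())):.2f}")
```

Listing the sections explicitly, rather than deriving them from the tests, ensures that untested areas such as section 4.2 in this example show up in the report instead of silently disappearing.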

Examples of WGs publishing coverage numbers?

6. How should tests report their results?

All tests in the test suite should report their results in a consistent manner, making it easy for humans or computer programs to understand and to process them. The following test-result states, defined by EARL (the Evaluation And Report Language), have proved useful:

Some WGs have defined RDF formats for collecting and processing test results, and there are a number of XSLT style sheets that can be used to format results in an attractive way [@@ provide links to examples @@].

Examples of style-sheets and test results publication (implementation reports)?

The more information that failing tests report (within reason), the more useful they are. If your users know that one test out of one thousand fails, but they don't know what it was testing or why it failed, that isn't very helpful. If they know what the test was testing, what behaviour it was expecting from the implementation under test, and how the implementation failed to conform to these expectations, this will make it much easier for them to find and fix the problem. The more useful your test suite is, the more it will be used.
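To illustrate the difference between a bare failure and an informative one, here is a minimal sketch of a result record that carries the purpose, the expected behaviour, and the observed behaviour. The class, its fields, and the outcome names "pass" and "fail" are illustrative assumptions rather than a prescribed format.

```python
from dataclasses import dataclass


@dataclass
class Result:
    test_id: str
    outcome: str   # "pass" or "fail" here; outcome names are illustrative
    purpose: str   # what the test was testing
    expected: str  # behaviour expected from the implementation under test
    actual: str    # behaviour that was actually observed


def describe(result: Result) -> str:
    """One-line summary for passes; a fuller explanation for anything else."""
    if result.outcome == "pass":
        return f"{result.test_id}: pass"
    return (
        f"{result.test_id}: {result.outcome}\n"
        f"  purpose:  {result.purpose}\n"
        f"  expected: {result.expected}\n"
        f"  actual:   {result.actual}"
    )


failure = Result(
    test_id="example-001",
    outcome="fail",
    purpose="Check that an empty input is rejected",
    expected="error reported",
    actual="empty output, no error",
)
print(describe(failure))
```

Passing tests stay terse so that a run of a thousand tests remains readable, while each failure explains itself without the user having to open the test source.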

7. Do I really have to worry about all that legal stuff?

Unfortunately, yes. Copyright, patent, and license issues can upset the best-organized test development efforts. Your test suite will need to be distributed under a W3C-approved license (you will need to decide which), and this means that contributions to the test suite will have to be provided under contribution licenses that do not contradict or inhibit the distribution license.

Need links to W3C licenses..

8. How should I package and publish my tests?

Conformance tests are useful - a conformance test suite is much more useful. What's the difference?

Test runs should be deterministic (that is, for a particular implementation on a particular configuration different testers should obtain the same results). If you simply publish a random collection of tests - for example, a directory containing lots of files - it will be difficult for testers to understand:

Package the tests up into a test suite. Explain how to determine which tests to run, how to run the tests, how to interpret the results, and how to make a conformance claim. A complete test suite will contain, in addition to tests, some or all of the following:

Examples of real test suites (containing docs, harness, etc.)?

9. What should the test documentation cover?

As discussed above, a high-quality test suite will contain documentation that explains to users how to execute the tests and how to interpret the results. More specifically, the documentation should explain:

Example of good test suite documentation?

10. Should I automate test execution?

If at all possible, yes. Automated test runs are less prone to 'operator error' and more likely to be 'deterministic' (to report the same results when run on similar configurations at different times). If automation is impractical because it would require the construction of a test harness and/or framework code that runs on a variety of different platforms, provide metadata and documentation sufficient to enable others to automate the tests themselves.
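A very small automated harness might look something like the sketch below. It assumes, purely for illustration, that each test can be expressed as an input paired with an expected output and that the implementation under test can be called as a function; real harnesses are usually more involved, and all names here are hypothetical.

```python
def run_suite(tests, implementation):
    """Run every test against the implementation and collect outcomes.

    `tests` maps a test id to an (input, expected_output) pair. Running in
    sorted id order keeps repeated runs directly comparable.
    """
    results = {}
    for test_id in sorted(tests):
        given, expected = tests[test_id]
        try:
            actual = implementation(given)
        except Exception as exc:  # a crash is recorded, not fatal to the run
            results[test_id] = f"fail (raised {type(exc).__name__})"
            continue
        results[test_id] = "pass" if actual == expected else "fail"
    return results


# Stand-in for the implementation under test: upper-cases its input.
def implementation(text):
    return text.upper()


tests = {
    "t2": ("abc", "ABC"),
    "t1": ("", ""),
    "t3": (None, ""),  # the stand-in raises AttributeError on None
}
print(run_suite(tests, implementation))
```

Catching exceptions per test means that one crashing test does not prevent the rest of the suite from running, which helps different testers obtain the same complete set of results.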

Some types of tests are inherently difficult or impossible to automate (for example, tests that require human visual confirmation). In these circumstances the process of running the tests should still be 'routinized' as much as possible. (Provide a standard set of prompts for the tester to respond to, together with clear descriptions of what to expect and how to judge whether the implementation is correct. See the MUTAT tool for one approach to this issue.)

The easier it is to run your tests, the more widely they will be used.

Examples of automated test suites, and/or of tests published with metadata allowing others to automate?

11. Once I publish my tests, I'm done, right?

Sorry, no. Test suites must evolve over time.

This implies that you should plan for multiple releases of the test suite. Use version numbers so people know what version they're using. State which version or versions of the specification your test suite addresses.
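One way to make both pieces of version information explicit is to record them together in a small release descriptor. The following sketch is an assumption about how that might look; the class name, field names, and version strings are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuiteRelease:
    suite_version: str              # version of the test suite itself
    spec_versions: tuple[str, ...]  # spec versions this release addresses

    def addresses(self, spec_version: str) -> bool:
        """True if this release is meant for the given spec version."""
        return spec_version in self.spec_versions

    def banner(self) -> str:
        """Text suitable for printing at the top of every test report."""
        specs = ", ".join(self.spec_versions)
        return f"Test suite {self.suite_version} (for specification version(s): {specs})"


release = SuiteRelease("1.2", ("1.0", "1.1"))
print(release.banner())
```

Printing the banner with every set of results lets readers see at a glance which release of the suite, and which version of the specification, the results refer to.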

Examples of WGs that have released multiple versions of their test suite?

12. How should I handle bugs in my test suite?

Firstly, test your test suite before you publish it. If it's really buggy, people won't trust it and won't use it.

No matter how thoroughly you test, bugs will still slip through. Define a process to accept and respond to bug reports. In response to bugs it might be necessary to:

An issue-management system such as Bugzilla [@@ example @@] can help with this task.

Pointer to Bugzilla - example of a WG using it (us?)...

Unless you want to define a 'patch process' to allow partial updates to the test suite (this is probably more trouble than it's worth), the simplest way to handle bugs might be to publish a list of known issues with workarounds where appropriate, together with a list of tests known to be incorrect (and which therefore need not be run, or whose failure can be ignored). Periodically you should issue revisions of the test suite in which the problems are corrected.
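A published list of tests known to be incorrect can be applied mechanically when results are interpreted. The sketch below shows one possible way to do this; the function name, the test identifiers, and the "pass"/"fail" outcome strings are assumptions.

```python
def triage(results, known_incorrect):
    """Split failures into genuine ones and those caused by known-bad tests."""
    genuine, ignorable = [], []
    for test_id, outcome in sorted(results.items()):
        if outcome != "fail":
            continue
        # A failure of a test on the known-incorrect list can be ignored.
        (ignorable if test_id in known_incorrect else genuine).append(test_id)
    return genuine, ignorable


results = {"t1": "pass", "t2": "fail", "t3": "fail", "t4": "pass"}
known_incorrect = {"t3"}  # taken from the suite's published known-issues list
genuine, ignorable = triage(results, known_incorrect)
print(genuine, ignorable)  # prints ['t2'] ['t3']
```

Reporting the ignorable failures separately, rather than hiding them, keeps the results honest while making clear which failures actually need attention.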

13. Should test results be published?

While it's not required by W3C processes, providing a means for people to publish their test results can be beneficial. Publicity and competition provide strong incentives for developers to improve their implementations. A simple web-based submission and publication process would be ideal.

Example of a WG encouraging/supporting publication of test results?

14. Should we implement a branding or certification program?

While you may not want to define and implement a fully-fledged program with all of the legal and administrative overhead that this implies, a simple logo/icon that can be displayed on a web-page ("compatible with xxx") may be useful. Note that whatever program you implement should probably involve "self certification" (you do not want to be in the business of "certifying" implementations as conformant since this is legally risky).

Example of a WG encouraging/supporting a certification or logo program ("this page validates....")?