Test Development FAQ

Working Draft April 3, 2005

Introduction

This Test FAQ is addressed to anyone who develops tests or who is putting together a testing effort. The information should also be of interest to those developing specifications - since they have an interest in effective test development - and those who may need to run tests.

The document provides introductory information about a variety of testing-related topics: the purpose of testing, how to get started, what is involved, etc. Much of the FAQ documents what is already the 'norm' or considered good practice.

This is a living document. Please provide feedback by emailing the W3C QA Working Group.

1. What kinds of testing are important in the W3C?

The W3C is interested in promoting the development of "interoperable technologies (specifications, guidelines, software, and tools)". Two types of testing are particularly helpful in promoting these goals:

Conformance testing focuses on testing what is formally specified, in order to verify whether an implementation conforms to its specifications. This form of testing does not focus on implementation-specific details (what is not covered by the spec), nor on performance, usability, the capability of an implementation to stand up under stress, or interoperability, except in so far as these characteristics of an implementation are formally required by the specification.

Interoperability testing focuses on finding interoperability issues between implementations of a given specification.

Both forms of testing can help to detect defects (ambiguities, inaccuracies, etc.) in specifications, and are therefore particularly useful when conducted in parallel with specification development.

Since the W3C's Proposed Recommendation entrance criteria include the requirement to demonstrate two interoperable implementations, WGs are increasingly interested in interoperability testing and in conformance test development, which is the key to interoperability. (It's a necessary, but not sufficient condition; implementations that conform perfectly to imperfect specifications may still fail to interoperate.)

This document focuses primarily on conformance and interoperability testing, although some of the recommendations are also applicable to other kinds of testing. (For a great deal of useful information about testing, including a comprehensive classification of different types of testing, see here.)

2. When should test development start?

Planning should start very early (ideally, at the same time as spec development begins). Defining a testing approach (what kinds of tests to develop, and how they should operate), and thinking about 'testability' can be helpful even in the early stages of spec development.

During the planning phase, identify the specifications to be tested. This may seem obvious, but often specifications make reference to or depend on other specifications. It's important to understand and to limit the scope of what is to be tested; focus on what you really need to test rather than on ancillary technologies that may be utilized indirectly by implementations.

Typically, Working Groups develop their test suites when the specifications have reached a reasonable level of stability. However, it is important to start the test development process before the specification is 'frozen' since this helps to identify problems (ambiguities, lack of clarity, contradictions) while there is still an opportunity to correct them.

Another interesting approach - often referred to as Test Driven Development - is to develop tests specifically to explore issues and problems in the specification. (The OWL Working Group found this approach helpful.) Note that this implies significantly more work to keep the specification and the tests synchronized.

3. Who will develop the tests?

Typically you will need to persuade members of your WG to contribute resources to develop tests. It may also be worthwhile to approach third parties - for example, organizations that have an interest in the effective deployment of the technology - that might be interested in contributing to this work even though they lack the resources or expertise to participate directly in the group.

Either way, you will have to solicit and manage contributions from others. This can require a significant amount of organization and effort on your part if you are to get quality tests that cover the full range of the specification. Take the time to create a high-quality and informative 'appeal for contributions'.

Specify the format in which tests should be developed (for example, how they should be invoked and how they should report their results), and any metadata that should be supplied with them (for example, a definition of the purpose of the test, a pointer to the portion of the specification that is tested, the expected results, etc.). For examples of such guidelines, see:

Define a process to manage contributions. Review submissions to ensure that they are appropriate and correct. Keep track of who submitted what, and of the 'state' that a particular test is in (submitted, reviewed, accepted, returned for revision, rejected, etc.). A test-case management system can help with this task.
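
By way of illustration, a contribution record might capture both the submission metadata described above and the review state of each test. The following sketch is hypothetical; the field names, states, and values are assumptions rather than an agreed W3C format:

    from dataclasses import dataclass
    from enum import Enum

    class ReviewState(Enum):
        """Possible states of a contributed test case (names are illustrative)."""
        SUBMITTED = "submitted"
        REVIEWED = "reviewed"
        ACCEPTED = "accepted"
        RETURNED_FOR_REVISION = "returned-for-revision"
        REJECTED = "rejected"

    @dataclass
    class TestCase:
        """Metadata collected with each contributed test (field names are illustrative)."""
        test_id: str
        purpose: str            # what the test is intended to verify
        spec_section: str       # pointer to the portion of the specification being tested
        expected_result: str    # the behaviour a conforming implementation must exhibit
        contributor: str        # who submitted the test
        state: ReviewState = ReviewState.SUBMITTED

    # An example contribution record (all values are invented).
    example = TestCase(
        test_id="sect-4.2-001",
        purpose="Verify that element X accepts attribute Y",
        spec_section="http://www.example.org/TR/example-spec/#sect-4.2",
        expected_result="Attribute Y is accepted and reflected in the document model",
        contributor="contributor@example.org",
    )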

For examples of test review guidelines see:

See also the Web Content Accessibility Guidelines WG's list of test review statuses.

Examples of test-case management systems? (The XForms group supposedly has a test-case management system, but it's not clear from this link how it works. The Voice Browser WG also has a system - need reference.)

4. How do we decide what tests to develop?

Try to focus development efforts where they will be most effective and useful:

Ask test developers to give priority to the areas of the spec where coverage is most needed - don't just leave it up to them to develop whatever they want. (This can also help to avoid duplication of effort.) Note that this implies the creation and maintenance of some kind of 'coverage map' (see the next question for more on this).

Example of a WG that guides test contributors, telling them where tests are most needed?

5. How many tests are enough?

There's no simple answer to this question; it depends on the goals that you set yourself and on the resources you have available.

What is most important is that you get the 'best' coverage for the resources you are able to apply. Coverage goals and results can be specified in terms of the number of tests that are developed for areas of the specification (features, logical sections, testable assertions, or even paragraphs or pages). A particularly useful coverage metric is 'assertion-breadth coverage' or simply 'assertion coverage', which is defined as the percentage of testable assertions for which at least one test has been developed.

Note that it may be appropriate to define different coverage goals for different areas of the specification.

Whether or not you define coverage goals in advance, it is always helpful to provide some kind of coverage report with your test suite. This could be as simple as a mapping of tests to areas of the specification, or a more detailed report providing counts and averages of the number of tests associated with different areas. Such reports can help the users of your test suite understand its strengths and weaknesses.
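
For instance, the assertion-coverage metric described above could be computed from a simple mapping of testable assertions to the tests that exercise them. The assertion identifiers and test names below are purely illustrative:

    # Map each testable assertion to the tests that exercise it (identifiers are invented).
    tests_per_assertion = {
        "assert-1.1": ["test-001", "test-002"],
        "assert-1.2": ["test-003"],
        "assert-2.1": [],            # no test yet: a coverage gap
        "assert-2.2": ["test-004"],
    }

    covered = sum(1 for tests in tests_per_assertion.values() if tests)
    total = len(tests_per_assertion)

    # Assertion coverage: percentage of testable assertions with at least one test.
    print(f"Assertion coverage: {covered}/{total} = {100.0 * covered / total:.0f}%")

    # A minimal per-assertion coverage report.
    for assertion, tests in sorted(tests_per_assertion.items()):
        print(f"{assertion}: {len(tests)} test(s)")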

The XForms Test Suite doesn't directly publish coverage numbers, but does assert that it covers all the test assertions defined in the specifications:

The HTML 4.01 Test Suite shows which assertions have a matching test case.

More examples of WGs publishing coverage numbers?

6. How should tests report their results?

All tests in the test suite should report their results in a consistent manner, making it easy for humans or computer programs to understand and to process them. The following test-result states, defined by EARL (the Evaluation And Report Language), have proved useful:

Some WGs have defined RDF formats for collecting and processing test results, and there are a number of XSLT style sheets that can be used to format results in an attractive way. For example:

See also the QA Working Group's Matrix of W3C Specifications (links to implementation reports can be found in the last column).

The more information that failing tests report (within reason), the more useful they are. If your users know that one test out of one thousand fails, but they don't know what it was testing or why it failed, that isn't very helpful. If they know what the test was testing, what behaviour it was expecting from the implementation under test, and how the implementation failed to conform to these expectations, this will make it much easier for them to find and fix the problem. The more useful your test suite is, the more it will be used.
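
As a sketch of such a structured result - assuming EARL-like outcome values along the lines of pass, fail, cannot-tell, not-applicable and not-tested, with the normative vocabulary defined by EARL itself - a single test report might look like this:

    from dataclasses import dataclass
    from enum import Enum

    class Outcome(Enum):
        """EARL-style result states (names here are illustrative; EARL defines the normative terms)."""
        PASS = "pass"
        FAIL = "fail"
        CANNOT_TELL = "cannot-tell"
        NOT_APPLICABLE = "not-applicable"
        NOT_TESTED = "not-tested"

    @dataclass
    class TestResult:
        test_id: str
        outcome: Outcome
        # For failures, record what was being tested, what was expected and what actually
        # happened, so that implementers can locate and fix the problem.
        description: str = ""
        expected: str = ""
        actual: str = ""

    # An invented example of a useful failure report.
    result = TestResult(
        test_id="sect-4.2-001",
        outcome=Outcome.FAIL,
        description="Element X should accept attribute Y",
        expected="Attribute Y is reflected in the document model",
        actual="The parser rejected attribute Y as unknown",
    )
    print(f"{result.test_id}: {result.outcome.value} - {result.actual}")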

7. Do I really have to worry about all that legal stuff?

Unfortunately, yes. Copyright, patent, and license issues can upset the best-organized test development efforts. Your test suite will need to be distributed under a W3C-approved license (the two licenses most often used for the distribution of W3C materials are the Document License and the Software License; you will need to decide which is appropriate). This means that contributions to the test suite will have to be provided under contribution licenses that do not contradict or inhibit the distribution license. See these Policies for Contribution of Test Cases to W3C, and note the importance of the W3C's Patent Policy. The QA Handbook contains a brief discussion of the issues involved here.

It is advisable to specify in your submission guidelines the licensing terms under which contributions will be distributed (see the DOM Conformance Test Suites Process Document for an example of how to do this).

8. How should I package and publish my tests?

Conformance tests are useful - a conformance test suite is much more useful. What's the difference?

Test runs should be deterministic (that is, for a particular implementation on a particular configuration, different testers should obtain the same results). If you simply publish a random collection of tests - for example, a directory containing lots of files - it will be difficult for testers to understand:

Package the tests up into a test suite. Explain how to determine which tests to run, how to run the tests, how to interpret the results, and how to make a conformance claim. A complete test suite will contain, in addition to tests, some or all of the following:

Examples of real test suites (containing docs, harness, etc.)? (Still TBD, though Dom nominated the SVG test suite on the grounds that it provides a harness and documentation, and is packaged in a zip file.)
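
Pending such examples, the sketch below shows what a test suite manifest might contain: enough information for a tester to select, run and interpret the tests. All names and values are hypothetical:

    # A hypothetical test-suite manifest; field names and values are illustrative only.
    manifest = {
        "suite": "Example Specification Test Suite",
        "version": "1.0",
        "specification": "http://www.example.org/TR/example-spec/",
        "documentation": "docs/running-the-tests.html",
        "harness": "harness/run.py",
        "tests": [
            {"id": "sect-4.2-001", "file": "tests/sect-4.2-001.xml",
             "applies_to": ["basic-profile"], "manual": False},
            {"id": "sect-5.1-003", "file": "tests/sect-5.1-003.xml",
             "applies_to": ["full-profile"], "manual": True},
        ],
    }

    # A tester targeting the basic profile would select only the applicable tests.
    selected = [t for t in manifest["tests"] if "basic-profile" in t["applies_to"]]
    print(f"{len(selected)} of {len(manifest['tests'])} tests apply to the basic profile")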

9. What documentation should I provide?

As discussed above, a high-quality test suite will contain documentation that explains to users how to execute the tests and how to interpret the results. More specifically, the documentation should explain:

Example of good test suite documentation?

NOTE: some WGs conflate test development guidelines with user-level test suite documentation (for example, the CSS Test Suite Documentation and the HTML4 test suite documentation). These really should be separate, since they address two completely different audiences. How to make this point diplomatically?

10. Should I automate test execution?

If at all possible, yes. Automated test runs are less prone to 'operator error' and more likely to be 'deterministic' (to report the same results when run on similar configurations at different times). If automation is impractical because it would require the construction of a test harness and/or framework code that runs on a variety of different platforms, provide metadata and documentation sufficient to enable others to do so. See here for a discussion of test-case metadata.
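
As a minimal sketch of an automated run - the directory layout, file naming and pass/fail convention below are assumptions, not a prescribed W3C harness - a runner might look like this:

    import glob
    import subprocess
    import sys

    # Run every test in a hypothetical tests/ directory and record results uniformly.
    # Each test is assumed to be a script that exits 0 on pass and non-zero on fail.
    results = {}
    for test in sorted(glob.glob("tests/*.py")):
        completed = subprocess.run([sys.executable, test], capture_output=True, text=True)
        if completed.returncode == 0:
            results[test] = "pass"
        else:
            # Keep the diagnostic output so that failures can be investigated later.
            results[test] = f"fail ({completed.stderr.strip() or 'no diagnostic output'})"

    for test, outcome in sorted(results.items()):
        print(f"{test}: {outcome}")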

Some types of tests are inherently difficult or impossible to automate (for example, tests that require human visual confirmation). In these circumstances the process of running the tests should still be 'routinized' as much as possible. (Provide a standard set of prompts for the tester to respond to, together with clear descriptions of what to expect and how to judge whether the implementation is correct.) See the MUTAT tool for one approach to this issue, and the Web Content Accessibility Guidelines 2.0 test suite for a practical example of such a test suite.
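
A routinized manual test might present the tester with a fixed description and prompt, and record the answer in the same form as automated results. This is only a sketch; the function, identifiers and wording are invented:

    def run_manual_test(test_id, instructions, expected):
        """Present a standard prompt to the tester and record a uniform result."""
        print(f"Test {test_id}")
        print(f"Steps:    {instructions}")
        print(f"Expected: {expected}")
        answer = input("Did the implementation behave as expected? [y/n/c=cannot tell] ")
        return {"y": "pass", "n": "fail"}.get(answer.strip().lower(), "cannot-tell")

    # Invented example of a visual-confirmation test.
    outcome = run_manual_test(
        "render-012",
        "Load render-012.html and compare the page against the reference image.",
        "The rendering and the reference image are visually identical.",
    )
    print(f"render-012: {outcome}")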

The easier it is to run your tests, the more widely they will be used.

More examples of automated test suites, and/or of tests published with metadata allowing others to automate?

11. Once I publish my tests, I'm done, right?

Sorry, no. Test suites must evolve over time.

This implies that you should plan for multiple releases of the test suite. Use version numbers so people know what version they're using. State which version or versions of the specification that your test suite addresses.

For example, the SVG WG has published three versions of its test suite, while the CSS WG maintains a complete list of its test suites. Another approach (used by the OWL, RDF, and SOAP WGs) is to publish test suites as Technical Reports, so they are 'naturally' versioned (using the previous/this/latest version links in each Technical Report).

More examples of WGs that have released multiple versions of their test suite?

12. How should I handle bugs in my test suite?

First, test your test suite before you publish it. If it's really buggy, people won't trust it and won't use it.

No matter how thoroughly you test, bugs will still slip through. Define a process to accept and respond to bug-reports. In response to bugs it might be necessary to:

An issue-management system such as Bugzilla can help with this task. See this member-only link to the XML Query WG's use of Bugzilla, for example.

Unless you want to define a 'patch process' to allow partial updates to the test suite (this is probably more trouble than it's worth), the simplest way to handle bugs might be to publish a list of known issues with workarounds where appropriate, together with a list of tests known to be incorrect (and which therefore need not be run, or whose failure can be ignored). Periodically you should issue revisions of the test suite in which the problems are corrected.
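
For example, a published list of tests known to be incorrect can be applied mechanically when interpreting results, so that their failures are set aside. The test identifiers below are invented:

    # Tests known to be incorrect in this release of the test suite (identifiers are invented).
    known_bad_tests = {"sect-3.1-007", "sect-6.4-002"}

    # Raw results as reported by a test run.
    raw_results = {
        "sect-3.1-007": "fail",
        "sect-4.2-001": "pass",
        "sect-6.4-002": "fail",
        "sect-5.1-003": "fail",
    }

    # Ignore failures of tests that are on the known-issues list.
    failures_to_investigate = [test for test, outcome in raw_results.items()
                               if outcome == "fail" and test not in known_bad_tests]
    print("Failures to investigate:", failures_to_investigate)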

13. Should test results be published?

While it's not required by W3C processes, providing a means for people to publish their test results can be beneficial. Publicity and competition provide strong incentives for developers to improve their implementations. A simple web-based submission and publication process would be ideal.

14. Should we implement a branding or certification program?

While you may not want to define and implement a fully-fledged program with all of the legal and administrative overhead that this implies, a simple logo or icon that can be displayed on a web page ("compatible with xxx") may be useful. Note that whatever program you implement should probably involve self-certification (you do not want to be in the business of certifying implementations as conformant, since this is legally risky).

For a discussion of the issues involved in certification programs see here. See the W3C XHTML validator program and the Web Content Accessibility Group's logo program for examples of successful logo programs.