Review of Test Guidelines from Jeremy Carroll on 2003-07-01 (www-qa@w3.org from July 2003)

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Tue, 01 Jul 2003 17:13:24 +0100
To: www-qa@w3.org
Message-ID: <3F01B324.2080302@hplb.hpl.hp.com>
This is a review of the Test Guidelines WD dated 16 May 2003

I did not find this document helpful.
I do not much believe this review will be helpful either.


I also felt that your document missed insight into how quality can be 
assured, and how the quality of the Test work of a WG can be enhanced.

Specific issues which were not addressed were:


a) how to effectively integrate the test work with the rest of the work of 
the WG. In particular your documents appear to follow a waterfall model of 
specification development:
- a specification is written
- testable assertions are added to it
- metadata is added to those testable assertions
- tests are developed for those testable assertions
etc. etc.

Each step follows the previous step.

A more exciting model of test is one which follows the extreme programming 
model (XP).

In particular, a WG usually has some sort of document as input (e.g. a 
member submission or a previous version of the spec). A test driven spec 
development would use tests that ask questions that the input document does 
not answer unambiguously. For example, much of the interest in the WebOnt 
work arose from the Patel-Schneider paradox [1], which is essentially a 
test case on the DAML+OIL member submission [2] that was the key input to 
WebOnt.

WGs that are not following a waterfall model will not be able to 
participate in your guidelines.

b) Ensuring that quality people develop the tests

ISO 9000 style quality is an attempt to get mediocre, unmotivated people to 
do OK work of a predictable (but low) quality because some power structure 
(typically connected with money and wage-slavery) sits above them, and 
forces them to jump through some hoops. As far as I can tell, the test 
guidelines follow this paradigm. Here is a checklist, and even a not 
particularly competent WG might get adequate results by following these 
guidelines.

W3C WGs depend on volunteers over whom the W3C has little actual power.
Hence using a methodology that depends on the ability to punish and reward 
is futile.

Moreover, web standards have depended in part on getting brilliant people 
to participate. It is an elitist exercise.
This explains, for example, the Director's dictatorial powers.

Test work is often boring, particularly when placed in a waterfall style 
model, and hence does not engage the minds of brilliant people. 
Strengthening the process for test development is counterproductive because 
it will make test more boring and reduce the attractiveness to brilliant 
people.

It is important to identify what motivates the best people to participate 
in the W3C and what demotivates them. It is at least plausible that peer 
group acclaim is important, and hence it is important that WG members who 
contribute to the test work are adequately acknowledged.

This suggests a priority 1 guideline that there should be a recommendation 
track document for the test work in a WG. A good choice of editor, one who 
takes pride in their work, will be one of the best ways to ensure a deep 
quality to the test work.

It may be appropriate that each test is individually attributed.

This relates to the previous point in that if
we see tests as driving the development process rather than following, it 
is more likely that good people in the WG will engage in that process.

As it is, the waterfall model you suggest looks like makework for mediocre 
members of the WG.

Of course, process guidelines are useful for brilliant people, but they 
need to have a light touch, and be life enhancing rather than deadening.


c) timeliness of tests

While in an ideal world recs would wait for a test suite to be developed, 
and there would be adequate volunteers from within the WG to develop the 
tests, this may well not be the case.

It is important to prioritise which tests are the most useful, and which 
are less useful. The issue driven test process used in particular by the 
RDF Core WG, in which test cases often form part of an issue resolution, is 
an excellent way of doing this.

d) cost effectiveness of tests

A W3C WG has limited resources. Developing conformance test suites may be a 
waste of them. It is necessary for a WG to develop a clear specification; 
many of the tests necessary for conformance testing are obvious. The tests 
that are less obvious will be those that come up in an issue driven process 
of specification development, and these should command the majority of WG 
resource.

[1]
http://lists.w3.org/Archives/Public/www-webont-wg/2002Jan/0099

[2]
http://www.daml.org/2001/03/daml+oil-index.html


Detailed review comments:

1.2

You scope the guidelines to conformance tests - however there are other 
reasons why a WG may wish to develop tests.

In particular both RDF Core and WebOnt WGs have had issue driven test 
processes, where the proposed tests help clarify different positions 
between members of the WG, and the approved tests clarify the issue 
resolution. Parts of the specs that had no related issues are typically 
unproblematic, and building tests for those parts is less critical, and 
less cost effective.

Hence the phrase:
"However they are applicable to all W3C Working Group members as well as 
the testers of implementations."

makes your test guidelines out to be significantly more important than they 
are, and is frankly false.

"Working Groups who may already be doing some of these activities should 
review the document and incorporate the principles and guidelines into 
their test materials as much as possible."
I have no intention of following this suggestion. It should be toned down, 
at least to "... as much as they see fit."


1.3
The two paragraphs are in tension with one another.
The Semantic Web groups have followed the first paragraph but not the second.

We have explicitly decided not to produce conformance test suites, but have 
endorsed test cases as a means for enhancing the quality of our documents.

It is a shame that your work has only explored the second of these two 
paragraphs.

1.4
"The guidelines are intended for all Working Groups "
Does the QA WG follow these guidelines?
I doubt it.
Neither does the RDF Core WG nor the WebOnt WG.
I suggest you reduce the ambition of this document.

1.5

I think you should not reference WAI in this section, maybe the 
acknowledgements.
I found it distracting, and had to go away and look at the WAI stuff, when 
in fact it was irrelevant, and merely of historical interest.

1.6
"Satisfying these checkpoints is a basic requirement to ensure quality and 
interoperability of the standard. "

Satisfying these checkpoints is neither necessary nor sufficient for the 
quality and interoperability of the standard. This untrue statement needs 
to be made less ambitious.

Checkpoint 1.1
MUST

Who must do this?
A test suite sits on a hard disk somewhere. It does not have the 
capabilities to respond to this command. I suggest replacing "MUST define" 
with "defines"; cf. Connolly's "must is for agents":
http://www.w3.org/2001/01/mp23

Checkpoint 1.2

How wholly unrealistic.
Any W3C draft depends on IP, and TCP, and HTTP, ...
There is dependency on the laws of mathematics ...

This checkpoint is a red herring.
When lower-level specs are unclear, the WG may or may not be aware of that, 
and may or may not be able to clarify appropriate behaviour in their spec, 
which may or may not be reflected in the test suite.

It is partly your obsession with testing conformance of implementations, 
rather than using the test material to ensure high quality recs, that is at 
fault here.

Obviously if I have an OWL implementation that uses an inadequate IP stack 
it won't work; and from a conformance testing point of view that might be 
relevant. From the point of view of defining OWL it is not interesting.

Checkpoint 1.3.
...
must be ... documented

Yawn. Requiring documentation is the instrument of mediocrity.
Obviously it is helpful when a test has gone wrong to know which part of 
the spec it relates to, but that is nearly always self explanatory (the 
test has the foobar element in it, look at the definition of the foobar 
element in the spec) or it is not meaningful to document it. The 
Patel-Schneider paradox arises from a complex interaction between much of 
the spec.
This checkpoint mandates a waterfall model for test development, rather 
than an interactive rapid prototyping model which is what is actually used 
by WGs who publish WDs every three months as mandated by W3C process.


Checkpoint 2.2.

Dull, dull, dull. This won't get done, and it is stupid to pretend that it 
will.

Guideline 3.

This stuff about a test management system and test framework is really 
quite unclear.

It seems that you have in mind a piece of software, whereas a test suite 
is, in my mind at least, a set of declarative statements. Particularly when 
we are talking about document formats (which is the typical case for W3C).

Many of the guidelines about a test management system are met by the OWL 
Test Cases simply by being a well structured document with a website behind it.

For example, you can use grep to look through the single HTML file version 
of the OWL test cases to satisfy checkpoint 3.3.
(This can be met in more sophisticated ways, but it is the WG's job to 
provide the metadata, not to provide a software tool to manipulate it).
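
As a minimal sketch of the grep approach, written in Python rather than as 
a shell command: the function below returns the lines of a single-file HTML 
document that match a pattern. The sample markup and the "Level:" field are 
hypothetical stand-ins, not the actual structure of the OWL Test Cases, and 
reading checkpoint 3.3 as "select tests by some criterion" is an assumption.

```python
import re


def grep_lines(text, pattern):
    """Return (line_number, line) pairs for lines matching pattern, like grep -n."""
    regex = re.compile(pattern)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if regex.search(line)
    ]


# Hypothetical stand-in for the single-file HTML version of a test document.
SAMPLE = """<h3>Test 001</h3>
<p>Level: Full</p>
<h3>Test 002</h3>
<p>Level: Lite</p>
<h3>Test 003</h3>
<p>Level: Full</p>
"""

matches = grep_lines(SAMPLE, r"Level: Full")
for number, line in matches:
    print(number, line)
```

Nothing here needs more than well structured metadata in the document 
itself; any tool that can search text will do.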

Checkpoint 3.4 would be better stated by requiring every test to have a 
unique URI.
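
One way to check such a requirement, as a sketch only: collect the URI 
assigned to each test and report any that occur more than once. The 
example.org URIs and the function name are hypothetical.

```python
from collections import Counter


def duplicate_uris(test_uris):
    """Return, in sorted order, every URI that is assigned to more than one test."""
    counts = Counter(test_uris)
    return sorted(uri for uri, count in counts.items() if count > 1)


# Hypothetical test URIs; the third entry repeats the first on purpose.
TEST_URIS = [
    "http://example.org/tests/foobar/test001",
    "http://example.org/tests/foobar/test002",
    "http://example.org/tests/foobar/test001",
]

duplicates = duplicate_uris(TEST_URIS)
print(duplicates)
```

An empty result means every test in the list has its own URI.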


Checkpoint 4.1 is wrong and should be deleted. This is entirely out of 
scope for a W3C WG.

The rationale for checkpoint 5.3 is flawed.
Implementors need to know whether their tests pass or fail; it is not part of a 
test suite, certainly not at priority 1, to help them fix the problems when 
the tests fail.

Checkpoint 6.1
Why are you obsessed with writing documents?
Who cares whether a plan was documented? What matters is whether vendors 
were engaged or not.
They will be engaged if:
- the tests help them develop products
- the products can be sold

Any other activity to achieve the goal here is makework.

 
I am sorry that I haven't had a good word to say about this.
I guess I should point out that this opinionated diatribe is my own, and 
not endorsed by HP or by any of the WGs I am in.

Jeremy
Received on Tuesday, 1 July 2003 12:15:33 UTC