Re: Review of Test Guidelines from Karl Dubost on 2003-07-15 (www-qa@w3.org from July 2003)

From: Karl Dubost <karl@w3.org>
Date: Tue, 15 Jul 2003 15:51:19 -0400
To: Jeremy Carroll <jjc@hplb.hpl.hp.com>, www-qa@w3.org
Message-Id: <a06001819bb39f340f87c@[10.0.1.2]>
Hi Jeremy,


Let's try to nail that down a bit :)

At 17:13 +0100 2003-07-01, Jeremy Carroll wrote:
>I did not find this document helpful.
>I do not much believe this review will be helpful either.

Your comments will be helpful, for us at least. The fact is that if
everybody says that a review will not be useful, we will not be able
to produce a *useful* document for WGs. And if it's not useful, it
will not be used.

So thank you again.

>I also felt that your document missed insight into how quality can 
>be assured, and how the quality of the Test work of a WG can be 
>enhanced.

:) That may be so, but I will try to explain why the document has its
current shape. The Test Guidelines document is the youngest in the QA
Framework; it is only a few months old and is really taking shape
now. That is also why, for now, we often do not strictly review WG
test materials against the Test Guidelines: the document is still in
its infancy.

>Specific issues which were not addressed were:
>
>
>a) how to effectively integrate the test work with the rest of the 
>work of the WG. In particular your documents appear to follow a 
>waterfall model of specification development:
>- a specification is written
>- testable assertions are added to it
>- metadata is added to those testable assertions
>- tests are developed for those testable assertions
>etc. etc.
>
>Each step follows the previous step.

No. We do not recommend that, but maybe the Guidelines are not
written well enough, and we will have to improve them, because it
seems you have misunderstood them.

* QA Framework Operational guidelines
- In Motivation and expected benefits
http://www.w3.org/TR/2003/WD-qaframe-ops-20030210/#motivation
	"The principal factor for improving the quality
	 of implementation is early availability of
	conformance test materials."

	"This makes sense, since it is natural for test suites
	and implementations to develop in parallel - each is
	a help to the development of the other. "

and
	"Moreover, many Working Groups have already established
  	procedures, techniques and tools for developing test
	materials (e.g., Document Object Model - DOM). It
	makes sense to capitalize on what has already been
	done and share that with those who are starting out
	and those who are already in the process of
	developing conformance materials."


I added this one because you commented that the Test Guidelines do
not reflect the way the W3C works. We tried to gather all the past
successful experiences together, which means:
	- SVG work
	- DOM work
	- CSS work
	- UAAG work


and, still in the Ops Guidelines:

Guideline 1
	"The quality of specifications and the early availability
	of conformance test materials (TM) are significant
	contributors to the quality of implementations."

and

Guideline 3
	"The benefits of starting synchronization of the
	specification and test materials development as
	***early as possible*** include:

    	* it gives an extensive list of use cases;
    	* it identifies ambiguities in the specification
	at the ***early stages***, which saves cycles in
	the late phases;
    	* it provides clear set of testable assertions --
	skeleton of the specification -- which in its turn
	facilitates development of interoperable implementations.
	The latter is a W3C process criterion for entering the
	Candidate Recommendation phase."
and

	"Checkpoint 3.1 Synchronize the publication of
	QA deliverables and the specification's drafts. [Priority 2]

	Rationale. Because each version of the specification
	-- WDs, CR(s), PR, etc -- is a changed document,
	therefore all important dependencies such as test
	materials need to be updated concurrently."

	"Examples of QA deliverables might range from a
	TS (test suite) production schedule in early WDs,
	 to TS design document in later WDs, to a first public
	TS release at CR."

and many other citations in the document...

*** We do encourage people to start writing tests from the start ***
That is not a waterfall model at all.

BUT we also know how some WGs are organized, and if we propose only
one model (for example XP), we will fall into the trap where people
find it too constraining for their own WGs.

So we are trying to be a framework with variable geometry, suitable
for many kinds of technologies.


>A more exciting model of test is one which follows the extreme 
>programming model (XP).

A preliminary comment about XP. Some of the people who participated
in building the SVG Test Suite are early adopters of XP, and they
often used its techniques. So XP is not unknown to the QA WG. I would
even say it is a way we work.

For example, when we write a guideline or a checkpoint, we try to
test it against the specs we are reading AND against the Guideline
itself, which makes things quite complicated but reveals issues.


>In particular, a WG usually has some sort of document as input(e.g. 
>a member submission or a previous version of the spec). A test 
>driven spec development would use tests that ask questions that the 
>input document does not answer unambiguously. For example, much of 
>the interest in the WebOnt work arose from the Patel-Schneider 
>paradox [1], which is essentially a test case on the DAML+OIL member 
>submission [2] that was the key input to webont.

Nothing to say about that, because I think all the people in the QA 
WG agree with you.

>WGs that are not following a waterfall model will not be able to 
>participate in your guidelines..

That's not true. See above. I think you are mixing two concepts, or even three.

	The QA Process		-> Ops Guidelines
	The Quality of the Spec -> Spec Guidelines
	The Quality of the Test -> Test Guidelines

Organizing the work along an XP model is possible (cf. the Ops
Guidelines), because that is a matter of the QA process itself.
Nobody forbids you from organizing your work like that; we even
encourage it, as I showed above.

>b) Ensuring that quality people develop the tests
>
>ISO 9000 style quality is an attempt to get mediocre, unmotivated 
>people to do OK work of a predictable (but low) quality because some 
>power structure (typically connected with money and wage-slavery) 
>sits above them, and forces them to jump through some hoops. As far 
>as I can tell, the test guidelines follow this paradigm. Here is a 
>checklist, and even a not particularly competent WG might get 
>adequate results by following these guidelines.
>
>W3C WGs depend on volunteers over whom the W3C has little actual power.
>Hence using a methodology that depends on the ability to punish and 
>reward is futile.
>
>Moreover, web standards have depended in part on getting brilliant 
>people to participate. It is an elitist exercise.
>This explains for example, the director's dictatorial powers.
>
>Test work is often boring, particularly when placed in a waterfall 
>style model, and hence does not engage the minds of brilliant 
>people. Strengthening the process for test development is 
>counterproductive because it will make test more boring and reduce 
>the attractiveness to brilliant people.
>
>It is important to identify what motivates the best people to 
>participate in the W3C and what demotivates them. It is at least 
>plausible that peer group acclaim is important, and hence it is 
>important that WG members who contribute to the test work are 
>adequately acknowledged.
>
>This suggests a priority 1 guideline that there should be a 
>recommendation track document for the test work in a WG. A good 
>choice of editor who takes pride in their work, will be one of the 
>best ways to ensure a deep quality to the test work.

I do not agree with that. I am the editor of a small document too.
Being an editor on the Rec Track brings no benefits at all, only
trouble. When people learn what the task involves, they run away.

As a counterexample, Daniel Glazman and Tantek Celik are well known
for their strong participation in the CSS Test Suite, and it is not
on the Rec Track.

I do agree that the work must be rewarded in some way, but the Rec
Track has nothing to do with that. Can you remember, off the top of
your head, the names of the editors of HTML 4.01 and/or HTML 3.2?


>It may be appropriate that each test is individually attributed.

Agreed. In the Ops Guidelines we do not recommend a format for
tests, but we do recommend planning test materials development
(Guideline 5), and we ask WGs to define a contribution process. It is
up to the working group to define how each test must be marked and
with which information.

The CSS WG has written guides for its tests:
http://www.w3.org/Style/CSS/Test/testsuitedocumentation.html
http://www.w3.org/Style/CSS/Test/guidelines.html

The WG can decide which information must be inside a test.


>This relates to the previous point in that if
>we see tests as driving the development process rather than 
>following, it is more likely that good people in the WG will engage 
>in that process.
>
>As it is, the waterfall model you suggest looks like makework for 
>mediocre members of the WG
>
>Of course, process guidelines are useful for brilliant people, but 
>they need to have a light touch, and be life enhancing rather than 
>deadening.

XP is a process, and you can write guidelines that are tied to that
method. The fact is that the QA Framework does not forbid you from
using the XP method to produce your test suite and/or your spec.


>c) timeliness of tests
>
>While in an ideal world recs would wait for a test suite to be 
>developed, and there are adequate volunteers from within the WG to 
>develop the tests, this may well not be the case.
>
>It is important to prioritise which tests are the most useful, and 
>which are less useful. The issue driven test process, in which test 
>cases often form part of an issue resolution, used in particular by 
>RDF Core WG; is an excellent way of doing this.
>
>d) cost effectiveness of tests
>
>A W3C WG has limited resources.

Yes, and that is often why some WGs choose, *unfortunately*, to
develop the test suite after the spec.

>Developing conformance test suites may be a waste of them.

I think you are giving conformance too strong a definition. Do you
think "certification" when you read "conformance"? Because that is
not the case at all.


>It is necessary for a WG to develop a clear specification, many of 
>the tests necessary for conformance testing are obvious. The tests 
>that are less obvious will be those that come up in an issue driven 
>process of specification development, and these should command the 
>majority of WG resource.
>
>[1]
>http://lists.w3.org/Archives/Public/www-webont-wg/2002Jan/0099
>
>[2]
>http://www.daml.org/2001/03/daml+oil-index.html
>
>
>Detailed review comments:

Do you mean this document?
http://www.w3.org/TR/2003/WD-qaframe-test-20030516/

>1.2
>
>You scope the guidelines to conformance tests - however there are 
>other reasons why WG may wish to develop tests.
>
>In particular both RDF Core and WebOnt WGs have had issue driven 
>test processes, where the proposed tests help clarify different 
>positions between members of the WG, and the approved tests clarify 
>the issue resolution. Parts of the specs that had no related issues 
>are typically unproblematic, and building tests for those parts is 
>less critical, and less cost effective.
>
>Hence the phrase:
>"However they are applicable to all W3C Working Group members as 
>well as the testers of implementations."
>
>makes your test guidelines to be significantly more important than 
>they are, and is frankly false.


Read it in the context of the whole QA Framework.

>"Working Groups who may already be doing some of these activities 
>should review the document and incorporate the principles and 
>guidelines into their test materials as much as possible."
>I have no intention of following this suggestion. It should be toned down.
>
>at least to "... as much as they see fit."

I think the problem is the way you read the Test Guidelines (which
are still at an early stage).
We often had the same comments about the Spec Guidelines, with people
saying "it doesn't apply to us". When I checked with them, they were
already conforming to most of the points, and a few minor
modifications made them fully compliant.
It is a question of interpretation and of learning curve.

We do agree that our wording is sometimes difficult. We are doing our
best to make it clearer. That is why it is very important to have a
QA contact inside your WG: it helps with that work of interpretation
and with explaining things in terms the WG knows, because each social
community and network has its own language.

When I read the Ontologies Framework I had a very hard time
understanding it, because of the presentation, the language used,
and sometimes the topic. That is normal.

>1.3
>The two paragraphs are in tension with one another.
>The Semantic Web groups have followed the first paragraph but not the second.
>
>We have explicitly decided not to produce conformance test suites, 
>but have endorsed test cases as a means for enhancing the quality of 
>our documents.

Can you explain what a conformance test suite is for you?

>It is a shame that your work has only explored the second of these 
>two paragraphs.
>
>1.4
>"The guidelines are intended for all Working Groups "
>Does the QA WG follow these guidelines?
>I doubt it.
>Neither does the RDF Core WG nor the WebONt WG.
>I suggest you reduce the ambition of this document.

We do, or at least we try to. It is one of the most important tests
we apply: "Eat your own dog food".
I explained this above.

>1.5
>
>I think you should not reference WAI in this section, maybe the 
>acknowledgements.
>I found it distracting, and had to go away and look at the WAI 
>stuff, when in fact it was irrelevant, and merely of historical 
>interest.


It is an example, an illustration to explain how the document is
organized. Your issue will certainly be recorded.

>1.6
>"Satisfying these checkpoints is a basic requirement to ensure 
>quality and interoperability of the standard. "
>
>Satisfying these checkpoints are neither necessary nor sufficient 
>for the quality and interoperability of the standard. This untrue 
>statement needs to be made less ambitious.

Where is the truth?
Experience? Theory? Let me follow your rationale from the start: you
advocate XP, and XP is all about practical, realistic experience. The
guidelines were written by looking at what people have done in the
past at W3C (DOM, SVG, XML, CSS, OWL, etc.).
Priority 1 is the minimum we ask for.


>Checkpoint 1.1
>MUST
>
>who must do this?
>A test suite sits on a hard disk somewhere. It does not have the 
>capabilites to respond to this command. Suggest replace "MUST 
>define" by "defines"
>cf Connolly's must is for agents
>http://www.w3.org/2001/01/mp23

The sentence has to be reformulated.


>Checkpoint 1.2
>
>How wholly unrealistic.
>Any W3C draft depends on IP, and TCP, and HTTP, ...
>There is dependency on the laws of mathematics ...
>
>This checkpoint is a red herring.
>When lowerlevels specs are unclear, the WG may or may not be aware 
>of that, and may or may not be able to clarify appropriate behaviour 
>in their spec, which may or may not be reflected in the test suite.
>
>It is partly your obsession with testing conformance of 
>implementations than using the test material to ensure high quality 
>recs that is at fault here.
>
>Obviously if I have an OWL implementation that uses an inadequate IP 
>stack it won't work; and from a conformance testing point of view 
>that might be relevant. From the point of view of defining OWL it is 
>not interesting.

We are talking about the immediate dependencies of the technology.
For example, HTML and accessibility are highly dependent on each
other; they are interrelated technologies.


>Checkpoint 1.3.
>...
>must be ... documented
>
>Yawn. Requiring documentation is the instrument of mediocrity.
>Obviously it is helpful when a test has gone wrong to know which 
>part of the spec it relates to, but that is nearly always self 
>explanatory (the test has the foobar element in it, look at the 
>definition of the foobar element in the spec) or it is not 
>meaningful to document it. The patel-schneider paradox arises from a 
>complex interaction between much of the spec.
>This checkpoint mandates a waterfall model for test development, 
>rather than an interactive rapid prototyping model which is what is 
>actually used by WGs who publish WDs every three months as mandated 
>by W3C process.

Nobody forbids you from having a test in development, testing
something, writing the spec, testing again, and then publishing the
test as stable with regard to the spec. It is a chicken-and-egg
problem: neither predates the other. Both work together.

>
>Checkpoint 2.2.
>
>Dull, dull, dull. This won't get done, and it is stupid to pretend 
>that it will.

So would you prefer to write tests without metadata, making them
difficult to use for anyone except the tester who designed them and
the people immediately around him?

>Guideline 3.
>
>This stuff about a test management system and test framework is 
>really quite unclear.
>
>It seems that you have in mind a piece of software, whereas a test 
>suite is, in my mind at least, a set of declarative statements. 
>Particularly when we are talking about document formats (which is 
>the typical case for W3C).
>
>Many of the guidelines about a test management system are met by the 
>OWL Test Cases simply by being a well structured document with a 
>website behind it.

Strange: it seems you do have a test management system???
http://www.w3.org/2002/03owlt/Makefile


>For example, you can use grep to look though the single html file 
>version of the OWL test cases to satisfy checkpoint 3.3.
>(This can be met in more sophisticated ways, but it is the WGs job 
>to provide the metadata now to provide a software tool to manipulate 
>it).
>
>Checkpoint 3.4 would be better stated by requiring every test to 
>have a unique URI

? Checkpoint 3.4. Test management system must support results. [Priority 2]
Conformance requirements: The test management system must allow for 
the results of a test case to be associated with the test.

It is about test results.
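
To make the distinction concrete, here is a minimal sketch (an
illustration only, not something taken from the Test Guidelines) of a
test record that keeps results associated with the test. The names
SuiteEntry, Status, record and status_for, as well as the identifiers
used in the example, are all hypothetical:

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    """Possible outcomes for one test on one implementation (assumed set)."""
    PASS = "pass"
    FAIL = "fail"
    NOT_RUN = "not-run"


@dataclass
class SuiteEntry:
    """One test case plus the results recorded against it (hypothetical model)."""
    test_id: str   # unique identifier for the test; could be a URI
    title: str
    results: dict = field(default_factory=dict)  # implementation name -> Status

    def record(self, implementation: str, status: Status) -> None:
        # Associate a result with this test for the given implementation.
        self.results[implementation] = status

    def status_for(self, implementation: str) -> Status:
        # Implementations with no recorded result are reported as NOT_RUN.
        return self.results.get(implementation, Status.NOT_RUN)


entry = SuiteEntry("example-001", "hypothetical test case")
entry.record("impl-A", Status.PASS)
entry.record("impl-B", Status.FAIL)
```

A unique identifier per test, as you suggest, would fit naturally as
the key of such a record, but it does not by itself say where the
results go.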


>
>Checkpoint 4.1 is wrong and should be deleted. This is entirely out 
>of scope for a W3C WG.

?

>The rationale for checkpoint 5.3 is flawed.
>Implementators need to know their tests pass or fail, it is not part 
>of a test suite, certainly not at priority 1, to help them fix the 
>problems when the tests fail.

It is not part of the TS, as you said, and that is also what is
written in the guidelines:
	CP 5.3 ***Results reporting framework*** must indicate result
	status of each test.



>Checkpoint 6.1
>Why are you obsessed with writing documents.

It is like specs at W3C. They are not necessary to make the
technology work, but they help a lot when other people want to
understand the technology and implement it.

>Who cares whether a plan was documented what matters is whether 
>vendors were engaged or not.
>They will be engaged if:
>- the tests help them develop products
>- the products can be sold
>
>any other activity to achieve the goal here is makework.
>
>
>I am sorry that I haven't had a good word to say about this.
>I guess I should point out that this opinionated diatribe is my own, 
>and not endorsed by HP or by any of the WGs I am in.

:))) I think you do valuable work inside the OWL WG. The OWL Test
Suite is impressive, and I am pretty sure that, even if you do not
think it possible, you were already complying with many points of the
guidelines.

In a dialog with a group, there are always misunderstandings, which
is natural. For example, Sandro explained what I had missed in my
review of OWL against the Spec Guidelines, and TOGETHER we found
solutions.

Jeremy, we are in the same boat, W3C; that is why we are all trying
to improve our work. As you said, it is better to encourage and drive
better development than to dump it. :)

I hope the next versions of the Test Guidelines will address some of
your concerns, and that some of your questions have been answered.

Best Regards.



-- 
Karl Dubost / W3C - Conformance Manager
           http://www.w3.org/QA/

      --- Be Strict To Be Cool! ---
Received on Tuesday, 15 July 2003 15:54:58 UTC