draft personal review of QA stuff from Jeremy Carroll on 2003-12-18 (www-webont-wg@w3.org from December 2003)

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Thu, 18 Dec 2003 16:38:06 +0100
To: <www-webont-wg@w3.org>
Message-ID: <BHEGLCKMOHGLGNOKPGHDGEHNCCAA.jjc@hpl.hp.com>
Evan and I had the relatively narrow task of looking at the QA-OPS
guidelines ... I feel a desire to comment on the QA framework as a whole,
and intend to do so as a personal comment to be sent after the webont
comment (assuming we agree to something like what we have).

I thought it might be helpful to let WebOnt see my current thoughts ... we
could, for example, decide that some points were so important that they
needed to be made by the WG - however I think the review prepared by Evan
and myself is sufficient.

(Unlikely ... I really am much too extreme in my opposition to this stuff -
I see that a repeating theme is that they commit to AAA quality in their
charter but seem remarkably short on fulfilment - and I get quite angry
about this armchair quality work, which is itself too mediocre)

I guess I will need to make it very clear that this is a personal comment,
and not on behalf of anyone else (e.g. HP or a WG).

===

This is a review of the following documents:

http://www.w3.org/TR/2003/NOTE-qaframe-intro-20030912/ (WG NOTE)
http://www.w3.org/TR/2003/CR-qaframe-ops-20030922/ (CR)
http://www.w3.org/TR/2003/CR-qaframe-spec-20031110/ (CR)
http://www.w3.org/QA/WG/2003/10/TestGL-20031020.html (editors draft)

also mentioning

http://www.w3.org/TR/2003/CR-qaframe-spec-20031110/qaframe-spec-ta
http://www.w3.org/QA/WG/2003/09/qaframe-spec-extech-20030912

I note that I have done most on the test editors draft - which may have been
inappropriate timing, but hope it helps anyway.

===

Comment 1: editorial - ToC
====================
http://www.w3.org/TR/2003/NOTE-qaframe-intro-20030912/
does not have a ToC - please fix.

Comment 2: substantive - Goals
======================
Goals etc.
http://www.w3.org/TR/2003/NOTE-qaframe-intro-20030912/
section 1.1

I believe the goal should be better quality recommendations.
I believe test suites may contribute to this, but in terms of scoping the QA
work, and in terms of setting the goals of the QA work this should be linked
to the output of W3C which is a set of documents.

Thus test suites are only useful in as much as they help provide better
quality recommendations.

This of course begs the question as to what are the quality metrics for a
recommendation - suggestions:
- precision
- distinction between normative and informative force
- consistency, with self, with other W3C publications
- implementability
- readability (i.e. a precise document such as OWL S&AS by itself fails, but
in combination with other material may meet this goal)

The problem with setting conformance tests as the goal is that many WG
members will not be committed to this goal.

In more detail
[[
 For those undertaking quality assurance projects, the QA Framework should
give:

+ a step-by-step guide to QA process and operational setup, as well as to
development;
+ and, access to resources that the QA Activity makes available to assist
the WGs in all aspects of their quality practices.

** neither of these points says very much, since they depend on the
definition of QA, which most readers will not share, suggest delete **

More specific goals include:

+ to encourage the employment of quality practices in the development of the
specifications from their inception through their deployment;
** again without a shared understanding of quality this statement is vacuous,
no-one is opposed to 'quality', but some will be opposed to the QA WG's
conceptualization of quality **
+ to encourage development and use of a common, shared set of tools and
methods for building test materials;
to foster a common look-and-feel for test suites, validation tools,
harnesses, and results reporting.
** this is the first point which it is possible to disagree with, and hence
is the first substantive statement of goals - and it is inappropriate - the
goal must be higher level than this **
]]

A problem with having conformance tests as a goal is that it is unrealistic
to expect the whole of the WG to buy into it. Whereas (nearly) all (active)
WG members will accept that the quality of the documents being produced is a
reasonable goal for the WG. Quality is not the responsibility of a
specialist subgroup but a shared responsibility of the whole WG - obviously
different members of the WG will have different beliefs and opinions as to
the value of testing, and will only really support test work once it has
begun to show real benefit on more general measures.


Comment 3 - AAA, eh?
=========
The QA WG is committed in its charter to AAA quality on all the metrics.
It does not appear in your CR that you achieve this, and this alone is
reason for the QA CR to not go forward. Examples where you fail on your own
metrics follow.

Comment 4 - Synchronize
=========
http://www.w3.org/TR/2003/CR-qaframe-ops-20030922/guidelines-chapter#Gd-sync-spec-TM-devt
[[
Checkpoint 3.1. Synchronize the publication of QA deliverables and the
specification's drafts. [Priority 2]
]]
while I support the WebOnt WG's more general comment about your checkpoints
being too strong, I also suggest that this one is too weak.
It is hard to review specifications that are released in pieces with
references from one part at one level of completeness to another at another
level.
I find your documents a very good example of how not to release LC and CRs.
It is too difficult for the reader to make any sort of consistent cut
through what is in reality a single logical publication, but which has
pieces at very different levels.
I also find that WG notes for the informative sections work less well than
using recommendations for the informative sections. Partly this is because
you seem to have allowed yourself lower editorial standards in the notes
(e.g. the missing ToC), partly that a WG note is not necessarily a consensus
document, partly because the lack of LC, CR, PR checkpoints in a note make
it difficult for the reader to understand what sort of review is
appropriate.

I find no evidence that you fulfilled this checkpoint in your own
publications.
In particular I have not found test material for QAF-OPS, and the test
material I have found for QAF-SPEC is too incomplete for CR,
(the main content seems to be
http://www.w3.org/TR/2003/CR-qaframe-spec-20031110/qaframe-spec-ta
which is test assertions rather than tests)

So I suggest that this checkpoint be reworked as:

1) During the latter review stages of the recommendation track (LC, CR, PR,
Rec) it is important for WGs to appreciate the difficulty that
non-synchronous publication of all relevant material causes to reviewers.
2) Informative content that the WG believes many readers will need in order
to fully understand the normative content of a recommendation should be in
recommendation track documents.
3) WG notes should be used for additional optional material.
4) test material should be published at the same time as the main documents
5) When synchronized publication is not possible, then the earlier
publications should indicate the intended date of publication of the later
documents. The review period should extend for an appropriate time after the
publication of the last documents published. In particular no one part of a
package of related documents can move ahead more than half-a-step of other
parts.

Comment 5 - appropriate linking to Tests
=========

I have no idea how to tell whether or not there are tests for your CR
documents.
I have to resort to google.

RDF Core and WebOnt both decided to publish their tests as a rec track doc.

It would be interesting to see the QA group's view of this.

Here are some advantages of that decision:
- the level of synchronization (or not) between test and other WG work is clear.
- more obvious where to find the tests - test publication is announced using
standard procedures
- test work is recognised as an important part of WG activity with public
credit given to test editors (although some test contributors are
undervalued)
- test work is preserved for posterity using W3C's preexisting publishing
process

Here are some disadvantages
- more difficult/impossible to add/modify tests after Rec
- not clear best way to incorporate tests in a document
  - RDF Core just use a zip which ends up as the normative copy
  - WebOnt include tests inline, so test document is enormous (XXL)


In any case the QA documents should suggest that rec track documents have
clear and straightforward links to the relevant test suites.


Comment 6: "scope"
==================

This is a banality
http://www.w3.org/TR/2003/CR-qaframe-spec-20031110/qaframe-spec-ta
Checkpoint 1.1
says
[[
The first section of the specification contains a clause entitled "scope"
AND enumerates the subject matter of the specification.
]]

a) conventionally W3C specs capitalize the first letter of a section title.
b) the document
http://www.w3.org/TR/2003/CR-qaframe-spec-20031110/
does not satisfy this (wording of the) checkpoint
c) the document
http://www.w3.org/TR/2003/CR-qaframe-ops-20030922/
does not satisfy this (wording of the) checkpoint

The sloppiness of the wording is indicative of the lack of quality in the
family of documents.
Problems with the sentence include:

a) "first" is too strong (cf "1.2. Scope and goals")
The scope should be stated in the introductory material.

b) "section" is too strong (cf "1.2. Scope and goals")
(well, when looking at the word "entitled" - clauses do not have titles,
sections and subsections do)

c) "specification" is incorrect (cf. "QA Framework: Operational Guidelines
W3C Candidate **Recommendation**" )

d) "clause" is undefined and is not generally used in the discussion of W3C
recommendations

e) "entitled" can only plausibly apply to certain xhtml elements, it is not
clear that these are what you have in mind.


f) "scope" is too narrow - surely what matters here is the intent not the
actual words used.

g) "AND" emphasis unnecessary

h) "enumerates" - numbering the parts is for the ToC

For example, I find that
http://www.w3.org/TR/2003/PR-owl-semantics-20031215/#1
adequately quickly and concisely describes what the document is about and
why I might or might not read it, and what I might look at instead.
Any reworking of this checkpoint should be liberal enough to permit the OWL
Semantics PR document to pass it, since that document is of adequate quality
on this metric.

I am unconvinced that this family of documents has had adequate review by
the QA WG, and the QA IG. I suggest that you should set yourselves higher
goals before seeking wider review again.


Comment 7: throughout s/Specification/Recommendation/g
==========

W3C publishes recs not specs.

Comment 8:
==========
"the Working Group MUST identify a person to manage the WG's quality
practices. "

I cannot tell whether the QAWG have fulfilled this requirement or not. I
suggest that the discussion should suggest that the QA moderator be listed
on the WG home page. (I bet you haven't - aren't I a cynic?)

I am unclear as to the value of this. The problem is to do with rewards,
motivations and power.

Rewards and Motivations
=======================
If I get appointed as QA moderator I get a nice new entry on my CV, but what
real interest do I have in ensuring the WG produces quality documents? The
editors get their names on the docs, not me. I note that RDF Core appointed
Brian McBride as Series Editor - he did a huge amount of work which largely
was driven by quality goals such as consistency across the docs, consistency
between the tests and the docs etc. and he is justly rewarded by a fairly
big splash of his name on the W3C recs (hopefully).

If I were a QA moderator - I am not getting paid for this job, W3C work is
voluntary and has to compete with other tasks my boss might think of as
worthwhile, what is my recompense? If the job is little work then this is
not a problem but the checkpoint is unmotivated.
(I note that WG chairs have the same problem - basically their self-interest
is to get to rec with the minimal amount of effort)


Power
=====
Let's suppose I am a conscientious QA moderator and the WG repeatedly makes
decisions that undermine quality. My only initial power is to (threaten to)
resign. As such this checkpoint needs to be integrated into the process
document, and it needs to be clear how things escalate after a QA moderator
has taken that ultimate step. Obviously the WG can/should appoint someone
else, and this must be done in a timely way. How does one avoid a stooge?
How does one avoid lip-service to QA?
It seems to me that quality work must clearly justify itself. i.e. that the
work done by the QA moderator should be of such obvious value to the WG that
he/she will gain a respect within the group that enables a certain power
within the group.
If this is the case then the title is unnecessary. I don't think this
checkpoint is thought through and I think the QA work would do well enough
without it. A WG has document deliverables and test deliverables: the owners
of these deliverables need to own the quality process for them.


Comment 9: ???
==================================

This is very confusing ...
is this still part of your framework?
http://www.w3.org/QA/WG/2003/02/OpsET-qapd-20030217
is there a later version?
is it dead?

Problems:
1) some docs in WG space are part of the recommendation and not merely
editors drafts e.g.
http://www.w3.org/QA/WG/2003/09/qaframe-ops-extech-20030912

2) other documents in the WG space are defunct or irrelevant in one way or
another.

3) specifically
http://www.w3.org/QA/WG/2003/02/OpsET-qapd-20030217
claims to be an appendix to
http://www.w3.org/QA/WG/2003/02/qaframe-ops-extech-20030217
which claims to be an earlier version of
http://www.w3.org/QA/WG/2003/09/qaframe-ops-extech-20030912

4) however
http://www.w3.org/QA/WG/2003/02/qaframe-ops-extech-20030217
does not list
http://www.w3.org/QA/WG/2003/02/OpsET-qapd-20030217
in the ToC

ah, there it is ... in a paragraph underneath the ToC - ummm a very "high
quality" ToC.


Comment 10
==========

http://www.w3.org/QA/WG/2003/09/OpsET-qapd-20030912

http://www.google.com/custom?hl=en&lr=&ie=ISO-8859-1&cof=AWFID%3A0b9847e42caf283e%3BL%3Ahttp%3A%2F%2Fwww.w3.org%2FIcons%2Fw3c_home%3BLH%3A48%3BLW%3A72%3BBGC%3Awhite%3BT%3Ablack%3BLC%3A%23000099%3BVLC%3A%23660066%3BALC%3A%23ff3300%3BAH%3Aleft%3B&domains=www.w3.org&q=%22QA+Test+Material+Process+Document+for+QA%22&btnG=Google+Search&sitesearch=www.w3.org
no hits


http://www.google.com/custom?hl=en&lr=&ie=ISO-8859-1&cof=AWFID%3A0b9847e42caf283e%3BL%3Ahttp%3A%2F%2Fwww.w3.org%2FIcons%2Fw3c_home%3BLH%3A48%3BLW%3A72%3BBGC%3Awhite%3BT%3Ablack%3BLC%3A%23000099%3BVLC%3A%23660066%3BALC%3A%23ff3300%3BAH%3Aleft%3B&domains=www.w3.org&q=%22QA+Test+Material+Process+Document+for+Quality+Assurance%22&sitesearch=www.w3.org

no hits

http://www.google.com/custom?hl=en&lr=&ie=ISO-8859-1&cof=AWFID%3A0b9847e42caf283e%3BL%3Ahttp%3A%2F%2Fwww.w3.org%2FIcons%2Fw3c_home%3BLH%3A48%3BLW%3A72%3BBGC%3Awhite%3BT%3Ablack%3BLC%3A%23000099%3BVLC%3A%23660066%3BALC%3A%23ff3300%3BAH%3Aleft%3B&domains=www.w3.org&q=%22QA+Test+Material+Process+Document+for+QAWG%22&btnG=Google+Search&sitesearch=www.w3.org

no hits

where is your QA Test Material Process Document, AAA quality WG?

In a way this whole thing looks like a sick joke in which you invent
unnecessary work for others which you are not prepared to do yourselves.
More politely you moved to last call prematurely, (let alone CR)


Comment 11
==========
http://www.w3.org/QA/WG/2003/10/TestGL-20031020.html
[[
Guideline 1. Perform a functional analysis of the specification and
determine the testing strategy to be used.
In order to determine the testing strategy or strategies to be used, a
high-level analysis of the structure of the specification (the subject of
the test suite) must be performed. The better the initial analysis, the
clearer the testing strategy will be.

]]

(umm perhaps your guidelines should have a letter before so it is clear
which document they come from e.g. Guideline T.1)

Neither WebOnt nor RDFCore did this.

It is hard since the main purpose of the tests for these WGs was to help in
the development of a quality recommendation, and one cannot do a final
functional analysis of the rec until it's basically finished, which would
have overly committed us to a waterfall model of development.
In fact, that motivation indicates that the second sentence quoted is too
strong: there is no "must be performed" here, suggest "may be helpful".

Having said that, it is clear that the coverage of the tests in both the SW
WGs is weaker than it would have been if we had followed this guideline at
some point, this then comes back to issues to do with synchronization and
timelines etc. In WebOnt I am reasonably sure that most of the untested bits
are from that part of the rec that is fairly easy to implement. Thus, since
we do not have a conformance test suite, the many OWL implementations that
pass all the tests may nevertheless have a variety of trivial errors that
prevent interoperability. I don't see that as the responsibility of the WG -
conformance tests come later, and at that point (or in bug reports to
software developers) it will become clear what trivial errors in software
need fixing. Of course, in a very few cases these trivial errors may point
to minor errors in the spec where there is insufficient clarity - but I
believe that issue driven test development has covered almost all of these
areas adequately.

[[
Checkpoint 1.3. Analyze the structure of the specification, partition it as
appropriate, and determine and document the testing approach to be used for
each partition. [Priority 1]
]]
Suggestions:
a) weaken this to have "may" force rather than "must" force.
b) Use RFC 2119 keywords.

Comment 11
==========
[[
Checkpoint 2.1. Identify and list testable assertions [Priority 2]
Conformance requirements: Test assertions within or derived from the
specification must be identified and documented.

Checkpoint 2.2. Tag assertions with essential metadata [Priority 1]

Rationale: It must be possible to uniquely identify assertions, and to map
them to a particular location, or to particular text, within the
specification.
]]
Wildly oversimplistic.

Even the simplest OWL test relies on many parts of the recommendation. The
idea that it is possible to tie a test to one or two parts of the
recommendation is philosophically flawed (similar to the concept of causation,
cf a huge body of literature). I do not believe this is uniquely a property
of OWL.

Obviously one tries to structure the tests in such a way that assuming a
system passes some set of easier tests, then this new test presents an
interesting challenge, but ... Of course this also amounts to the issue that
you lot seem to believe that it is possible to test for conformance whereas
that is trivially incorrect. (Given any set of conformance tests for any
system where each test is characterised as one or more inputs resulting in one
or more outputs, the piece of software that is defined to precisely pass the
test suite, by giving the determined output for the determined input, and
otherwise to fail horribly, is a non-conformant piece of software that
passes the conformance tests).
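
The parenthetical argument can be made concrete with a minimal sketch. Every name here is hypothetical: `reference_double` stands in for whatever behaviour a recommendation requires, `TEST_SUITE` for a finite conformance suite, and `lookup_table_impl` for the degenerate piece of software described above.

```python
# Hypothetical illustration: a finite test suite cannot establish conformance.

def reference_double(x: int) -> int:
    """Stand-in for the behaviour a recommendation requires (assumed)."""
    return 2 * x

# A finite "conformance" suite: (input, expected output) pairs.
TEST_SUITE = [(0, 0), (1, 2), (7, 14)]

# The degenerate implementation memorises the suite and nothing else.
_ANSWERS = dict(TEST_SUITE)

def lookup_table_impl(x: int) -> int:
    if x in _ANSWERS:
        return _ANSWERS[x]
    raise RuntimeError("fails horribly on any input outside the test suite")

def passes_suite(impl) -> bool:
    """True if impl gives the expected output for every test in the suite."""
    return all(impl(given) == expected for given, expected in TEST_SUITE)

if __name__ == "__main__":
    print("passes suite:", passes_suite(lookup_table_impl))
    try:
        lookup_table_impl(3)
    except RuntimeError as err:
        print("non-conformant:", err)
```

Both `reference_double` and `lookup_table_impl` pass the suite, yet only the first behaves correctly on an input the suite does not list.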

Suggest drop these requirements, and the related ones in SpecGL.

Possibly weaken to a
"It may be helpful to list the test assertions found within or derived from
a recommendation"

Comment 12:
[[ (Test 3.2)
When the Working Group requests test submissions, it must also request that
the appropriate metadata be supplied.
]]
I found it easier to completely own the test metadata in webont (well me and
Jos the co-editor). Unfortunately the metadata quality is crucial and is
best ensured by having a fairly small number of people responsible - sure
it's a lot of work.

The *must* is too strong, suggest *may*.

The list of test metadata omits "the type of the test" and "the files
associated with the test"

Comment 12
==========

[[
Conformance requirement: The test materials management process must provide
coverage data. At a minimum, the percentage of assertions for which at least
one test-case exists should be calculated and published.
]]

Makework - this statistic is useless; why the **** do you want to waste other
people's time in calculating it?
Any test suite tests 0% of any plausible language worth specifying because
the language is infinite and the test suite is finite. Any other number is
simply a fib.
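
For concreteness, the statistic the checkpoint asks for might be computed as in the sketch below. The assertion ids and test names are invented for illustration. Note that the denominator is the number of listed assertions, not the set of possible inputs to the language, which is why the resulting number says so little about the language itself.

```python
# Hypothetical data: assertion ids and the tests that claim to exercise them.
ASSERTIONS = ["A1", "A2", "A3", "A4"]
TESTS = {
    "test-001": ["A1"],
    "test-002": ["A1", "A3"],
}

def assertion_coverage(assertions, tests) -> float:
    """Percentage of assertions exercised by at least one test."""
    if not assertions:
        return 0.0
    covered = {a for exercised in tests.values() for a in exercised}
    hit = sum(1 for a in assertions if a in covered)
    return 100.0 * hit / len(assertions)

if __name__ == "__main__":
    print(f"{assertion_coverage(ASSERTIONS, TESTS):.1f}% of assertions have a test")
```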

Suggest drop this requirement and any related requirement.

Comment 13 issue tracking is not a test issue
==========
[[
Checkpoint 3.4 Provide an issue-tracking system [Priority 2]
Conformance requirements: The test materials management process must include
an issue-tracking system.

Rationale: If a high-quality test suite is to developed it is important to
methodically record and track problems and issues that arise during test
development, testing, and use. For example, the test review process may
generate a variety of issues (whether the test is necessary, appropriate, or
correct), while after publication users of the test suite may assert that a
particular test is incorrect. Such issues must be tracked, and their
resolution recorded.
]]

This is of course a quality issue but has nothing to do with test - suggest
move to the Operational Guidelines. Every WG should have a means of issue
tracking.

Comment 14 way too strong a must
==========
[[
Checkpoint 3.5 Automate the test materials management process [Priority 2]
Conformance requirements: The test materials management process must be
automated.

Rationale: Automation of the test materials management process, perhaps by
providing a web-based interface to a database backend, will simplify the
process of organizing, selecting, and filtering test materials.
]]
The rationale is true but does not justify a must; the QA group could
collect a set of tools that have been used to help automate test material
management, and help try and spread best practice but a *must* here is
ridiculous. This really should not be a checkpoint.

I note that the QAWG commits to AAA test conformance, please describe your
automatic system for test material management.
(Since the spec GL and the ops GL are in CR and not test GL, I would be
happy with an answer that restricted itself to those two documents).

Comment 15: not the WG responsibility
===========
[[
Checkpoint 4.2. Automate the test execution process [Priority 2]
Conformance requirements: Test execution should be automated in a
cross-platform manner. The automation system must support running a subset
of tests based on various selection criteria.

Rationale: If feasible, automating the test execution process is the best
way to ensure that it is repeatable and deterministic, as required by
Checkpoint 4.1. If the test execution process is automated, this should be
done in a cross-platform manner, so that all implementers may take advantage
of the automation.
]]
WebOnt made it clear to its implementors that we expected test results to
have been collected in an automated fashion, but it is not possible for a WG
to provide such an execution environment for every conceivable spec.
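
What a WG can reasonably supply is the selection half of this checkpoint, leaving execution to each implementor. The sketch below assumes tests carry simple metadata; the field names (`kind`, `level`), the test ids, and the `implementation` callback are all hypothetical, not taken from any W3C test suite.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CaseRecord:
    """Minimal, assumed metadata for one test."""
    ident: str
    kind: str          # e.g. "entailment" or "consistency" (illustrative)
    level: str         # e.g. "Lite", "DL", "Full" (illustrative)
    files: tuple = ()  # files associated with the test

CASES = [
    CaseRecord("t-001", "entailment", "Lite", ("premises001.rdf",)),
    CaseRecord("t-002", "consistency", "DL", ("consistent002.rdf",)),
    CaseRecord("t-003", "entailment", "Full", ("premises003.rdf",)),
]

def select(cases, **criteria):
    """Return the cases whose metadata matches every given criterion."""
    return [c for c in cases
            if all(getattr(c, key) == value for key, value in criteria.items())]

def run(cases, implementation):
    """Run each case through an implementor-supplied callable (True = pass)."""
    return {c.ident: bool(implementation(c)) for c in cases}

if __name__ == "__main__":
    subset = select(CASES, kind="entailment")
    # The lambda is a placeholder for a real, implementor-specific harness.
    print(run(subset, lambda case: case.level != "Full"))
```

The `implementation` callable is exactly the part a WG cannot write once for every conceivable system under test.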

Once again, noting the QAWG's AAA commitments in its charter, I hope you will
demonstrate the sense of this checkpoint before any of your documents
proceed further along the recommendation track. I guess you need to solve
some natural language research problems first.

Comment 16:
===========
[[ TestGL
Checkpoint 5.1 Review the test materials [Priority 1]
Conformance requirements: The test materials must be reviewed to ensure that
they meet the submission requirements. The status of the review must be
recorded in the test materials management system, as discussed in Checkpoint
3.2 above.
]]
You cannot have a priority 1 depending on a priority 2. I think the
"management system" is the problem; replace with "metadata".

In WebOnt we automated this part - every time the OWL Test Cases document is
produced all the test material is verified to conform with the "stylistic"
guidelines in OWL Test. Hence we meet the spirit of this without meeting the
letter. Once again, your desire to have strong wording is inappropriate.
Weaker wording that would be acceptable would be:
[[ TestGL
Checkpoint 5.1 Review the test materials [Priority 1]
Conformance requirements: The test materials should be reviewed to ensure
that they meet the submission requirements. The status of the review may be
recorded in the test materials metadata, as discussed in Checkpoint 3.2
above.
]]

I note that one test which I accepted

http://www.w3.org/TR/2003/PR-owl-test-20031215/byFunction#imports-014

had as its whole point that it did not conform to the stylistic preferences
(using a superfluous suffix on a URI) and that this presented problems which
were not exercised by the other tests.

So, it is important that there is adequate discretion in the process to
accept tests that do not meet the submission requirements.
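
A minimal sketch of an automated review with that discretion built in follows. The stylistic rule (no file suffix on test URIs), the suffix list, the example URIs, and the other test ids are assumptions made up for illustration; only the id imports-014 comes from the text above, and this is not the actual WebOnt tooling.

```python
# Hypothetical stylistic rule: test URIs should not carry a file suffix.
DISCOURAGED_SUFFIXES = (".rdf", ".owl")

# Tests accepted at the editors' discretion even though they break the rule.
ACCEPTED_EXCEPTIONS = {"imports-014"}

def style_problems(uris):
    """Return the URIs that break the (assumed) suffix rule."""
    return [u for u in uris if u.endswith(DISCOURAGED_SUFFIXES)]

def review(tests):
    """Map each test id to 'ok', 'exception' or 'rejected'."""
    status = {}
    for test_id, uris in tests.items():
        if not style_problems(uris):
            status[test_id] = "ok"
        elif test_id in ACCEPTED_EXCEPTIONS:
            status[test_id] = "exception"
        else:
            status[test_id] = "rejected"
    return status

if __name__ == "__main__":
    submitted = {
        "imports-001": ["http://example.org/tests/imports/premises001"],
        "imports-014": ["http://example.org/tests/imports/premises014.rdf"],
        "misc-099": ["http://example.org/tests/misc/premises099.rdf"],
    }
    print(review(submitted))
```

Recording the exception explicitly keeps the check automatic while leaving the decision to accept a deliberately non-conforming test with the editors.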


Comment 17:
===========
Test 6.1
[[

Discussion: It is not necessary for tests to automatically report status for
this checkpoint to be met. It would be sufficient, in the case of manually
executed tests, for the test execution procedure to unambiguously define how
the person executing the tests should determine the test execution status.
]]

tell me again about the QA WG's tests (for opsGL and specsGL) that permit
unambiguous determination of the test execution status, I seem to have
missed that part of your document set.

Comment 18
===========
[[
Checkpoint 6.2 Tests should report diagnostic information [Priority 2]
Conformance requirements: When tests fail, they must provide diagnostic
information to assist the implementer in determining the source of the
problem.
]]

No!!

It is a huge amount of work for the WG to provide the implementors free of
charge with a test suite. No way are the implementors entitled to a test
suite with diagnostics. The cost is huge - developers get paid, they should
put some sweat in, too.

I look forward to seeing the QAWG's diagnostics in the test suite for opsGL
and specsGL.

This requirement is mad, and should go.


Jeremy
Received on Thursday, 18 December 2003 10:38:44 UTC