This is a personal review, and does not reflect the views of Hewlett Packard. The survey of the QAF was initially done as part of the review by the Web Ontology Working Group, but the overall recommendations made here go beyond what was discussed as part of that review.
One part of this review is, I believe, member confidential, and is hence included by reference, rather than verbatim. I have no objection to that part being made visible to the public.
While this review is spattered with detail, the coverage is not uniform. I have not read all parts of all these documents with the same attentiveness.
This review also touches upon other documents such as:
The main goal of this review is to add detail, personal comment, and suggestions for how to move forward, to the review by WebOnt that I participated in. The scope includes any comments that relate to whether the documents in Candidate Rec are ready or not to advance to Recommendation. This is made harder by the interdependencies between the CR documents and other documents with a lower official maturity status. It is also hard to understand the intended quality of the various Working Group Notes that form part of the framework.
I provide detailed comments on the various documents; some thoughts about what is missing; summary points; and suggestions as to how to move forward with the goals of the QAF. The final section deals with procedural administrivia.
Please add a Table of Contents.
The goals chosen seem inappropriate; different goals would make for a different framework.
I believe the goal should be better quality recommendations.
I believe test suites may contribute to this, but both the scoping of the QA work and the setting of its goals should be linked to the output of the W3C, which is a set of documents.
Thus test suites are only useful in as much as they help provide better quality recommendations.
This of course raises the question of what the quality metrics for a recommendation are - suggestions:
The problem with setting conformance tests as the goal is that many WG members will not be committed to this goal.
In more detail:
For those undertaking quality assurance projects, the QA Framework should give:
- a step-by-step guide to QA process and operational setup, as well as to development;
- and, access to resources that the QA Activity makes available to assist the WGs in all aspects of their quality practices.
- neither of these points says very much, since they depend on a definition of QA which most readers will not share; suggest delete
More specific goals include:
- to encourage the employment of quality practices in the development of the specifications from their inception through their deployment;
- again, without a shared understanding of quality this statement is vacuous: no-one is opposed to 'quality', but some will be opposed to the QA WG's conceptualization of quality
- to encourage development and use of a common, shared set of tools and methods for building test materials; to foster a common look-and-feel for test suites, validation tools, harnesses, and results reporting.
- this is the first point with which it is possible to disagree, and hence is the first substantive statement of goals - and it is inappropriate; the goal must be at a higher level than this
A problem with having conformance tests as a goal is that it is unrealistic to expect the whole of the WG to buy into it, whereas (nearly) all (active) WG members will accept that the quality of the documents being produced is a reasonable goal for the WG. Quality is not the responsibility of a specialist subgroup but a shared responsibility of the whole WG; obviously different members of the WG will have different beliefs and opinions as to the value of testing, and will only really support test work once it has begun to show real benefit on more general measures.
It is hard to appreciate what the QAF is trying to do, since you have omitted to publish a list of requirements. Hence, the reader is left to imagine what requirements you might have been trying to meet, which is less than satisfactory.
You say:
The relationship of the QA Framework documents to the W3C Process is undefined at this time of publication. It is not precluded that all or parts of these guidelines may ultimately be incorporated (by reference) into the W3C Process Document [PROCESS].
See member confidential message.
Concerning your issue 16, "Should the W3C Process Document be modified".
In the issue list it is said that "General feeling at Brussels seemed to be that it was unnecessary. The QA Framework will require it". This seems to suggest that you perceive the QAF as an extension to the process document, with force over other WGs. The language of the Ops Guidelines also suggests this. If the quoted paragraph is actually your intent, that the relationship is undefined, then it is important that the QAF documents allow other WGs and other recommendations not to conform. This suggests a global substitution of "Working Groups" with "Conforming Working Groups" etc.
I think the decision to leave this undefined is inappropriate. For me, as a non QA WG sympathizer, it is crucial. If you were really not trying to impose your view of quality on me, then I would not have to bother reviewing your documents. As is, I strongly object to the apparent constraints in your documents on my activities elsewhere in the W3C.
A solution to the problem above is to identify what needs to change in the process document. I suggest that the QAWG should be seeking small changes to the process document to express the normative content of the QAF, and the QAF documents themselves should all be informative.
A strawman for the necessary change to the process document is as follows. Modify section 6.2.6 of the process document by changing the list of deliverables in the fourth bullet point from "(technical reports, reviews " to "(technical reports and test materials, reviews ".
A further possible change may be to add an additional bullet point after that fourth bullet reading "Working Groups whose deliverables include technical reports *should* also have supporting test materials amongst their deliverables."
An editorial problem with the paragraph entitled terminology is that you do not use many of the keywords that you have imported. I think it would be clearer to reduce the list to those keywords that you actually use. (On the other hand, RFC 2119 suggests text that is closer to yours, so a more pedantic comment would be that you should use the exact phrasing given in RFC 2119 - maybe the two comments balance out). I suggest changing "will be used" to "are used" or "are to be interpreted". Also note that the paragraph importing RFC 2119 (as a normative reference) is directly contradicted by the following paragraph (which denies the existence of any normative requirements in the introduction). I suggest removing the import of RFC 2119 from the Intro. Where not already included, such imports can be added into the normative documents; better still is not to have normative documents.
Section 3.1 of the Intro points to the Operational Guidelines, presumably section 3. This does not add much to the Intro except the reference [QAWGPD], which is "under construction".
This is not a good advertisement for the quality of your work.
I find this subtracts from the documents, and suggest section 3.1 of the intro and section 3 of the ops guidelines should be deleted.
Minimally, one or other of the sections should go, since they are fairly similar and could easily be merged.
A further problem is the reference to QAF Spec Guidelines Example & Techniques. This is officially an "editors' draft" in its SOTD, but is referenced just like the QAF Op Guidelines Example & Techniques which is a WG Note according to its SOTD.
The dependency between the documents in CR and an editors' draft for which there is no official publication is not appropriate.
Numbering is missing on the <h2> elements, e.g. Introduction.
In the paragraph before section 3.1, "However" should be "Moreover" or "In addition".
In the references, QAWG should not be in italics.
In many ways the references are disappointing, and look like an attempt to avoid pubrules and W3C process rather than make legitimate references.
Particular references that are weak are: [TAXONOMY], [QAWG], [QAIG], [QA-GLOSSARY], [QA-LIBRARY].
These references are:
My understanding is that Candidate Recs are completed documents being offered for implementation and review, not work that is known to be incomplete.
Many of these references, or the use put to these references (e.g. [QAWG] and [QAIG]), either need to be dropped or included as appendices. This means that they should be finished before the QAF is brought back for a second last call.
I found [TAXONOMY] unintelligible.
A further disappointment in these references, particularly the QA Library, is the lack of non-W3C perspective. You should make it clear that you have considered input of the highest academic calibre from peer-reviewed sources on how to achieve quality in technical documents. Ideally you should have invited experts who wrote such papers. Maybe you do, in which case please make it clearer: one or two self references from the academic literature would not go astray.
As an aside, I tried looking for such things with google and found this document interesting: Automated Quality Analysis Of Natural Language Requirement Specifications by Wilson, Rosenberg, Hyatt. Is the work of NASA well-thought of in the field?
The references to [QAF-TEST] and [TEST-EXTECH] are also problematic, given their unfinished nature. I cannot understand how you can bring any part of the QAF to Last Call or Candidate Rec without having completed it all.
Found at http://www.w3.org/TR/2003/CR-qaframe-ops-20030922/.
Checkpoint 3.1. Synchronize the publication of QA deliverables and the specification's drafts. [Priority 2]
While I support the WebOnt WG's more general comment about your checkpoints being too strong, I also suggest that this one is too weak. It is hard to review specifications that are released in pieces with references from one part at one level of completeness to another at another level.
I find your documents a very good example of how not to release LCs and CRs. It is too difficult for the reader to make any sort of consistent cut through what is in reality a single logical publication, but which has pieces at very different levels. I also find that WG Notes work less well for the informative sections than recommendations would. Partly this is because you seem to have allowed yourself lower editorial standards in the notes (e.g. the missing ToCs), partly because a WG Note is not necessarily a consensus document, and partly because the lack of LC, CR, PR checkpoints for a note makes it difficult for the reader to understand what sort of review is appropriate.
I find no evidence that you fulfilled this checkpoint in your own publications. In particular, I have not found test material for QAF-OPS, and the test material I have found for QAF-SPEC is too incomplete for CR (the main content seems to be http://www.w3.org/TR/2003/CR-qaframe-spec-20031110/qaframe-spec-ta, which is test assertions rather than tests).
So I suggest that this checkpoint be reworked as:
I note that the QA WG has not followed this practice, which makes reviewing your work significantly harder.
Checkpoint 4.1 reads:
Checkpoint 4.1. Appoint a QA moderator. [Priority 1]
I suggest this checkpoint be deleted.
Working groups are already responsible for the quality of their work. It is unclear what power the QA moderator might have if there is a quality problem. WG Chairs who find the quantity of the chair's responsibilities too large can already delegate specific responsibilities to other WG members.
It is unclear what rewards and motivations are offered to the QA moderator. It is unrealistic to expect the QA moderator to do the job well without appropriate recognition.
If the QA WG has appointed a QA moderator it has done so with the utmost secrecy. The checkpoint is flawed in not requiring the name and e-mail address of the QA moderator to be available on the WG homepage. If the QA WG does not have such a moderator I suggest that you have already decided that this requirement is unnecessary; your documents should be brought into line with that decision.
If you must suggest this, then at least make it clear that having co-moderators is acceptable (cf. the Process Document: "Each group MUST have a Chair (or co-Chairs)"). Effectively, I shared the QA Moderator role for WebOnt.
Summary: delete; if not, then add a requirement for an e-mail address and permit co-moderators.
http://www.w3.org/TR/2003/CR-qaframe-ops-20030922/qaframe-ops#Ck-TM-plans-in-charter
The detail in this requirement seems futile.
For example, I know of one WG that has committed to AAA for operations, specifications, and test materials: the QA WG. Unfortunately these commitments have not been followed through in the operations, the specifications or the test materials of that WG.
I prefer someone who promises little and delivers much.
I wonder if all that is really required is a change in W3C culture, so that charters mention the expected level of test development.
Checkpoint 4.3 requires the public availability of the process document. I suspect this is inadequate, and the requirement should read something like: "the Working Group's QA Process Document should be linked from the WG home page (with the same access controls)".
FYI, I found it very hard to find the QAWG QAPD, despite you having a charter commitment to its development, and having an under construction link from the QA Ops Guidelines.
I am also unconvinced by the requirement for WGs to have a process document; even with a "should" instead of a "MUST", I suggest a "may" is more helpful. Producing a process doc is a non-trivial cost. Process docs such as that of the QAWG, which are barely followed and/or cover too many cases that do not actually occur, bring the process into disrepute. Only working groups that feel a need for a process doc should have one, and then the process doc need only cover those parts of the process which the WG believes need clarifying.
The title of the Quality Assurance Process Doc is inappropriately broad. Judging by that of the QAWG, the focus of the document is on test materials. There are many other aspects to quality (e.g. readability and timeliness). I suggest the title of these documents should be something like "Test Development".
Since the QA WG has arrived at Candidate Rec without following the process in its process doc, such documents are clearly superfluous in that group's opinion. They should not be imposed on other groups.
In many places your documents confuse QA with Test.
An example pointed out by WebOnt is the title "QA Moderator" whose responsibilities are only to do with tests.
A further example is Checkpoint 1.4, "Checkpoint 1.4. Enumerate QA deliverables and expected milestones. [Priority 1]": looking at the draft charter, you seem to have enumerated only test deliverables, and not other QA deliverables. Weakening this checkpoint so that you actually conform with it is suggested. This would change the normative force of the checkpoint.
A further place where you confuse QA and Test is in the name "QA Framework" - the framework is not a general quality framework, but only a test development framework. I suggest you rename it.
Further, a systematic check of all your documents for such misuse of "QA" is required.
Guidelines 5 to 8 in particular look simply like advice as to how to realise the commitment to producing test materials.
This advice generally looks OK, but some of it is presumably over-the-top or inappropriate for some working groups.
Might it not be more effective and less pushy to seek normative status only for the charter statement concerning the production of test materials, and to leave these parts (guidelines 5 through 8) of the framework as a resource that Working Groups may draw on, or may not? This gives the people closest to the specific problems of testing the FooBar Recommendation maximum freedom to solve those problems in the most appropriate way.
The references to the QA Test Guidelines and QA Spec Guidelines are included in section 6.2 "informative" references, but they are crucial to the understanding of the normative content of Checkpoint 1.1 "for any Test Materials that it plans to produce or adopt, the WG MUST define its commitment level to QA Framework: Test Guidelines -- A, AA, or AAA."
Since the test guidelines are unfinished it was inappropriate to advance the Ops Guidelines to last call, let alone CR.
You say: "Such a procedure is applicable both to external challenges once test materials have been published, as well as to internal challenges during the building of test materials."
It is not clear that a WG can/should constrain what the W3C does with any W3C test suites after Rec. See a comment from Dan Connolly on an early version of OWL Test Cases.
A fair amount of the informative content related to the Ops Guidelines is "under construction" - e.g. the QAWG QAPD, some of the examples and techniques document. In my judgement, the move to Last Call and Candidate Rec was premature given the quantity and importance of the unfinished work.
Aside: Contrast with WebOnt - when we went to Candidate Rec there were still some tests being added, and implementors were explicitly asked to keep up with the editors' draft; these tests, moreover, are now normative (but subsidiary to OWL Semantics); there was a significant informative document that will be published as a WG Note in January. WebOnt received a formal objection concerning the lack of the additional informative content, which we opposed on the grounds that four implementors had managed to implement from the terse normative document (not on the grounds that we were working on it). WebOnt also explicitly listed as "at risk" the part of the design for which we still had work outstanding; as it turned out, the implementation feedback caused us to abandon the extra work.
The point of the aside is that I don't think it is particularly clear when one crosses the boundary between being finished enough for CR and still having work to do. However, I think you are the wrong side of the grey line.
WebOnt breached this condition: tests were APPROVED or OBSOLETED or REJECTED on WG decision (that was the defined procedure). No rationale. No guarantee of review of accuracy, scope, and clarity. That worked fine; there is no need to assume that WGs are stupid and will approve incorrect tests. With three tests we took an explicit risk and approved them despite not really having evidence that they were correct. This was a trade-off between various different quality concerns, including timeliness, and the importance of the tests. We modified one test because our machines told us to. No one understood it as far as I could gather. There is nothing wrong with that, as long as we had adequate evidence that modifying the test was in line with what the other documents said, and hence would enhance interoperability.
I think the rationale is flawed in wanting "quick" decisions. Many OWL tests took a year to approve; it was not a problem. If we had not voted to move to PR some of those tests would still be awaiting a decision.
The WebOnt experience suggests that the last sentence of the discussion should be weakened to "They should at least include a mechanism for acceptance or rejection of tests."
The rationale should be something like "It is helpful to distinguish those tests that have been accepted by the Working Group"
The Checkpoint could read, "Define a procedure for accepting or rejecting tests."
The conformance requirements could be … dropped; that is what I would prefer.
I am not at all convinced by this discussion. Specifically:
I request that the second sentence of the discussion and the two bullet points be deleted.
Sections 6.1 and 6.2 are listed as 2.1 and 2.2.
The appendices should be included in the ToC in a more conventional fashion, e.g. as a continuation of the <ul> list. I missed their existence on first reading.
Section 1.5. This part of the QAF should be boiler plate that is nearly identical in all your documents. It should probably come sooner. Without it, the reader is lost.
I wonder whether there should be a global substitution throughout the QAF replacing "specification" by "recommendation".
Documents reviewed: http://www.w3.org/TR/2002/NOTE-qaframe-ops-extech-20021202/, http://www.w3.org/QA/WG/2003/09/qaframe-ops-extech-20030922.html, http://www.w3.org/QA/WG/2003/09/OpsET-charter-20030922.html, http://www.w3.org/QA/WG/2003/09/OpsET-qapd-20030922.html.
I found it confusing having WG notes not in TR space.
This particular note violates pubrules section 1.5 point 1, in a way that also caused me significant difficulty when reviewing. (The appendices are not in the document directory; in fact, the document does not have a directory, making it hard to know where the documents end and the general QA WG scratch space begins.)
Moreover the QAPD template contains broken links such as http://www.w3.org/QA/WG/2003/09/qaframe-ops.html#Ck-proc-describe-review; these cause work for the reviewer which should have been done by the editors before Last Call (let alone CR).
My reading of that same point of pubrules is that the decision reported in the December 2002 publication to move the document into WG space was a violation.
I believe that the same effect, of having a WG copy that changes fairly frequently during development, could have been better achieved by keeping the editors' draft in WG space.
I suggest moving all QA WG notes into TR space and ensuring they conform with pubrules; in fact, I think it would have been clearer to have had these as informative Rec-track documents.
A further problem is that the appendices are not listed as part of the table of contents in a conventional style; again, this makes it hard to know where the boundaries of the document are.
I am not happy with the normative content of either the process document template or the charter template, but that follows points I have already expressed. In addition:
Found at http://www.w3.org/TR/2003/CR-qaframe-spec-20031110/.
This does not include any goals; it should be retitled "Scope".
I have found the language used quite difficult to understand as a non-expert. The fact that too much of the companion example and techniques document is still being written has made it impossible for me to adequately review some parts of this document.
An example problem is Guideline 3: the terms "modules", "profiles", etc. all seem to mean something specific; I probably have not read section 2 adequately. Section 2.3 seems clear enough, but does not help me understand Guideline 3. The "under construction" banner has disconcerted me.
This of course is linked to the problem of having a CR document that depends on an editors draft for its informative content.
My understanding is further hampered by the Examples and Techniques doc not being in TR space, and the fragIDs not working in my (relatively broken) browser after the redirect.
I failed to understand checkpoint 2.4.
The use of the undated link is (weakly) in contravention of your own operational guidelines (3.1) on synchronization.
While my global point that W3C publishes recs not specs stands, a specific instance is the "priority 2" paragraph of section 1.7, which talks about 'standards'; this is an editorial error and should be changed to match priority 1 and 3.
I wasted some time following these links; I suggest reordering section 1.5 by starting with "Each guideline includes: .." and moving the deleted text concerning WAI and WCAG to the acknowledgement section.
As an example of a MUST that is too strong, consider the second MUST in checkpoint 1.1. The "This information" refers to the most recently mentioned information in the text, i.e. "a statement of objectives and/or goals", and since we now have a "MUST appear" the previous SHOULD can never be ignored.
Moreover, since the Spec Guidelines document does not include any goals (the goals are in the QAF Intro, and not in its section 1.1 Scope and Goals), it is in contravention of this checkpoint. The checkpoint is flawed and should be changed.
Every Guideline in this document should be carefully reviewed to see if it respects the needs of multidocument publications such as the QAF, or RDF, or OWL.
Suggest global weakening of all MUSTs to SHOULDs, or better MAYs, or better still "may".
This is inappropriate.
Editors have discretion and freedom to construct the document best suited to their audience. The inclusion of examples is a rule of thumb not something to be set in stone.
As an example the OWL Test Cases Proposed Recommendation does not meet this checkpoint, nor does it suffer from not doing so.
Many W3C recommendations define document formats.
Such recommendations may be complete, intelligible and self-contained without any definitions of products that do things with these documents.
For example, RDF defines a knowledge representation language. The range of products that could do something with such documents is large but what matters is the meaning of the document.
Often declarative definitive statements are clearer than procedural ones, whereas this checkpoint is likely to commit Working Groups to building procedurally oriented recommendations.
This is poorly thought out.
The enumerated list seems arbitrary and ad hoc. The phrase "if no category matches, define a category." makes the preceding MUST toothless, and hence superfluous.
Suggest delete.
The list is reminiscent of the 'certain Chinese encyclopedia' from the passage of Borges quoted in the preface to The Order of Things by Foucault: "animals are divided into: (a) belonging to the Emperor, (b) embalmed, (c) tame, (d) sucking pigs, (e) sirens, (f) fabulous, (g) stray dogs, (h) included in this present classification, (i) frenzied, (j) innumerable, (k) drawn with a very fine camelhair brush, (l) et cetera, (m) having just broken the water pitcher, (n) that from a long way off look like flies". (It's not really that bad, but this passage is quite a laugh).
In checkpoint 4.2
The MUST is too strong: any MUST which can be inapplicable SHOULD be a SHOULD. (Need global review)
Having failed to understand checkpoint 3, I get a sense of deja vu when I get to checkpoint 5.
The word "and" should not be in caps.
Checkpoints 3, 4, 5 and 6 are all DoVs. Is it possible to condense them?
This is unlikely to be effectively enforceable, and hence should be dropped.
While this seems generally a good idea, it seems too vague for an RFC 2119 keyword.
The third requirement "MUST state why RFC2119 keywords were not used." is inappropriate. The previous requirement (the SHOULD) required that there was a rationale. It is a purely editorial matter whether that rationale is helpful or not to the intended reader of the recommendation.
I find Connolly's "MUST is for agents" compelling. I like the structure of the OWL Proposed Rec, in which the bulk of the conformance requirements is specified mathematically, using squiggles (and not RFC 2119). We tie this in with RFC 2119 keywords only when we talk about a fairly mythical agent, an "OWL Consistency Checker", which is quite deliberately an idealised reasoner, and not something that we actually expect to be widely used.
In as much as Checkpoint 7.1 requires W3C recommendations to behave other than as suggested in Connolly's "MUST is for agents", I oppose this checkpoint.
On the topic of Connolly's "MUST is for agents", he suggests:
I might change the declarative "every TR MUST be valid" into a process constraint like [...] the webmaster/comm team may decline any publication request [... for ...] tech reports that are not valid.
I suspect that whole pattern could be applied successfully to the Spec Guidelines, e.g. define an "A" conforming Spec, an "AA" conforming Spec, and an "AAA" conforming Spec. Checkpoint 1.1 then becomes "Near its beginning, an A conforming Spec defines the subject matter of the specification, and normally includes a statement of its objectives and/or goals." Yes, that is significantly better.
I suggest all the guidelines be written in this declarative style, probably throughout the QAF.
I like this one, except the usual grumbles about MUST etc.
A winning streak - well at least the first half. It would have been good to have an ICS for OWL reasoners.
Again I grumble about MUST, and the intrusion into editorial discretion by requiring the editor to include a rationale (rather than merely to have one).
My views on this were endorsed by WebOnt, and are included in the WebOnt review.
Note that the Ops Guidelines do not meet this checkpoint. Have you reviewed your own documents against the Spec Guidelines?
I think it would be slightly better if you used the same charset for all pieces of this document.
The ICS is in UTF-8 whereas the rest appears to be in ISO-8859-1
???0.1.??? Appendices
LC-13 includes the comment "Would be better if the document were AAA-conforming to itself IMO.", and the WG agrees with "QAWG agrees that SpecGL should be AAA-conformant to SpecGL, and commits to AAA conformance by Candidate Recommendation". Within the actual CR you seem to only claim (incorrectly) AA conformance.
The process document requires that the WG has formally addressed last call issues before advancing to CR.
A specific requirement is:
The group SHOULD maintain an accurate summary of all substantive issues and responses to them (e.g., in the form of an issues list with links to mailing list archives).
You have chosen to implement this in the following way:
I found this very confusing, not least, because this structure is not made clear anywhere.
A specific problem was that your issue list does not permit me to see the response sent to Marc Hadley concerning issue 13.
Hence I am unable to ascertain whether or not you told him one thing and did another.
The DoC page only gives the summary resolution, but does link to the Last Call Issue page on which you appear to have agreed that Spec Guidelines will be AAA before CR, (which it was not, even on your own overly optimistic assessment).
Thus I cannot tell whether Marc's silence, interpreted as assent, was at least formally informed by that part of your decision.
In fact, the issue list does not let me verify that you formally addressed any issues at all.
Further problems with your last call issue list include:
The division of DoCs by document is a procedural sleight of hand. Many of the most important comments are in fact comments about the whole QAF. The failure to formally address them invalidates the advance to CR of Op GL and Spec GL. (The next comment is an example.)
I suggest it is better to formally address comments by:
I like that you chose to publish a consolidated version of the Spec Guidelines with the changes from the last call comments. It may have been better to have included a link to that version in your messages formally responding to the reviewers.
The Device Independent WG asked for a cost-benefit analysis.
While this issue was Accepted with Modification, it is not clear that you ever informed them of this, or informed them of the fact that the modification is that you decided to move to CR without having done the cost-benefit analysis.
This is not merely editorial, but is a precursor to a substantive comment: if these are the benefits and these are the costs then this is not worth it. This issue has been raised earlier by Jonathan Robie on the IG list. It is vital that the QAWG addresses this (i.e. produce a CBA, not plan to produce a CBA) before asking for consensus around its work.
I note that you did not formally address this comment. You had decided to inappropriately take it as a comment on the Introduction, rather than on all the documents. Specifically, your advice to Roger, "Your reply to this DoC document is not required", was flawed. On the basis of that advice he appears not to have forwarded the message on to the DIWG. Thus, the DIWG did not have a chance to consider whether they wished to formally object or not to your not producing a CBA before moving to CR.
The XML Protocol WG commented that the status of the document section did not state that it was a last call working draft.
Your last call issues list failed to list this comment, and this comment has hence not been formally addressed.
(The comment is incorrect, the reviewer seems to have been looking at an editors draft. This may be indicative of failing to make adequate distinction between editors drafts and published drafts)
In April 2003, Dan Connolly commented on the Spec GL. One of the editors responded, and Dan said, "I'm not satisfied by this response to my last call comment". This formal objection is not recorded in your last call issues page. It was not recorded in your call for implementations.
The QA WG should ensure that a message goes to the AC reps informing them of this omission in the previous call for implementations. The Director should be given an opportunity to consider Connolly's objection and the QAWG's inadequate response to it.
Why all the caps? Suggest lowercase throughout
Also the caps usage is inconsistent.
Impacts checkpoint 1.2
None of the QAF documents conform with this checkpoint, since it is way too narrow. In detail:
The first section of the specification contains a clause entitled "scope" AND enumerates the subject matter of the specification.
Problems with the sentence include:
Admittedly I have got a bit carried away here - but still.
It is easy to write this appendix well, the editor just needs to pick out some of the words of the main document. By trying to use different words the editor fails.
I suggest a complete rewrite of this appendix, merely quoting the relevant parts of the spec.
That seems like make-work; why not drop the appendix, and drop the part of the spec that says you MUST have it (checkpoint 10.1, I think)?
Document reviewed: http://www.w3.org/QA/WG/2003/10/TestGL-20031020.html (editors draft)
The QAF is about testing, hence the Test Guidelines is the central document to the content of the framework. It is an abuse of process to have advanced the other documents to last call before this document was ready.
It is not possible to perform an adequate CR review of the OPs Guidelines or the Spec Guidelines without also performing a review of the Test Guidelines.
Guideline 1. Perform a functional analysis of the specification and determine the testing strategy to be used. In order to determine the testing strategy or strategies to be used, a high-level analysis of the structure of the specification (the subject of the test suite) must be performed. The better the initial analysis, the clearer the testing strategy will be.
Neither WebOnt nor RDFCore did this.
It is hard, since the main purpose of the tests for these WGs was to help in the development of a quality recommendation, and one cannot do a final functional analysis of the rec until it is basically finished, which would have overly committed us to a waterfall model of development. In fact, that motivation indicates that the second sentence quoted is too strong: there is no "must be performed" here; I suggest "may be helpful".
Having said that, it is clear that the coverage of the tests in both the Semantic Web WGs is weaker than it would have been if we had followed this guideline at some point; this then comes back to issues to do with synchronization and timelines etc. In WebOnt I am reasonably sure that most of the untested bits are from that part of the rec that is fairly easy to implement. Thus, since we do not have a conformance test suite, the many OWL implementations that pass all the tests may nevertheless have a variety of trivial errors that prevent interoperability. I don't see that as the responsibility of the WG - conformance tests come later, and at that point (or in bug reports to software developers) it will become clear what trivial errors in software need fixing. Of course, in a very few cases these trivial errors may point to minor errors in the spec where there is insufficient clarity - but I believe that issue-driven test development has covered almost all of these areas adequately.
At times, RDF Core used a test driven specification development methodology.
Issues from the issue list were resolved by agreeing test cases.
The editors then had complete freedom to write text which conformed with the test cases. (The text was later reviewed, so the freedom was not as excessive as it seems).
Examples can be found regarding both syntax and semantics.
A syntax example is rdfms-empty-property-elements which was resolved with these words:
RESOLUTION: the consolidated test cases represent RDFCore's decision on this; the issue can be closed once those test cases are all in one place.
The test cases can be found in this directory. As far as I can tell, this predates the first editors' draft of the revised grammar, and modified the old grammar; i.e. this decision does not follow your methodology at all, is a test-focussed decision, and was good.
A semantics example is rdfms-identity-of-statements for which the issue resolution is but a single test case:
Resolution: The RDFCore WG resolved:
<stmt1> <rdf:type> <rdf:Statement> .
<stmt1> <rdf:subject> <subject> .
<stmt1> <rdf:predicate> <predicate> .
<stmt1> <rdf:object> <object> .
<stmt2> <rdf:type> <rdf:Statement> .
<stmt2> <rdf:subject> <subject> .
<stmt2> <rdf:predicate> <predicate> .
<stmt2> <rdf:object> <object> .
<stmt1> <property> <foo> .
does not entail:
<stmt2> <property> <foo> .
Just perfect! (The syntax used in the test case is a muddle, but all the WG members could understand it. The test case will have been sanitised into the RDF Test Cases somewhere. The RDF Semantics documents reflect this.) Using test cases as part, sometimes the only part, of the issue resolution process brings clarity and openness. It leaves the editors with large discretion that permits the documents to be of higher quality.
In as much as the Test Guidelines and the QAF prohibit and/or obstruct this behaviour I suggest that the QAWG has got it wrong, and needs to start over.
Checkpoint 1.3. Analyze the structure of the specification, partition it as appropriate, and determine and document the testing approach to be used for each partition. [Priority 1]
I suggest weakening this to have "may" force rather than "must" force.
Checkpoint 2.1. Identify and list testable assertions [Priority 2] Conformance requirements: Test assertions within or derived from the specification must be identified and documented.
Checkpoint 2.2. Tag assertions with essential metadata [Priority 1]
Rationale: It must be possible to uniquely identify assertions, and to map them to a particular location, or to particular text, within the specification.
Wildly oversimplistic.
Even the simplest OWL test relies on many parts of the recommendation. The idea that it is possible to tie a test to one or two parts of the recommendation is philosophically flawed (similar to the concept of causation, cf a huge body of literature). I do not believe this is uniquely a property of OWL.
Obviously one tries to structure the tests in such a way that, assuming a system passes some set of easier tests, each new test presents an interesting challenge, but ... Of course this also comes back to the issue that you lot seem to believe that it is possible to test for conformance, whereas that is trivially incorrect. (Given any set of conformance tests for any system, where each test is characterised as one or more inputs resulting in one or more outputs, the piece of software that is defined to precisely pass the test suite, by giving the determined output for the determined input, and otherwise to fail horribly, is a non-conformant piece of software that passes the conformance tests.)
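To make that parenthetical argument concrete, here is a minimal sketch (in Python, with a purely hypothetical two-entry test suite; the names are mine, not from any real specification) of a program that passes every conformance test while conforming to nothing:

# Hypothetical conformance suite: each test is one input paired with the expected output.
CONFORMANCE_TESTS = {
    "input for test 001": "expected output for test 001",
    "input for test 002": "expected output for test 002",
}

def lookup_table_implementation(document):
    """Answer correctly on exactly the inputs that appear in the test suite,
    and fail horribly on everything else."""
    if document in CONFORMANCE_TESTS:
        return CONFORMANCE_TESTS[document]
    raise RuntimeError("deliberately fails on any input not in the test suite")

# Passes 100% of the conformance tests ...
assert all(lookup_table_implementation(i) == o for i, o in CONFORMANCE_TESTS.items())
# ... yet is useless, and plainly non-conformant, on every other input.

No finite suite of input/output pairs can rule this construction out.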
Suggest drop these requirements, and the related ones in Guideline 10 of SpecGL.
Possibly weaken to "It may be helpful to list the test assertions found within or derived from a recommendation".
When the Working Group requests test submissions, it must also request that the appropriate metadata be supplied.
I found it easier to completely own the test metadata in WebOnt (well, me and Jos, the co-editor). Unfortunately the metadata quality is crucial, and is best ensured by having a fairly small number of people responsible - sure, it's a lot of work.
The *must* is too strong, suggest *may*.
The list of test metadata omits "the type of the test" and "the files associated with the test"
Conformance requirement: The test materials management process must provide coverage data. At a minimum, the percentage of assertions for which at least one test-case exists should be calculated and published.
Makework - this statistic is useless. Please do not waste other people's time in calculating it. Any test suite tests 0% of any plausible language worth specifying because the language is infinite and the test suite is finite. Any other number is simply a fib.
Suggest drop this requirement and any related requirement.
Checkpoint 3.4 Provide an issue-tracking system [Priority 2] Conformance requirements: The test materials management process must include an issue-tracking system.
Rationale: If a high-quality test suite is to developed it is important to methodically record and track problems and issues that arise during test development, testing, and use. For example, the test review process may generate a variety of issues (whether the test is necessary, appropriate, or correct), while after publication users of the test suite may assert that a particular test is incorrect. Such issues must be tracked, and their resolution recorded.
This is of course a quality issue, but has nothing to do with testing - suggest moving it to the Operational Guidelines. Every WG should have a means of issue tracking. Also, this should not be normative, merely a QA resource that WGs may draw from.
Checkpoint 3.5 Automate the test materials management process [Priority 2] Conformance requirements: The test materials management process must be automated.
Rationale: Automation of the test materials management process, perhaps by providing a web-based interface to a database backend, will simplify the process of organizing, selecting, and filtering test materials.
The rationale is true but does not justify a must; the QA group could collect a set of tools that have been used to help automate test material management, and help try and spread best practice, but a *must* here is ridiculous. This really should not be a checkpoint.
I note that the QAWG commits to AAA test conformance; please describe your automatic system for test material management. (Since the Spec GL and the Ops GL are in CR and the Test GL is not, I would be happy with an answer that restricted itself to those two documents.)
Checkpoint 4.2. Automate the test execution process [Priority 2] Conformance requirements: Test execution should be automated in a cross-platform manner. The automation system must support running a subset of tests based on various selection criteria.
Rationale: If feasible, automating the test execution process is the best way to ensure that it is repeatable and deterministic, as required by Checkpoint 4.1. If the test execution process is automated, this should be done in a cross-platform manner, so that all implementers may take advantage of the automation.
WebOnt made it clear to its implementors that we expected test results to have been collected in an automated fashion, but it is not possible for a WG to provide such an execution environment for every conceivable setup.
Once again, noting the QAWGs AAA commitments in its charter, I hope you will demonstrate the sense of this checkpoint before any of your documents proceed further along the recommendation track. I guess you need to solve some natural language research problems first.
Checkpoint 5.1 Review the test materials [Priority 1] Conformance requirements: The test materials must be reviewed to ensure that they meet the submission requirements. The status of the review must be recorded in the test materials management system, as discussed in Checkpoint 3.2 above.
You cannot have a priority 1 checkpoint depending on a priority 2 one; I think the "management system" is the problem: replace it with "metadata".
In WebOnt we automated this part - every time the OWL Test Cases document is produced, all the test material is verified to conform with the "stylistic" guidelines in OWL Test. Hence we meet the spirit of this without meeting the letter. Once again, your desire to have strong wording is inappropriate. Weaker wording that would be acceptable would be:
Checkpoint 5.1 Review the test materials [Priority 1] Conformance requirements: The test materials should be reviewed to ensure that they meet the submission requirements. The status of the review may be recorded in the test materials metadata, as discussed in Checkpoint 3.2 above.
On the same checkpoint, I note that one test which I accepted, imports-014, had as its whole point that it did not conform to the stylistic preferences (using a superfluous suffix on a URI) and that this presented problems which were not exercised by the other tests.
So, it is important that there is adequate discretion in the process to accept tests that do not meet the submission requirements.
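As an illustration of the kind of automation I describe above for checkpoint 5.1, a minimal sketch of a stylistic check run whenever the test collection is rebuilt might look like the following (the directory layout and the two rules are hypothetical, and this is not the actual WebOnt tooling):

# Hypothetical: verify every test file against a few stylistic rules whenever
# the published test document is regenerated; report any violations.
import sys
from pathlib import Path

STYLISTIC_RULES = [
    ("no superfluous suffix on test URIs", lambda text: "#extra-suffix" not in text),  # hypothetical rule
    ("test declares its rdf:type", lambda text: "rdf:type" in text),                   # hypothetical rule
]

def violations(path):
    """Return the names of the stylistic rules that this test file violates."""
    text = path.read_text(encoding="utf-8")
    return [name for name, passes in STYLISTIC_RULES if not passes(text)]

failed = False
for test_file in sorted(Path("tests").glob("**/*.rdf")):  # hypothetical directory layout
    broken = violations(test_file)
    if broken:
        failed = True
        print(f"{test_file}: {', '.join(broken)}")
sys.exit(1 if failed else 0)

A check of this kind is cheap to run on every build, which is how the spirit of the checkpoint can be met without a manual review step; it would also need an explicit override for deliberately non-conforming tests such as imports-014.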
Discussion: It is not necessary for tests to automatically report status for this checkpoint to be met. It would be sufficient, in the case of manually executed tests, for the test execution procedure to unambiguously define how the person executing the tests should determine the test execution status.
Tell me again about the QA WG's tests (for opsGL and specsGL) that permit unambiguous determination of the test execution status; I seem to have missed that part of your document set.
Checkpoint 6.2 Tests should report diagnostic information [Priority 2] Conformance requirements: When tests fail, they must provide diagnostic information to assist the implementer in determining the source of the problem.
No!!
It is a huge amount of work for the WG to provide the implementors free of charge with a test suite. No way are the implementors entitled to a test suite with diagnostics. The cost is huge - developers get paid; they should put some sweat in too.
I look forward to seeing the QAWG's diagnostics in the test suite for opsGL and specsGL.
This requirement is mad, and should go.
I made the mistake of following this link from the SOTD of the QAF Spec Guidelines.
I suggest that the sentence "In addition, the QA WG has had a pro-active role by making and contributing ..." be deleted, because of the inappropriate nature of the material at the other end of the link, http://www.w3.org/QA/Group/2002/06/reviews. That page is a matrix of assignments by QA WG member and is not appropriate for review, which should be organized in some other way, e.g. by recommendation or by status. I found further problems when looking at the RDF reviews: firstly, they were news to me; as an active member of the RDF Core WG I would expect a review to have been circulated. Secondly, the "done instead .. QT" makes no sense to me. These problems can be solved by simply deleting the link.
A more serious problem is that the CSS3-UI review has a "XHTML 1.0" tick box, when it isn't (it has a charset problem). I hope the embarrassment of this suggests doing a systematic check of the QA web space for similar problems.
The CSS validator reports problems with the stylesheet used by the QA WG (for instance for the CSS3-UI review). This is less serious since you don't claim CSS conformance.
A further issue I had with the QA pages is that the template seems to fix the copyright dates inappropriately. In particular, few, if any, of your pages date back to 2000. Again, some effort needs to be expended on globally fixing this.
This is under construction, but is crucial to whether you conform with your charter commitment to quality.
Document reviewed: http://www.w3.org/QA/WG/2003/03/QAWG_QAProcess-20030307.html
Here are some of the problems:
I think it would be valuable to keep cycling the QA review of QA until the QA goalposts are ones you have met - partly this is about upping the quality of your work, partly about making the statement of ambition of the QA goalposts more reasonable, and partly about dropping irrelevant goals from the QA framework.
Here are some of the ops guidelines that are suspect:
Document considered: http://www.w3.org/QA/WG/charter
The WG has failed to meet the quality commitments in its charter. Specifically the advance to CR should have been linked with a clear statement of what had been achieved towards the AAA goals and the plan for achieving the other goals by PR.
It is not possible to recover from the failure to meet the AAA operational goals without reverting to last call or earlier.
In particular I suggest that the WG should not request Proposed Recommendation before they have achieved and documented AAA conformance in all respects. I note that this adds to the PR entrance criteria implicit in the CR requests.
The first paragraph of section 2 (Scope and Deliverables) talks about "usable and useful" test suites, the second about "conformance testing". A fundamental problem with the QAWG is the conflation of these two different concepts.
In order to have any credibility the QAWG needs to demonstrate that it meets its own standards. The resolution of LC-13 reads "QAWG agrees that SpecGL should be AAA-conformant to SpecGL, and commits to AAA conformance by Candidate Recommendation. " a further valueless commitment by the QAWG. It is imperative that the QAWG demonstrates that it is committed to the quality of its own work. This is not a request for a further empty commitment.
As an example, the WG does not appear to have addressed the point found in your own review that the doc should state explicitly that the SHOULD and MAY can be implemented separately.
I suggest that an appropriate goal is something like higher quality TR publications and then working out what needs to be done to achieve this.
It is not clear that the emphasis on testing is appropriate.
As a specific example, I suspect that suggesting that WGs include a requirements document amongst their deliverables would be a good idea.
Having done a requirements-gathering phase, it is useful to try and link the recommendations, particularly the MUSTs and SHOULDs, back to those requirements. Any recommendations that do not link to the requirements should be considered for deletion.
Conforming with your suggested requirements is a nontrivial exercise that will impose significant cost on W3C WGs, Editors, and Members.
It appears to be an act of faith on your part that these costs are appropriate.
I suggest that minor changes to the process document will achieve almost all the benefits at a small fraction of the cost.
I wish to give two examples, RDF Model and Syntax and XLink.
RDF Model and Syntax was a muddle, produced fairly rapidly. The wording was so unclear that a new WG had to be chartered to clear it up. That is a significant cost.
XLink is a clear spec, which took a long time to produce.
RDF is used, perhaps less than hoped.
XLink is not.
While RDF M&S had quality problems, the achievement of timeliness may well have been a critically correct decision.
XLink may be clear, but for one reason or another the recommendation is not useful.
Further thoughts are that the delays imposed by the constraints of the QA Framework may have other costs such as the establishment of de facto standards that are worse than what the WG would have produced without the QAF constraints.
Good advice about how to decide where to spend the available effort to improve the quality of recommendations needs to be informed. The current QAF documents appear to be simply dogmatic about the importance of testing, and blinkered as to the value of other quality parameters.
I have no idea how to tell whether or not there are tests for your CR documents. I have to resort to google.
RDF Core and WebOnt both decided to publish their tests as a rec track doc.
Here are some advantages of that decision:
Here are some disadvantages:
In any case the QA documents should suggest that rec track documents have clear and straightforward links to the relevant test suites.
You do not follow RFC 2119 section 6. This is a serious failing given the wide use you make of that RFC. To refresh your memories:
6. Guidance in the use of these Imperatives
Imperatives of the type defined in this memo must be used with care and sparingly. In particular, they MUST only be used where it is actually required for interoperation or to limit behavior which has potential for causing harm (e.g., limiting retransmisssions) For example, they must not be used to try to impose a particular method on implementors where the method is not required for interoperability.
The MUST is the only use of an RFC 2119 key word as a key word within RFC 2119. I believe that every usage of RFC key words in the QAF is in breach of this constraint. I have already drawn to your attention many of the most suspect examples, but I have not found one that is clearly within this constraint (I think that a case could be made for about one-fifth of the constraints, e.g. requiring that the QAPD, if any, be public, or requiring a public archived mailing list, but these are on the fringe of the subject matter of the QAF). I suggest that the subject matter of the QAF is more appropriately informative rather than normative, since it concerns "a particular method on implementors" (i.e. Working Groups and Recommendation editors) for achieving clarity and quality from which interoperability arises. Other methods are as good, and the QAWG must not use RFC 2119 key words to impose its own.
Note that elsewhere I have suggested replacing MUSTs with SHOULDs or MAYs. These too would be in breach, and really dropping normativity is the only sensible route that is in keeping with the judgement of the RFC 2119 authors of the importance of humility.
I am deeply concerned by your decision not to address these problems when you became aware of them (see Susan Lesch's comment of 14th March 2003, which refers to earlier discussion), and your minutes of April 21 "questioning our use of these terms". See also Dan Connolly's unaddressed comment. If you feel you can ignore 100% of the MUSTs in RFC 2119, why might you expect anyone to respect any of the MUSTs that you state?
Perhaps your guidelines and checkpoints should have a letter before so it is clear which document they come from (e.g. Guideline T.1)
I am not a member of the QA Working Group, since I have no desire to work on Quality Assurance. The problems found in this review are basic. They should have been found by:
Presenting grossly unfinished and inadequate work for CR review forces W3C members and participants who do not wish to contribute to the tasks of the QA WG to do your work for you. Given the menaces implicit in the MUSTs you wish to impose on other WGs this amounts to an act of extortion.
Your WG knows that it is trying to do too much with too few people who have too many other commitments.
I suggest that, as well as trying to recruit new members, you should also reduce the ambition of your WG.
I believe that producing a few informative resources rather than wide ranging normative recommendations will be a better match for the actual effort the WG participants appear to have for this work.
Occasional interchange on the QAIG list suggests that the stance taken by the QAWG is not a W3C consensus and that there is support within the W3C for the attitude taken in this critique.
The QAWG either has to try a whole lot harder to build a true W3C consensus or give up. Given that many of those who disagree with you basically take a "If it ain't broke, don't fix it" attitude, you have a lot to do to engage sufficient positive energy from them. I, for one, do not see it as worth my time to work on QA within W3C.
You should take the proactive step of converting conversations of disquiet on the QAIG list into issues on your issue list, and make sure that you formally address them to the satisfaction of naysayers.
As an example, Jonathan Robie says: "I am concerned about anything that would increase the work load or the constraints on editors." This concern is not reflected in your issues lists.
I am far from convinced that the WG sees its task as seeking a W3C consensus, rather than imposing your views on others.
I hope that my review has already persuaded you that you need to return to last call (or earlier).
If not, I suspect it will be helpful to you to know which comments I would intend to pursue through to formal objection.
All these comments are formal comments against all your documents, i.e. to advance any of your documents I wish you to formally address the comments below, which can be applied to all of your documents.
These are:
In addition, I would make three other formal objections.
The first is that it is inappropriate to advance a specific document in the QAF without advancing them all. Thus if any document is not ready, including appendices to informative annexes, I object to the advance of any other document. A non-exhaustive list of earlier points illustrating this comment is as follows:
The second is that the amount of change since the first last call is sufficient to necessitate a second last call (this is here partly in case you accept all my other comments).
The last is a combination of all the other points to do with your breaches of the process document, pubrules etc. I believe I have documented enough examples above. I consider comment 8.5 alone to be a fatal error of this sort. In addition the quantity and importance of more minor quality errors is unacceptable. A non-exhaustive list of earlier points illustrating this comment is as follows: