This is a personal review, and does not reflect the views of Hewlett Packard. The survey of the QAF was initially done as part of the review by the Web Ontology Working Group, but the overall recommendations made here go beyond what was discussed as part of that review.
One part of this review is, I believe, member confidential, and is hence included by reference, rather than verbatim. I have no objection to that part being made visible to the public.
While this review is spattered with detail, the coverage is not uniform. I have not read all parts of all these documents with the same attentiveness.
This review also touches upon other documents such as:
The main goal of this review is to add detail, personal comment, and suggestions for how to move forward, to the review by WebOnt that I participated in. The scope includes any comments that relate to whether the documents in Candidate Rec are ready or not to advance to Recommendation. This is made harder by the interdependencies between the CR documents and other documents with a lower official maturity status. It is also hard to understand the intended quality of the various Working Group Notes that form part of the framework.
I provide detailed comments on the various documents; some thoughts about what is missing; summary points; and suggestions as to how to move forward with the goals of the QAF. The final section deals with procedural administrivia.
Please add a Table of Contents.
The goals chosen seem inappropriate; different goals would make for a different framework.
I believe the goal should be better quality recommendations.
I believe test suites may contribute to this, but both the scoping of the QA work and the setting of its goals should be linked to the output of the W3C, which is a set of documents.
Thus test suites are only useful in as much as they help provide better quality recommendations.
This of course raises the question of what the quality metrics for a recommendation are - suggestions:
The problem with setting conformance tests as the goal is that many WG members will not be committed to this goal.
In more detail:
For those undertaking quality assurance projects, the QA Framework should give:
- a step-by-step guide to QA process and operational setup, as well as to development;
- and, access to resources that the QA Activity makes available to assist the WGs in all aspects of their quality practices.
- neither of these points says very much, since they depend on a definition of QA which most readers will not share; suggest delete
More specific goals include:
- to encourage the employment of quality practices in the development of the specifications from their inception through their deployment;
- again, without a shared understanding of quality this statement is vacuous: no-one is opposed to 'quality', but some will be opposed to the QA WG's conceptualization of quality
- to encourage development and use of a common, shared set of tools and methods for building test materials; to foster a common look-and-feel for test suites, validation tools, harnesses, and results reporting.
- this is the first point with which it is possible to disagree, and hence is the first substantive statement of goals - and it is inappropriate; the goal must be at a higher level than this
A problem with having conformance tests as a goal is that it is unrealistic to expect the whole of the WG to buy into it, whereas (nearly) all (active) WG members will accept that the quality of the documents being produced is a reasonable goal for the WG. Quality is not the responsibility of a specialist subgroup but a shared responsibility of the whole WG; obviously different members of the WG will have different beliefs and opinions as to the value of testing, and will only really support test work once it has begun to show real benefit on more general measures.
It is hard to appreciate what the QAF is trying to do, since you have omitted to publish a list of requirements. Hence, the reader is left to imagine what requirements you might have been trying to meet, which is less than satisfactory.
You say:
The relationship of the QA Framework documents to the W3C Process is undefined at this time of publication. It is not precluded that all or parts of these guidelines may ultimately be incorporated (by reference) into the W3C Process Document [PROCESS].
See member confidential message.
Concerning your issue 16, "Should the W3C Process Document be modified".
In the issue list it is said that "General feeling at Brussels seemed to be that it was unnecessary. The QA Framework will require it". This seems to suggest that you perceive the QAF as an extension to the process document, with force over other WGs. The language of the Ops Guidelines also suggests this. If the quoted paragraph is actually your intent, that the relationship is undefined, then it is important that the QAF documents allow other WGs and other recommendations not to conform. This suggests a global substitution of "Working Groups" with "Conforming Working Groups" etc.
I think the decision to leave this undefined is inappropriate. For me, as a non QA WG sympathizer, it is crucial. If you were really not trying to impose your view of quality on me, then I would not have to bother reviewing your documents. As is, I strongly object to the apparent constraints in your documents on my activities elsewhere in the W3C.
A solution to the problem above is to identify what needs to change in the process document. I suggest that the QAWG should be seeking small changes to the process document to express the normative content of the QAF, and the QAF documents themselves should all be informative.
A strawman for the necessary change to the process document is as follows. Modify section 6.2.6 of the process document by changing the list of deliverables in the fourth bullet point from "(technical reports, reviews " to "(technical reports and test materials, reviews ".
A further possible change may be to add an additional bullet point after that fourth bullet reading "Working Groups whose deliverables include technical reports *should* also have supporting test materials amongst their deliverables."
An editorial problem with the paragraph entitled terminology is that you do not use many of the keywords that you have imported. I think it would be clearer to reduce the list to those keywords that you actually use. (On the other hand, RFC 2119 suggests text that is closer to yours, so a more pedantic comment would be that you should use the exact phrasing given in RFC 2119 - maybe the two comments balance out). I suggest changing "will be used" to "are used" or "are to be interpreted". Also note that the paragraph importing RFC 2119 (as a normative reference) is directly contradicted by the following paragraph (which denies the existence of any normative requirements in the introduction). I suggest removing the import of RFC 2119 from the Intro. Where not already included, such imports can be added into the normative documents; better still is not to have normative documents.
Section 3.1 of the Intro points to the Operational Guidelines, presumably section 3. This does not add much to the Intro except the reference [QAWGPD], which is "under construction".
This is not a good advertisement for the quality of your work.
I find this subtracts from the documents, and suggest section 3.1 of the intro and section 3 of the ops guidelines should be deleted.
Minimally, one or other of the sections should go, since they are fairly similar and could easily be merged.
A further problem is the reference to QAF Spec Guidelines Example & Techniques. This is officially an "editors' draft" in its SOTD, but is referenced just like the QAF Op Guidelines Example & Techniques which is a WG Note according to its SOTD.
The dependency between the documents in CR and an editors' draft for which there is no official publication is not appropriate.
Numbering is missing on the <h2> elements, e.g. Introduction.
In the paragraph before section 3.1, "However" should be "Moreover" or "In addition".
In the references, QAWG should not be in italics.
In many ways the references are disappointing, and look like an attempt to avoid pubrules and W3C process rather than make legitimate references.
Particular references that are weak are: [TAXONOMY], [QAWG], [QAIG], [QA-GLOSSARY], [QA-LIBRARY].
These references are:
My understanding is that Candidate Recs are completed documents being offered for implementation and review, not work that is known to be incomplete.
Many of these references, or the use put to these references (e.g. [QAWG] and [QAIG]), either need to be dropped or included as appendices. This means that they should be finished before the QAF is brought back for a second last call.
I found [TAXONOMY] unintelligible.
A further disappointment in these references, particularly the QA Library, is the lack of non-W3C perspective. You should make it clear that you have considered input of the highest academic calibre from peer-reviewed sources on how to achieve quality in technical documents. Ideally you should have invited experts who wrote such papers. Maybe you do, in which case please make it clearer: one or two self references from the academic literature would not go astray.
As an aside, I tried looking for such things with google and found this document interesting: Automated Quality Analysis Of Natural Language Requirement Specifications by Wilson, Rosenberg, Hyatt. Is the work of NASA well-thought of in the field?
The references to [QAF-TEST] and [TEST-EXTECH] are also problematic, given their unfinished nature. I cannot understand how you can bring any part of the QAF to Last Call or Candidate Rec without having completed it all.
Found at http://www.w3.org/TR/2003/CR-qaframe-ops-20030922/.
Checkpoint 3.1. Synchronize the publication of QA deliverables and the specification's drafts. [Priority 2]
While I support the WebOnt WG's more general comment about your checkpoints being too strong, I also suggest that this one is too weak. It is hard to review specifications that are released in pieces with references from one part at one level of completeness to another at another level.
I find your documents a very good example of how not to release LCs and CRs. It is too difficult for the reader to make any sort of consistent cut through what is in reality a single logical publication, but which has pieces at very different levels. I also find that WG Notes work less well for the informative sections than recommendations would. Partly this is because you seem to have allowed yourself lower editorial standards in the notes (e.g. the missing ToCs), partly because a WG Note is not necessarily a consensus document, and partly because the lack of LC, CR, PR checkpoints for a note makes it difficult for the reader to understand what sort of review is appropriate.
I find no evidence that you fulfilled this checkpoint in your own publications. In particular, I have not found test material for QAF-OPS, and the test material I have found for QAF-SPEC is too incomplete for CR (the main content seems to be http://www.w3.org/TR/2003/CR-qaframe-spec-20031110/qaframe-spec-ta, which is test assertions rather than tests).
So I suggest that this checkpoint be reworked as:
I note that the QA WG has not followed this practice, which makes reviewing your work significantly harder.
Checkpoint 4.1 reads:
Checkpoint 4.1. Appoint a QA moderator. [Priority 1]
I suggest this checkpoint be deleted.
Working groups are already responsible for the quality of their work. It is unclear what power the QA moderator might have if there is a quality problem. WG Chairs who find the quantity of the chair's responsibilities too large can already delegate specific responsibilities to other WG members.
It is unclear what rewards and motivations are offered to the QA moderator. It is unrealistic to expect the QA moderator to do the job well without appropriate recognition.
If the QA WG has appointed a QA moderator it has done so with the utmost secrecy. The checkpoint is flawed in not requiring the name and e-mail address of the QA moderator to be available on the WG homepage. If the QA WG does not have such a moderator I suggest that you have already decided that this requirement is unnecessary; your documents should be brought into line with that decision.
If you must suggest this, then at least make it clear that having co-moderators is acceptable (cf. the Process Document: "Each group MUST have a Chair (or co-Chairs)"). Effectively, I shared the QA Moderator role for WebOnt.
Summary: delete; if not, then add a requirement for an e-mail address and permit co-moderators.
http://www.w3.org/TR/2003/CR-qaframe-ops-20030922/qaframe-ops#Ck-TM-plans-in-charter
The detail in this requirement seems futile.
For example, I know of one WG that has committed to AAA for operations, specifications, and test materials: the QA WG. Unfortunately these commitments have not been followed through in the operations, the specifications or the test materials of that WG.
I prefer someone who promises little and delivers much.
I wonder if all that is really required is a change in W3C culture, so that charters mention the expected level of test development.
Checkpoint 4.3 requires the public availability of the process document. I suspect this is inadequate, and the requirement should read something like: "the Working Group's QA Process Document should be linked from the WG home page (with the same access controls)".
FYI, I found it very hard to find the QAWG QAPD, despite you having a charter commitment to its development, and having an under construction link from the QA Ops Guidelines.
I am also unconvinced by the requirement for WGs to have a process document; even with a "should" instead of a "MUST", I suggest a "may" is more helpful. Producing a process doc is a non-trivial cost. Process docs such as that of the QAWG, which are barely followed and/or cover too many cases that do not actually occur, bring the process into disrepute. Only working groups that feel a need for a process doc should have one, and then the process doc need only cover those parts of the process which the WG believes need clarifying.
The title of the Quality Assurance Process Doc is inappropriately broad. Judging by that of the QAWG, the focus of the document is on test materials. There are many other aspects to quality (e.g. readability and timeliness). I suggest the title of these documents should be something like "Test Development".
Since the QA WG has arrived at Candidate Rec without following the process in its process doc, such documents are clearly superfluous in that group's opinion. They should not be imposed on other groups.
In many places your documents confuse QA with Test.
An example pointed out by WebOnt is the title "QA Moderator" whose responsibilities are only to do with tests.
A further example is Checkpoint 1.4, "Checkpoint 1.4. Enumerate QA deliverables and expected milestones. [Priority 1]": looking at the draft charter, you seem to have enumerated only test deliverables, and not other QA deliverables. Weakening this checkpoint so that you actually conform with it is suggested. This would change the normative force of the checkpoint.
A further place where you confuse QA and Test is in the name "QA Framework" - the framework is not a general quality framework, but only a test development framework. I suggest you rename it.
Further, a systematic check of all your documents for such misuse of "QA" is required.
Guidelines 5 to 8 in particular look simply like advice as to how to realise the commitment to producing test materials.
This advice generally looks OK, but some of it is presumably over-the-top or inappropriate for some working groups.
Might it not be more effective and less pushy to seek normative status only for the charter statement concerning the production of test materials, and to leave these parts (guidelines 5 through 8) of the framework as a resource that Working Groups may draw on, or may not? This gives the people closest to the specific problems of testing the FooBar Recommendation maximum freedom to solve those problems in the most appropriate way.
The references to the QA Test Guidelines and QA Spec Guidelines are included in section 6.2 "informative" references, but they are crucial to the understanding of the normative content of Checkpoint 1.1 "for any Test Materials that it plans to produce or adopt, the WG MUST define its commitment level to QA Framework: Test Guidelines -- A, AA, or AAA."
Since the test guidelines are unfinished it was inappropriate to advance the Ops Guidelines to last call, let alone CR.
You say: "Such a procedure is applicable both to external challenges once test materials have been published, as well as to internal challenges during the building of test materials."
It is not clear that a WG can/should constrain what the W3C does with any W3C test suites after Rec. See a comment from Dan Connolly on an early version of OWL Test Cases.
A fair amount of the informative content related to the Ops Guidelines is "under construction" - e.g. the QAWG QAPD, some of the examples and techniques document. In my judgement, the move to Last Call and Candidate Rec was premature given the quantity and importance of the unfinished work.
Aside: Contrast with WebOnt - when we went to Candidate Rec there were still some tests being added, and implementors were explicitly asked to keep up with the editors' draft; these tests, moreover, are now normative (but subsidiary to OWL Semantics); there was a significant informative document that will be published as a WG Note in January. WebOnt received a formal objection concerning the lack of the additional informative content, which we opposed on the grounds that four implementors had managed to implement from the terse normative document (not on the grounds that we were working on it). WebOnt also explicitly listed as "at risk" the part of the design for which we still had work outstanding; as it turned out, the implementation feedback caused us to abandon the extra work.
The point of the aside is that I don't think it is particularly clear when one crosses the boundary between being finished enough for CR and still having work to do. However, I think you are the wrong side of the grey line.
WebOnt breached this condition: tests were APPROVED or OBSOLETED or REJECTED on WG decision (that was the defined procedure). No rationale. No guarantee of review of accuracy, scope, and clarity. That worked fine; there is no need to assume that WGs are stupid and will approve incorrect tests. With three tests we took an explicit risk and approved them despite not really having evidence that they were correct. This was a trade-off between various different quality concerns, including timeliness, and the importance of the tests. We modified one test because our machines told us to. No one understood it as far as I could gather. There is nothing wrong with that, as long as we had adequate evidence that modifying the test was in line with what the other documents said, and hence would enhance interoperability.
I think the rationale is flawed in wanting "quick" decisions. Many OWL tests took a year to approve; it was not a problem. If we had not voted to move to PR some of those tests would still be awaiting a decision.
The WebOnt experience suggests that the last sentence of the discussion should be weakened to "They should at least include a mechanism for acceptance or rejection of tests."
The rationale should be something like "It is helpful to distinguish those tests that have been accepted by the Working Group"
The Checkpoint could read, "Define a procedure for accepting or rejecting tests."
The conformance requirements could be … dropped; that is what I would prefer.
I am not at all convinced by this discussion. Specifically:
I request that the second sentence of the discussion and the two bullet points be deleted.
Sections 6.1 and 6.2 are listed as 2.1 and 2.2.
The appendices should be included in the ToC in a more conventional fashion, e.g. as a continuation of the <ul> list. I missed their existence on first reading.
Section 1.5. This part of the QAF should be boiler plate that is nearly identical in all your documents. It should probably come sooner. Without it, the reader is lost.
I wonder whether there should be a global substitution throughout the QAF replacing "specification" by "recommendation".
Documents reviewed: http://www.w3.org/TR/2002/NOTE-qaframe-ops-extech-20021202/, http://www.w3.org/QA/WG/2003/09/qaframe-ops-extech-20030922.html, http://www.w3.org/QA/WG/2003/09/OpsET-charter-20030922.html, http://www.w3.org/QA/WG/2003/09/OpsET-qapd-20030922.html.
I found it confusing having WG notes not in TR space.
This particular note violates pubrules section 1.5 point 1, in a way that also caused me significant difficulty when reviewing. (The appendices are not in the document directory; in fact, the document does not have a directory, making it hard to know where the documents end and the general QA WG scratch space begins.)
Moreover the QAPD template contains broken links such as http://www.w3.org/QA/WG/2003/09/qaframe-ops.html#Ck-proc-describe-review; these cause work for the reviewer which should have been done by the editors before Last Call (let alone CR).
My reading of that same point of pubrules is that the decision reported in the December 2002 publication to move the document into WG space was a violation.
I believe that the same effect, of having a WG copy that changes fairly frequently during development, could have been better achieved by keeping the editors' draft in WG space.
I suggest moving all QA WG notes into TR space and ensuring they conform with pubrules; in fact, I think it would have been clearer to have had these as informative Rec-track documents.
A further problem is that the appendices are not listed as part of the table of contents in a conventional style; again, this makes it hard to know where the boundaries of the document are.
I am not happy with the normative content of either the process document template or the charter template, but that follows points I have already expressed. In addition:
Found at http://www.w3.org/TR/2003/CR-qaframe-spec-20031110/.
This does not include any goals; it should be retitled "Scope".
I have found the language used quite difficult to understand as a non-expert. The fact that too much of the companion example and techniques document is still being written has made it impossible for me to adequately review some parts of this document.
An example problem is Guideline 3: the terms "modules", "profiles", etc. all seem to mean something specific; I probably have not read section 2 adequately. Section 2.3 seems clear enough, but does not help me understand Guideline 3. The "under construction" banner has disconcerted me.
This of course is linked to the problem of having a CR document that depends on an editors draft for its informative content.
My understanding is further hampered by the Examples and Techniques doc not being in TR space, and the fragIDs not working in my (relatively broken) browser after the redirect.
I failed to understand checkpoint 2.4.
The use of the undated link is (weakly) in contravention of your own operational guidelines (3.1) on synchronization.
While my global point that W3C publishes recs not specs stands, a specific instance is the "priority 2" paragraph of section 1.7, which talks about 'standards'; this is an editorial error and should be changed to match priority 1 and 3.
I wasted some time following these links; I suggest reordering section 1.5 by starting with "Each guideline includes: .." and moving the deleted text concerning WAI and WCAG to the acknowledgement section.
As an example of a MUST that is too strong, consider the second MUST in checkpoint 1.1. The "This information" refers to the most recently mentioned information in the text, i.e. "a statement of objectives and/or goals", and since we now have a "MUST appear" the previous SHOULD can never be ignored.
Moreover, since the Spec Guidelines document does not include any goals (the goals are in the QAF Intro, and not in its section 1.1 Scope and Goals), it is in contravention of this checkpoint. The checkpoint is flawed and should be changed.
Every Guideline in this document should be carefully reviewed to see if it respects the needs of multidocument publications such as the QAF, or RDF, or OWL.
Suggest global weakening of all MUSTs to SHOULDs, or better MAYs, or better still "may".
This is inappropriate.
Editors have discretion and freedom to construct the document best suited to their audience. The inclusion of examples is a rule of thumb not something to be set in stone.
As an example the OWL Test Cases Proposed Recommendation does not meet this checkpoint, nor does it suffer from not doing so.
Many W3C recommendations define document formats.
Such recommendations may be complete, intelligible and self-contained without any definitions of products that do things with these documents.
For example, RDF defines a knowledge representation language. The range of products that could do something with such documents is large but what matters is the meaning of the document.
Often declarative definitive statements are clearer than procedural ones, whereas this checkpoint is likely to commit Working Groups to building procedurally oriented recommendations.
This is poorly thought out.
The enumerated list seems arbitrary and ad hoc. The phrase "if no category matches, define a category." makes the preceding MUST toothless, and hence superfluous.
Suggest delete.
The list is reminiscent of the 'certain Chinese encyclopedia' from the passage of Borges quoted in the preface to The Order of Things by Foucault: "animals are divided into: (a) belonging to the Emperor, (b) embalmed, (c) tame, (d) sucking pigs, (e) sirens, (f) fabulous, (g) stray dogs, (h) included in this present classification, (i) frenzied, (j) innumerable, (k) drawn with a very fine camelhair brush, (l) et cetera, (m) having just broken the water pitcher, (n) that from a long way off look like flies". (It's not really that bad, but this passage is quite a laugh).
In checkpoint 4.2
The MUST is too strong: any MUST which can be inapplicable SHOULD be a SHOULD. (Need global review)
Having failed to understand checkpoint 3, I get a sense of deja vu when I get to checkpoint 5.
The word "and" should not be in caps.
Checkpoints 3, 4, 5 and 6 are all DoVs. Is it possible to condense them?
This is unlikely to be effectively enforceable, and hence should be dropped.
While this seems generally a good idea, it seems too vague for an RFC 2119 keyword.
The third requirement "MUST state why RFC2119 keywords were not used." is inappropriate. The previous requirement (the SHOULD) required that there was a rationale. It is a purely editorial matter whether that rationale is helpful or not to the intended reader of the recommendation.
I find Connolly's "MUST is for agents" compelling. I like the structure of the OWL Proposed Rec, in which the bulk of the conformance requirements is specified mathematically, using squiggles (and not RFC 2119). We tie this in with RFC 2119 keywords only when we talk about a fairly mythical agent, an "OWL Consistency Checker", which is quite deliberately an idealised reasoner, and not something that we actually expect to be widely used.
In as much as Checkpoint 7.1 requires W3C recommendations to behave other than as suggested in Connolly's "MUST is for agents", I oppose this checkpoint.
On the topic of Connolly's "MUST is for agents", he suggests:
I might change the declarative "every TR MUST be valid" into a process constraint like [...] the webmaster/comm team may decline any publication request [... for ...] tech reports that are not valid.
I suspect that whole pattern could be applied successfully to the Spec Guidelines, e.g. define an "A" conforming Spec, an "AA" conforming Spec, and an "AAA" conforming Spec. Checkpoint 1.1 then becomes "Near its beginning, an A conforming Spec defines the subject matter of the specification, and normally includes a statement of its objectives and/or goals." Yes, that is significantly better.
I suggest all the guidelines be written in this declarative style, probably throughout the QAF.
I like this one, except the usual grumbles about MUST etc.
A winning streak - well at least the first half. It would have been good to have an ICS for OWL reasoners.
Again I grumble about MUST, and the intrusion into editorial discretion by requiring the editor to include a rationale (rather than merely to have one).
My views on this were endorsed by WebOnt, and are included in the WebOnt review.
Note that the Ops Guidelines do not meet this checkpoint. Have you reviewed your own documents against the Spec Guidelines?
I think it would be slightly better if you used the same charset for all pieces of this document.
The ICS is in UTF-8 whereas the rest appears to be in ISO-8859-1
???0.1.??? Appendices
LC-13 includes the comment "Would be better if the document were AAA-conforming to itself IMO.", and the WG agrees with "QAWG agrees that SpecGL should be AAA-conformant to SpecGL, and commits to AAA conformance by Candidate Recommendation". Within the actual CR you seem to only claim (incorrectly) AA conformance.
The process document requires that the WG has formally addressed last call issues before advancing to CR.
A specific requirement is:
The group SHOULD maintain an accurate summary of all substantive issues and responses to them (e.g., in the form of an issues list with links to mailing list archives).
You have chosen to implement this in the following way:
I found this very confusing, not least, because this structure is not made clear anywhere.
A specific problem was that your issue list does not permit me to see the response sent to Marc Hadley concerning issue 13.
Hence I am unable to ascertain whether or not you told him one thing and did another.
The DoC page only gives the summary resolution, but does link to the Last Call Issue page on which you appear to have agreed that Spec Guidelines will be AAA before CR, (which it was not, even on your own overly optimistic assessment).
Thus I cannot tell whether Marc's silence, interpreted as assent, was at least formally informed by that part of your decision.
In fact, the issue list does not let me verify that you formally addressed any issues at all.
Further problems with your last call issue list include:
The division of DoCs by document is a procedural sleight of hand. Many of the most important comments are in fact comments about the whole QAF. The failure to formally address them invalidates the advance to CR of Op GL and Spec GL. (The next comment is an example.)
I suggest it is better to formally address comments by:
I like that you chose to publish a consolidated version of the Spec Guidelines with the changes from the last call comments. It may have been better to have included a link to that version in your messages formally responding to the reviewers.
The Device Independent WG asked for a cost-benefit analysis.
While this issue was Accepted with Modification, it is not clear that you ever informed them of this, or informed them of the fact that the modification is that you decided to move to CR without having done the cost-benefit analysis.
This is not merely editorial, but is a precursor to a substantive comment: if these are the benefits and these are the costs then this is not worth it. This issue has been raised earlier by Jonathan Robie on the IG list. It is vital that the QAWG addresses this (i.e. produce a CBA, not plan to produce a CBA) before asking for consensus around its work.
I note that you did not formally address this comment. You had decided to inappropriately take it as a comment on the Introduction, rather than on all the documents. Specifically, your advice to Roger, "Your reply to this DoC document is not required", was flawed. On the basis of that advice he appears not to have forwarded the message on to the DIWG. Thus, the DIWG did not have a chance to consider whether they wished to formally object or not to your not producing a CBA before moving to CR.
The XML Protocol WG commented that the status of the document section did not state that it was a last call working draft.
Your last call issues list failed to list this comment, and this comment has hence not been formally addressed.
(The comment is incorrect, the reviewer seems to have been looking at an editors draft. This may be indicative of failing to make adequate distinction between editors drafts and published drafts)
In April 2003, Dan Connolly commented on the Spec GL. One of the editors responded, and Dan said, "I'm not satisfied by this response to my last call comment". This formal objection is not recorded in your last call issues page. It was not recorded in your call for implementations.
The QA WG should ensure that a message goes to the AC reps informing them of this omission in the previous call for implementations. The Director should be given an opportunity to consider Connolly's objection and the QAWG's inadequate response to it.
Why all the caps? Suggest lowercase throughout
Also the caps usage is inconsistent.
Impacts checkpoint 1.2
None of the QAF documents conform with this checkpoint, since it is way too narrow. In detail:
The first section of the specification contains a clause entitled "scope" AND enumerates the subject matter of the specification.
Problems with the sentence include:
Admittedly I have got a bit carried away here - but still.
It is easy to write this appendix well, the editor just needs to pick out some of the words of the main document. By trying to use different words the editor fails.
I suggest a complete rewrite of this appendix, merely quoting the relevant parts of the spec.
That seems like make-work; why not drop the appendix, and drop the part of the spec that says you MUST have it (checkpoint 10.1, I think)?
Document reviewed: http://www.w3.org/QA/WG/2003/10/TestGL-20031020.html (editors draft)
The QAF is about testing, hence the Test Guidelines is the central document to the content of the framework. It is an abuse of process to have advanced the other documents to last call before this document was ready.
It is not possible to perform an adequate CR review of the OPs Guidelines or the Spec Guidelines without also performing a review of the Test Guidelines.
Guideline 1. Perform a functional analysis of the specification and determine the testing strategy to be used. In order to determine the testing strategy or strategies to be used, a high-level analysis of the structure of the specification (the subject of the test suite) must be performed. The better the initial analysis, the clearer the testing strategy will be.
Neither WebOnt nor RDFCore did this.
It is hard, since the main purpose of the tests for these WGs was to help in the development of a quality recommendation, and one cannot do a final functional analysis of the rec until it is basically finished, which would have overly committed us to a waterfall model of development. In fact, that motivation indicates that the second sentence quoted is too strong: there is no "must be performed" here; I suggest "may be helpful".
Having said that, it is clear that the coverage of the tests in both the Semantic Web WGs is weaker than it would have been if we had followed this guideline at some point; this then comes back to issues to do with synchronization and timelines etc. In WebOnt I am reasonably sure that most of the untested bits are from that part of the rec that is fairly easy to implement. Thus, since we do not have a conformance test suite, the many OWL implementations that pass all the tests may nevertheless have a variety of trivial errors that prevent interoperability. I don't see that as the responsibility of the WG - conformance tests come later, and at that point (or in bug reports to software developers) it will become clear what trivial errors in software need fixing. Of course, in a very few cases these trivial errors may point to minor errors in the spec where there is insufficient clarity - but I believe that issue-driven test development has covered almost all of these areas adequately.
At times, RDF Core used a test driven specification development methodology.
Issues from the issue list were resolved by agreeing test cases.
The editors then had complete freedom to write text which conformed with the test cases. (The text was later reviewed, so the freedom was not as excessive as it seems).
Examples can be found regarding both syntax and semantics.
A syntax example is rdfms-empty-property-elements which was resolved with these words:
RESOLUTION: the consolidated test cases represent RDFCore's decision on this; the issue can be closed once those test cases are all in one place.
The test cases can be found in this directory. As far as I can tell, this predates the first editors' draft of the revised grammar, and modified the old grammar; i.e. this decision does not follow your methodology at all, is a test-focussed decision, and was good.
A semantics example is rdfms-identity-of-statements for which the issue resolution is but a single test case:
Resolution: The RDFCore WG resolved:
<stmt1> <rdf:type> <rdf:Statement> .
<stmt1> <rdf:subject> <subject> .
<stmt1> <rdf:predicate> <predicate> .
<stmt1> <rdf:object> <object> .
<stmt2> <rdf:type> <rdf:Statement> .
<stmt2> <rdf:subject> <subject> .
<stmt2> <rdf:predicate> <predicate> .
<stmt2> <rdf:object> <object> .
<stmt1> <property> <foo> .
does not entail:
<stmt2> <property> <foo> .
Just perfect! (The syntax used in the test case is a muddle, but all the WG members could understand it. The test case will have been sanitised into the RDF Test Cases somewhere. The RDF Semantics documents reflect this.) Using test cases as part, sometimes the only part, of the issue resolution process brings clarity and openness. It leaves the editors with large discretion that permits the documents to be of higher quality.
In as much as the Test Guidelines and the QAF prohibit and/or obstruct this behaviour I suggest that the QAWG has got it wrong, and needs to start over.
Checkpoint 1.3. Analyze the structure of the specification, partition it as appropriate, and determine and document the testing approach to be used for each partition. [Priority 1]
I suggest weakening this to have "may" force rather than "must" force.
Checkpoint 2.1. Identify and list testable assertions [Priority 2] Conformance requirements: Test assertions within or derived from the specification must be identified and documented.
Checkpoint 2.2. Tag assertions with essential metadata [Priority 1]
Rationale: It must be possible to uniquely identify assertions, and to map them to a particular location, or to particular text, within the specification.
Wildly oversimplistic.
Even the simplest OWL test relies on many parts of the recommendation. The idea that it is possible to tie a test to one or two parts of the recommendation is philosophically flawed (similar to the concept of causation, cf a huge body of literature). I do not believe this is uniquely a property of OWL.
Obviously one tries to structure the tests in such a way that, assuming a system passes some set of easier tests, each new test presents an interesting challenge, but ... Of course this also comes back to the issue that you lot seem to believe that it is possible to test for conformance, whereas that is trivially incorrect. (Given any set of conformance tests for any system, where each test is characterised as one or more inputs resulting in one or more outputs, the piece of software that is defined to precisely pass the test suite, by giving the determined output for the determined input, and otherwise to fail horribly, is a non-conformant piece of software that passes the conformance tests.)
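To make that parenthetical argument concrete, here is a minimal sketch (in Python, with a purely hypothetical two-entry test suite; the names are mine, not from any real specification) of a program that passes every conformance test while conforming to nothing:

# Hypothetical conformance suite: each test is one input paired with the expected output.
CONFORMANCE_TESTS = {
    "input for test 001": "expected output for test 001",
    "input for test 002": "expected output for test 002",
}

def lookup_table_implementation(document):
    """Answer correctly on exactly the inputs that appear in the test suite,
    and fail horribly on everything else."""
    if document in CONFORMANCE_TESTS:
        return CONFORMANCE_TESTS[document]
    raise RuntimeError("deliberately fails on any input not in the test suite")

# Passes 100% of the conformance tests ...
assert all(lookup_table_implementation(i) == o for i, o in CONFORMANCE_TESTS.items())
# ... yet is useless, and plainly non-conformant, on every other input.

No finite suite of input/output pairs can rule this construction out.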
Suggest drop these requirements, and the related ones in Guideline 10 of SpecGL.
Possibly weaken to "It may be helpful to list the test assertions found within or derived from a recommendation".
When the Working Group requests test submissions, it must also request that the appropriate metadata be supplied.
I found it easier to completely own the test metadata in WebOnt (well, me and Jos, the co-editor). Unfortunately the metadata quality is crucial, and is best ensured by having a fairly small number of people responsible - sure, it's a lot of work.
The *must* is too strong, suggest *may*.
The list of test metadata omits "the type of the test" and "the files associated with the test"
Conformance requirement: The test materials management process must provide coverage data. At a minimum, the percentage of assertions for which at least one test-case exists should be calculated and published.
Makework - this statistic is useless. Please do not waste other people's time in calculating it. Any test suite tests 0% of any plausible language worth specifying because the language is infinite and the test suite is finite. Any other number is simply a fib.
Suggest drop this requirement and any related requirement.
Checkpoint 3.4 Provide an issue-tracking system [Priority 2] Conformance requirements: The test materials management process must include an issue-tracking system.
Rationale: If a high-quality test suite is to developed it is important to methodically record and track problems and issues that arise during test development, testing, and use. For example, the test review process may generate a variety of issues (whether the test is necessary, appropriate, or correct), while after publication users of the test suite may assert that a particular test is incorrect. Such issues must be tracked, and their resolution recorded.
This is of course a quality issue, but has nothing to do with testing - suggest moving it to the Operational Guidelines. Every WG should have a means of issue tracking. Also, this should not be normative, merely a QA resource that WGs may draw from.
Checkpoint 3.5 Automate the test materials management process [Priority 2] Conformance requirements: The test materials management process must be automated.
Rationale: Automation of the test materials management process, perhaps by providing a web-based interface to a database backend, will simplify the process of organizing, selecting, and filtering test materials.
The rationale is true but does not justify a must; the QA group could collect a set of tools that have been used to help automate test material management, and help try and spread best practice, but a *must* here is ridiculous. This really should not be a checkpoint.
I note that the QAWG commits to AAA test conformance; please describe your automatic system for test material management. (Since the Spec GL and the Ops GL are in CR and the Test GL is not, I would be happy with an answer that restricted itself to those two documents.)
Checkpoint 4.2. Automate the test execution process [Priority 2] Conformance requirements: Test execution should be automated in a cross-platform manner. The automation system must support running a subset of tests based on various selection criteria.
Rationale: If feasible, automating the test execution process is the best way to ensure that it is repeatable and deterministic, as required by Checkpoint 4.1. If the test execution process is automated, this should be done in a cross-platform manner, so that all implementers may take advantage of the automation.
WebOnt made it clear to its implementors that we expected test results to have been collected in an automated fashion, but it is not possible for a WG to provide such an execution environment for every conceivable setup.
Once again, noting the QAWGs AAA commitments in its charter, I hope you will demonstrate the sense of this checkpoint before any of your documents proceed further along the recommendation track. I guess you need to solve some natural language research problems first.
Checkpoint 5.1 Review the test materials [Priority 1] Conformance requirements: The test materials must be reviewed to ensure that they meet the submission requirements. The status of the review must be recorded in the test materials management system, as discussed in Checkpoint 3.2 above.
You cannot have a priority 1 checkpoint depending on a priority 2 one; I think the "management system" is the problem: replace it with "metadata".
In WebOnt we automated this part - every time the OWL Test Cases document is produced, all the test material is verified to conform with the "stylistic" guidelines in OWL Test. Hence we meet the spirit of this without meeting the letter. Once again, your desire to have strong wording is inappropriate. Weaker wording that would be acceptable would be:
Checkpoint 5.1 Review the test materials [Priority 1] Conformance requirements: The test materials should be reviewed to ensure that they meet the submission requirements. The status of the review may be recorded in the test materials metadata, as discussed in Checkpoint 3.2 above.
On the same checkpoint, I note that one test which I accepted, imports-014, had as its whole point that it did not conform to the stylistic preferences (using a superfluous suffix on a URI) and that this presented problems which were not exercised by the other tests.
So, it is important that there is adequate discretion in the process to accept tests that do not meet the submission requirements.
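As an illustration of the kind of automation I describe above for checkpoint 5.1, a minimal sketch of a stylistic check run whenever the test collection is rebuilt might look like the following (the directory layout and the two rules are hypothetical, and this is not the actual WebOnt tooling):

# Hypothetical: verify every test file against a few stylistic rules whenever
# the published test document is regenerated; report any violations.
import sys
from pathlib import Path

STYLISTIC_RULES = [
    ("no superfluous suffix on test URIs", lambda text: "#extra-suffix" not in text),  # hypothetical rule
    ("test declares its rdf:type", lambda text: "rdf:type" in text),                   # hypothetical rule
]

def violations(path):
    """Return the names of the stylistic rules that this test file violates."""
    text = path.read_text(encoding="utf-8")
    return [name for name, passes in STYLISTIC_RULES if not passes(text)]

failed = False
for test_file in sorted(Path("tests").glob("**/*.rdf")):  # hypothetical directory layout
    broken = violations(test_file)
    if broken:
        failed = True
        print(f"{test_file}: {', '.join(broken)}")
sys.exit(1 if failed else 0)

A check of this kind is cheap to run on every build, which is how the spirit of the checkpoint can be met without a manual review step; it would also need an explicit override for deliberately non-conforming tests such as imports-014.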
Discussion: It is not necessary for tests to automatically report status for this checkpoint to be met. It would be sufficient, in the case of manually executed tests, for the test execution procedure to unambiguously define how the person executing the tests should determine the test execution status.
Tell me again about the QA WG's tests (for opsGL and specsGL) that permit unambiguous determination of the test execution status; I seem to have missed that part of your document set.
Checkpoint 6.2 Tests should report diagnostic information [Priority 2] Conformance requirements: When tests fail, they must provide diagnostic information to assist the implementer in determining the source of the problem.
No!!
It is a huge amount of work for the WG to provide the implementors free of charge with a test suite. No way are the implementors entitled to a test suite with diagnostics. The cost is huge - developers get paid; they should put some sweat in too.
I look forward to seeing the QAWG's diagnostics in the test suite for opsGL and specsGL.
This requirement is mad, and should go.
I made the mistake of following this link from the SOTD of the QAF Spec Guidelines.
I suggest that the sentence "In addition, the QA WG has had a pro-active role by making and contributing ..." be deleted, because of the inappropriate nature of the material at the other end of the link, http://www.w3.org/QA/Group/2002/06/reviews. That page is a matrix of assignments by QA WG member and is not appropriate for review, which should be organized in some other way, e.g. by recommendation or by status. I found further problems when looking at the RDF reviews: firstly, they were news to me; as an active member of the RDF Core WG I would expect a review to have been circulated. Secondly, the "done instead .. QT" makes no sense to me. These problems can be solved by simply deleting the link.
A more serious problem is that the CSS3-UI review has a "XHTML 1.0" tick box, when it isn't (it has a charset problem). I hope the embarrassment of this suggests doing a systematic check of the QA web space for similar problems.
The CSS validator reports problems with the stylesheet used by the QA WG (for instance for the CSS3-UI review). This is less serious since you don't claim CSS conformance.
A further issue I had with the QA pages is that the template seems to fix the copyright dates inappropriately. In particular, few, if any, of your pages date back to 2000. Again, some effort needs to be expended on globally fixing this.
This is under construction, but is crucial to whether you conform with your charter commitment to quality.
Document reviewed: http://www.w3.org/QA/WG/2003/03/QAWG_QAProcess-20030307.html
Here are some of the problems:
I think it would be valuable to keep cycling the QA review of QA until the QA goalposts are ones you have met - partly this is about upping the quality of your work, partly about making the statement of ambition of the QA goalposts more reasonable, and partly about dropping irrelevant goals from the QA framework.
Here are some of the ops guidelines that are suspect:
Document considered: http://www.w3.org/QA/WG/charter
The WG has failed to meet the quality commitments in its charter. Specifically the advance to CR should have been linked with a clear statement of what had been achieved towards the AAA goals and the plan for achieving the other goals by PR.
It is not possible to recover from the failure to meet the AAA operational goals without reverting to last call or earlier.
In particular I suggest that the WG should not request Proposed Recommendation before they have achieved and documented AAA conformance in all respects. I note that this adds to the PR entrance criteria implicit in the CR requests.
The first paragraph of section 2 (Scope and Deliverables) talks about "usable and useful" test suites, the second about "conformance testing". A fundamental problem with the QAWG is the conflation of these two different concepts.
In order to have any credibility the QAWG needs to demonstrate that it meets its own standards. The resolution of LC-13 reads "QAWG agrees that SpecGL should be AAA-conformant to SpecGL, and commits to AAA conformance by Candidate Recommendation. " a further valueless commitment by the QAWG. It is imperative that the QAWG demonstrates that it is committed to the quality of its own work. This is not a request for a further empty commitment.
As an example, the WG does not appear to have addressed the point found in your own review that the doc should state explicitly that the SHOULD and MAY can be implemented separately.
I suggest that an appropriate goal is something like higher quality TR publications and then working out what needs to be done to achieve this.
It is not clear that the emphasis on testing is appropriate.
As a specific example, I suspect that suggesting that WGs include a requirements document amongst their deliverables would be a good idea.
Having done a requirements-gathering phase, it is useful to try and link the recommendations, particularly the MUSTs and SHOULDs, back to those requirements. Any recommendations that do not link to the requirements should be considered for deletion.
Conforming with your suggested requirements is a nontrivial exercise that will impose significant cost on W3C WGs, Editors, and Members.
It appears to be an act of faith on your part that these costs are appropriate.
I suggest that minor changes to the process document will achieve almost all the benefits at a small fraction of the cost.
I wish to give two examples, RDF Model and Syntax and XLink.
RDF Model and Syntax was a muddle, produced fairly rapidly. The wording was so unclear that a new WG had to be chartered to clear it up. That is a significant cost.
XLink is a clear spec, which took a long time to produce.
RDF is used, perhaps less than hoped.
XLink is not.
While RDF M&S had quality problems, the achievement of timeliness may well have been a critically correct decision.
XLink may be clear, but for one reason or another the recommendation is not useful.
Further thoughts are that the delays imposed by the constraints of the QA Framework may have other costs such as the establishment of de facto standards that are worse than what the WG would have produced without the QAF constraints.
Good advice about how to decide where to spend the available effort to improve the quality of recommendations needs to be informed. The current QAF documents appear to be simply dogmatic about the importance of testing, and blinkered as to the value of other quality parameters.
I have no idea how to tell whether or not there are tests for your CR documents. I have to resort to google.
RDF Core and WebOnt both decided to publish their tests as a rec track doc.
Here are some advantages of that decision:
Here are some disadvantages:
In any case the QA documents should suggest that rec track documents have clear and straightforward links to the relevant test suites.
You do not follow RFC 2119 section 6. This is a serious failing given the wide use you make of that RFC. To refresh your memories:
6. Guidance in the use of these Imperatives
Imperatives of the type defined in this memo must be used with care and sparingly. In particular, they MUST only be used where it is actually required for interoperation or to limit behavior which has potential for causing harm (e.g., limiting retransmisssions) For example, they must not be used to try to impose a particular method on implementors where the method is not required for interoperability.
The MUST is the only use of an RFC 2119 key word as a key word within RFC 2119. I believe that every usage of RFC key words in the QAF is in breach of this constraint. I have already drawn to your attention many of the most suspect examples, but I have not found one that is clearly within this constraint (I think that a case could be made for about one-fifth of the constraints, e.g. requiring that the QAPD, if any, be public, or requiring a public archived mailing list, but these are on the fringe of the subject matter of the QAF). I suggest that the subject matter of the QAF is more appropriately informative rather than normative, since it concerns "a particular method on implementors" (i.e. Working Groups and Recommendation editors) for achieving clarity and quality from which interoperability arises. Other methods are as good, and the QAWG must not use RFC 2119 key words to impose its own.
Note that elsewhere I have suggested replacing MUSTs with SHOULDs or MAYs. These too would be in breach, and really dropping normativity is the only sensible route that is in keeping with the judgement of the RFC 2119 authors of the importance of humility.
I am deeply concerned by your decision not to address these problems when you became aware of them (see Susan Lesch's comment of 14th March 2003, which refers to earlier discussion), and your minutes of April 21 "questioning our use of these terms". See also Dan Connolly's unaddressed comment. If you feel you can ignore 100% of the MUSTs in RFC 2119, why might you expect anyone to respect any of the MUSTs that you state?
Perhaps your guidelines and checkpoints should have a letter before so it is clear which document they come from (e.g. Guideline T.1)
I am not a member of the QA Working Group, since I have no desire to work on Quality Assurance. The problems found in this review are basic. They should have been found by:
Presenting grossly unfinished and inadequate work for CR review forces W3C members and participants who do not wish to contribute to the tasks of the QA WG to do your work for you. Given the menaces implicit in the MUSTs you wish to impose on other WGs this amounts to an act of extortion.
Your WG knows that it is trying to do too much with too few people who have too many other commitments.
I suggest that, as well as trying to recruit new members, you should also reduce the ambition of your WG.
I believe that producing a few informative resources rather than wide ranging normative recommendations will be a better match for the actual effort the WG participants appear to have for this work.
Occasional interchange on the QAIG list suggests that the stance taken by the QAWG is not a W3C consensus and that there is support within the W3C for the attitude taken in this critique.
The QAWG either has to try a whole lot harder to build a true W3C consensus or give up. Given that many of those who disagree with you basically take a "If it ain't broke, don't fix it" attitude, you have a lot to do to engage sufficient positive energy from them. I, for one, do not see it as worth my time to work on QA within W3C.
You should take the proactive step of converting conversations of disquiet on the QAIG list into issues on your issue list, and make sure that you formally address them to the satisfaction of naysayers.
As an example, Jonathan Robie says: "I am concerned about anything that would increase the work load or the constraints on editors." This concern is not reflected in your issues lists.
I am far from convinced that the WG sees its task as seeking a W3C consensus, rather than imposing your views on others.
I hope that my review has already persuaded you that you need to return to last call (or earlier).
If not, I suspect it will be helpful to you to know which comments I would intend to pursue through to formal objection.
All these comments are formal comments against all your documents, i.e. to advance any of your documents I wish you to formally address the comments below, which can be applied to all of your documents.
These are:
In addition, I would make three other formal objections.
The first is that it is inappropriate to advance a specific document in the QAF without advancing them all. Thus if any document is not ready, including appendices to informative annexes, I object to the advance of any other document. A non-exhaustive list of earlier points illustrating this comment is as follows:
The second is that the amount of change since the first last call is sufficient to necessitate a second last call (this is here partly in case you accept all my other comments).
The last is a combination of all the other points to do with your breaches of the process document, pubrules etc. I believe I have documented enough examples above. I consider comment 8.5 alone to be a fatal error of this sort. In addition the quantity and importance of more minor quality errors is unacceptable. A non-exhaustive list of earlier points illustrating this comment is as follows: