- From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
- Date: Thu, 17 Jul 2003 17:26:47 +0100
- To: Karl Dubost <karl@w3.org>
- CC: www-qa@w3.org
Thanks for your detailed reply; some inline comments. I think it shows that I have not read all your documents, which is of course a difficulty many W3C groups share - we produce so much that people may get the wrong end of the stick by using a different reading path through the documents than the one envisaged.

...

Karl Dubost wrote:
> Hi Jeremy,
>
> Let's try to nail down that a bit :)

...

I was in a bad mood when I wrote this review ... it's been too hot here ... I probably should have gone into the office where there is air conditioning.

> At 17:13 +0100 2003-07-01, Jeremy Carroll wrote:
>
>> I did not find this document helpful.
>> I do not much believe this review will be helpful either.
>
> Your comments will be helpful for us at least. The fact is that if everyone says it will not be useful, we will not be able to produce a *useful* document for WGs. And if it's not useful, it will not be used.
>
> So thank you again.
>
>> I also felt that your document missed insight into how quality can be
>> assured, and how the quality of the Test work of a WG can be enhanced.
>
> :) It might be possible, but we will try to explain why the document has its shape. The Test Guidelines document in the QA Framework is the youngest; it is only a few months old and it's really taking shape now. It's also why we often don't do strict review of Test materials inside WGs against the Test Guidelines for now, because the document is still in its infancy.
>
>> Specific issues which were not addressed were:
>>
>> a) how to effectively integrate the test work with the rest of the
>> work of the WG. In particular, your documents appear to follow a
>> waterfall model of specification development:
>> - a specification is written
>> - testable assertions are added to it
>> - metadata is added to those testable assertions
>> - tests are developed for those testable assertions
>> etc. etc.
>>
>> Each step follows the previous step.
>
> No. We do not recommend that, but maybe the Guidelines are not well enough written and we will have to improve them, because it seems you have misunderstood them.
>
> * QA Framework Operational Guidelines
>   - In "Motivation and expected benefits"
>     http://www.w3.org/TR/2003/WD-qaframe-ops-20030210/#motivation
>
>   "The principal factor for improving the quality of implementation
>    is early availability of conformance test materials."
>
>   "This makes sense, since it is natural for test suites and
>    implementations to develop in parallel - each is a help to the
>    development of the other."
>
>   and
>
>   "Moreover, many Working Groups have already established procedures,
>    techniques and tools for developing test materials (e.g., Document
>    Object Model - DOM). It makes sense to capitalize on what has
>    already been done and share that with those who are starting out
>    and those who are already in the process of developing conformance
>    materials."
>
> I added this one because you made a comment that the Test Guidelines didn't reflect the way the W3C is working. We tried to gather all the past successful experiences together, which means:
> - SVG work
> - DOM work
> - CSS work
> - UAAG work
>
> and, still in the Ops Guidelines:
>
> Guideline 1
>   "The quality of specifications and the early availability of
>    conformance test materials (TM) are significant contributors to
>    the quality of implementations."
>
> and
>
> Guideline 3
>   "The benefits of starting synchronization of the specification and
>    test materials development as ***early as possible*** include:
>
>    * it gives an extensive list of use cases;
>    * it identifies ambiguities in the specification at the
>      ***early stages***, which saves cycles in the late phases;
>    * it provides clear set of testable assertions -- skeleton of the
>      specification -- which in its turn facilitates development of
>      interoperable implementations. The latter is a W3C process
>      criterion for entering the Candidate Recommendation phase."
>
> and
>
>   "Checkpoint 3.1 Synchronize the publication of QA deliverables and
>    the specification's drafts. [Priority 2]
>
>    Rationale. Because each version of the specification -- WDs, CR(s),
>    PR, etc -- is a changed document, therefore all important
>    dependencies such as test materials need to be updated concurrently."
>
>   "Examples of QA deliverables might range from a TS (test suite)
>    production schedule in early WDs, to TS design document in later
>    WDs, to a first public TS release at CR."
>
> and many other citations in the document...
>
> *** We do encourage people to start making tests at the start *** - not a waterfall model at all.

I looked again at the other docs ... there is also plenty of material in them that leads the reader into a waterfall-type view of test ... e.g. in http://www.w3.org/TR/2003/WD-qaframe-ops-20030210/

level 2:
[[
Summary. In addition to the previous level, Working Group (WG) provides a set of test assertions, not necessarily complete, before beginning development of a test materials.
]]

level 6:
[[
Summary. In addition to the previous level, a Working Group (WG) insists on a complete test suite before a specification becomes Recommendation.
]]

There are fairly numerous explicit and implicit comments that suggest a particular process in which tests are developed, which seems to follow after the initial development of the specs - rather than the XP model in which the tests are agreed and then specs are written which conform to the tests. (It would clearly be foolish to mandate an XP model.)

As an example from the test document:
[[
Guideline 1. Perform a functional analysis of the specification and determine the testing strategy to be used.
]]

This is clearly intended as an early stage, whereas in an XP model the specification would be derived from agreed tests, not the other way round.

> BUT we also know how some WGs are organized, and if we only propose one model (for example XP) we will fall into the trap where people find it constraining for their own WGs.

Agree wholeheartedly - I am often very struck by the differences between the two WGs I am on. The model used by WebOnt has been much more monolithic and less dynamic than that used by RDF Core.

> So we are trying to be a framework with variable geometry, to be suitable for many kinds of technologies.
>
>> A more exciting model of test is one which follows the extreme
>> programming model (XP).
>
> A preliminary comment about XP. Some of the people who participated in the building of the SVG Test Suite are early adopters of XP and they have often used these techniques. So XP is not unknown to the QA WG. It is even, I would say, a way of working.
>
> For example, when we are writing a guideline or a checkpoint we try to test it against the specs we are reading AND against the Guideline itself, which makes it quite complicated but shows up issues.
>
>> In particular, a WG usually has some sort of document as input (e.g. a
>> member submission or a previous version of the spec). A test-driven
>> spec development would use tests that ask questions that the input
>> document does not answer unambiguously. For example, much of the
>> interest in the WebOnt work arose from the Patel-Schneider paradox
>> [1], which is essentially a test case on the DAML+OIL member
>> submission [2] that was the key input to WebOnt.
>
> Nothing to say about that, because I think all the people in the QA WG agree with you.

Both RDF Core and WebOnt have followed an issue-driven process. From a test perspective I have preferred RDF Core's model, viz:

+ Issue is raised, often with a skeleton test case already in the statement of the issue.
+ Issue is discussed, typically with variants of the test case forming the substance of the discussion, e.g.
    X is ...
    Y is ...
    some WG members say X entails Y
    some WG members say X does not entail Y
+ Issue is resolved, possibly:
    WG resolves X entails Y
    ACTION editors to make appropriate changes
  (very XP here - the WG decision concerns the test cases, then the work of
  changing the documents to make it so is a mere editorial detail; a rough
  sketch of such an entailment check in code follows below)

The test cases for the issue are already finished before the other documents are written.

The WebOnt WG was (IMO) resistant to this XP model. Thus sometimes the test cases arise informally during the discussion of the issues, but the WG process is more document-centred than test-case-centred. There is significant overlap between the two groups.
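To make the X-entails-Y shape of those test cases concrete: the following is a rough, hypothetical sketch (not an actual WG test; it assumes today's rdflib and uses invented example.org URIs) of what such an entailment check amounts to for ground graphs, where simple entailment reduces to a subgraph check - the real RDF and OWL entailment regimes are of course richer than this.

    # Hypothetical sketch: does graph X (simply) entail graph Y?
    # For ground graphs (no blank nodes), simple entailment is just
    # "every triple of Y is already in X".  All URIs are invented.
    from rdflib import Graph, URIRef

    EX = "http://example.org/"
    a, b, c, p = (URIRef(EX + n) for n in ("a", "b", "c", "p"))

    x = Graph()            # premise graph X
    x.add((a, p, b))
    x.add((b, p, c))

    y = Graph()            # conclusion graph Y
    y.add((a, p, b))

    entails = all(triple in x for triple in y)
    print("X entails Y" if entails else "X does not entail Y")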
>> WGs that are not following a waterfall model will not be able to
>> participate in your guidelines.
>
> That's not true. Read above. I think you are mixing two concepts or even three.

Highly likely ...

> The QA Process           -> Ops Guidelines
> The Quality of the Spec  -> Spec Guidelines
> The Quality of the Test  -> Test Guidelines
>
> Organization like an XP model is possible - cf. the Ops Guidelines, because it's the QA Process itself. Nobody forbids you to organize your work like that; we even encourage it, as I showed you before.
>
>> b) Ensuring that quality people develop the tests
>>
>> ISO 9000 style quality is an attempt to get mediocre, unmotivated
>> people to do OK work of a predictable (but low) quality because some
>> power structure (typically connected with money and wage-slavery) sits
>> above them, and forces them to jump through some hoops. As far as I
>> can tell, the test guidelines follow this paradigm. Here is a
>> checklist, and even a not particularly competent WG might get adequate
>> results by following these guidelines.
>>
>> W3C WGs depend on volunteers over whom the W3C has little actual power.
>> Hence using a methodology that depends on the ability to punish and
>> reward is futile.
>>
>> Moreover, web standards have depended in part on getting brilliant
>> people to participate. It is an elitist exercise.
>> This explains, for example, the Director's dictatorial powers.
>>
>> Test work is often boring, particularly when placed in a waterfall-
>> style model, and hence does not engage the minds of brilliant people.
>> Strengthening the process for test development is counterproductive
>> because it will make test work more boring and reduce its
>> attractiveness to brilliant people.
>>
>> It is important to identify what motivates the best people to
>> participate in the W3C and what demotivates them. It is at least
>> plausible that peer group acclaim is important, and hence it is
>> important that WG members who contribute to the test work are
>> adequately acknowledged.
>>
>> This suggests a priority 1 guideline that there should be a
>> recommendation track document for the test work in a WG. A good choice
>> of editor, who takes pride in their work, will be one of the best ways
>> to ensure a deep quality to the test work.
>
> I do not agree with that. I'm an editor of a small document too. Being an editor on the Rec track doesn't give benefits at all, only trouble. When people know about the tasks, they run away.

That's a good point - I guess it depends on the person. The point I stand by is that quality arises from having the right people do the work (including issues of team dynamics), and giving them both sufficient freedom and sufficient form.

Another thought is that some of the most important tests in the OWL test suite are those that come from Patel-Schneider and Horrocks. I do not believe that either has ever come anywhere close to following the submission guidelines (although Horrocks' colleague Bechhofer has supplied a huge number which did). Aiming at a quality test suite meant that someone less brilliant (me) had to be prepared to do the donkey work of taking their sketches from the e-mail lists and turning them into executable tests.

I accept that my initial point about rec-track test documents is incorrect, but the wider point - understanding the dynamics of the WG and ensuring that the key members are empowered and motivated to contribute tests - is more important than a document-driven process.

Achieving quality output is a management goal. One school favours documented processes to achieve predictable quality; another would see quality as resulting from high-performing teams. I believe that the first school is not applicable to the production of Recommendations.

> As a counter-example, Daniel Glazman and Tantek Celik are well known for their strong participation in the CSS Test Suite, and it's not on the Rec track.
>
> I do agree that the work must be rewarded in a way, but the Rec track has nothing to do with that. Do you remember the names of the editors of HTML 4.01 and/or HTML 3.2, out of the blue, just like that?

Personally no - I guess Raggett had something to do with 4.01 and Connolly with 3.2, but I am sure I've done injustices there - I know how to look it up; and I am sure that the editors list their achievements in appropriate places.

>
>> It may be appropriate that each test is individually attributed.
>
> Agreed. In the Ops Guidelines we don't have a recommended format for the tests, but we do recommend planning test materials development (Guideline 5). We ask WGs to define a contribution process. It's up to the working group to define how each test must be marked and with what information.
>
> The CSS WG has written a guide for the tests:
> http://www.w3.org/Style/CSS/Test/testsuitedocumentation.html
> http://www.w3.org/Style/CSS/Test/guidelines.html
>
> The WG can decide which information must be inside a test.
>
>> This relates to the previous point in that if
>> we see tests as driving the development process rather than following,
>> it is more likely that good people in the WG will engage in that process.
>
>> As it is, the waterfall model you suggest looks like makework for
>> mediocre members of the WG.
>>
>> Of course, process guidelines are useful for brilliant people, but
>> they need to have a light touch, and be life-enhancing rather than
>> deadening.
>
> XP is a process, and you can write Guidelines that are tied to the method. The fact is that the QA Framework doesn't forbid you to use the XP method to produce your test suite and/or your spec.

Of course, if it is a declarative statement of the output rather than a description of the process of the WG. But it seems to be more the latter than the former ...

It is of course necessary that in some way the tests are related to the spec. The QA test document seems to presuppose the existence of the spec, to derive the tests via an analysis of the spec, and to approve the tests based on their accuracy. The RDF tests are related to the issues in the issue list, and not to any of the specs. The interaction between the tests and the text in the specs is two-way, often from the test to the spec rather than the other way round. The functional analysis could perhaps be done afterwards, but in fact the WG is out of energy and has effectively spent its test budget.

>
>> c) timeliness of tests
>>
>> While in an ideal world recs would wait for a test suite to be
>> developed, and there would be adequate volunteers from within the WG
>> to develop the tests, this may well not be the case.
>>
>> It is important to prioritise which tests are the most useful, and
>> which are less useful. The issue-driven test process, in which test
>> cases often form part of an issue resolution, used in particular by
>> the RDF Core WG, is an excellent way of doing this.
>>
>> d) cost effectiveness of tests
>>
>> A W3C WG has limited resources.
>
> Yes, and it's often why some WGs choose, *unfortunately*, to develop the test suite after the spec.
>
>> Developing conformance test suites may be a waste of them.
>
> I think you put a definition on Conformance that is too strong. Do you think Certification when you read conformance? Because that's not the case at all.

Please clarify. I would see:
- an issue-based test suite as one derived primarily from where people have been confused;
- a conformance-based test suite as one that attempts 100% coverage of individual features;
- a certification service as something that may need more tests, aiming in addition at coverage of cases where features have unfortunate interactions with each other.

>
>> It is necessary for a WG to develop a clear specification; many of the
>> tests necessary for conformance testing are obvious. The tests that
>> are less obvious will be those that come up in an issue-driven process
>> of specification development, and these should command the majority of
>> WG resource.
>>
>> [1] http://lists.w3.org/Archives/Public/www-webont-wg/2002Jan/0099
>>
>> [2] http://www.daml.org/2001/03/daml+oil-index.html
>>
>> Detailed review comments:
>
> This document?
> http://www.w3.org/TR/2003/WD-qaframe-test-20030516/

Yes

>> 1.2
>>
>> You scope the guidelines to conformance tests - however there are
>> other reasons why a WG may wish to develop tests.
>>
>> In particular, both RDF Core and WebOnt WGs have had issue-driven test
>> processes, where the proposed tests help clarify different positions
>> between members of the WG, and the approved tests clarify the issue
>> resolution.
>> Parts of the specs that had no related issues are
>> typically unproblematic, and building tests for those parts is less
>> critical, and less cost-effective.
>>
>> Hence the phrase:
>> "However they are applicable to all W3C Working Group members as well
>> as the testers of implementations."
>> makes your test guidelines out to be significantly more important than
>> they are, and is frankly false.
>
> Read it in the context of the whole QA Framework.
>
>> "Working Groups who may already be doing some of these activities
>> should review the document and incorporate the principles and
>> guidelines into their test materials as much as possible."
>> I have no intention of following this suggestion. It should be toned
>> down, at least to "... as much as they see fit."
>
> I think the problem is the way you read the Test Guidelines (early stage of the spec).
> Often we had the same comments on the Spec Guidelines, with people saying it doesn't apply to us. And when I checked with them, they were already conforming to most of the points, and a few minor modifications would have made them fully compliant.
> It's a question of interpretation and learning curve.
>
> We do agree that our wording is sometimes difficult. We try our best to make it clearer. That's why it's very important to have a QA contact inside your WG, because it helps to do that work of interpretation and to explain it in terms the WG knows, because each social community and network has its own language.
>
> When I read the Ontologies Framework I had a very hard time understanding it, because of the presentation, the language used, and sometimes the topic. It's normal.

The Semantics and Abstract Syntax document is virtually unreadable even to many WG members - the other documents are intended to be more introductory.

>> 1.3
>> The two paragraphs are in tension with one another.
>> The Semantic Web groups have followed the first paragraph but not the
>> second.
>>
>> We have explicitly decided not to produce conformance test suites, but
>> have endorsed test cases as a means for enhancing the quality of our
>> documents.
>
> Explain what a conformance test suite is for you?

See above - near 100% coverage of principal features.

>> It is a shame that your work has only explored the second of these two
>> paragraphs.
>>
>> 1.4
>> "The guidelines are intended for all Working Groups"
>> Does the QA WG follow these guidelines?
>> I doubt it.
>> Neither does the RDF Core WG nor the WebOnt WG.
>> I suggest you reduce the ambition of this document.
>
> We are doing it, and we try to. It's one of the most important tests we apply: "Eat your own dog food."
> I explained it before.
>
>> 1.5
>>
>> I think you should not reference WAI in this section, maybe in the
>> acknowledgements.
>> I found it distracting, and had to go away and look at the WAI stuff,
>> when in fact it was irrelevant, and merely of historical interest.
>
> It's an example and an illustration to explain how it's organized. Your issue will certainly be recorded.

There may be possible wording changes indicating that this is not necessary to understand the flow here.

>> 1.6
>> "Satisfying these checkpoints is a basic requirement to ensure quality
>> and interoperability of the standard."
>>
>> Satisfying these checkpoints is neither necessary nor sufficient for
>> the quality and interoperability of the standard. This untrue
>> statement needs to be made less ambitious.
>
> Where is the truth?
I was being pedantic. Contrast with: "Satisfying these checkpoints is a basic requirement and will help the quality and interoperability of the standard."

While I am still unconvinced, I can't simply fault your logic. The sentence as written suggests I need only find one standard of high quality and interoperability that does not satisfy these checkpoints to demonstrate the falsity of your document; and the first standard that satisfies these checkpoints and is of low quality (this is clearly possible) will also falsify the current statement.

> Experience? Theory? If I follow your rationale from the start: you advocate XP, and XP is all about practical and realistic experience. The guidelines have been written by looking at what people have done in the past at W3C (DOM, SVG, XML, CSS, OWL, etc.). Priority 1 is the minimum we ask.
>
>> Checkpoint 1.1
>> MUST
>>
>> Who must do this?
>> A test suite sits on a hard disk somewhere. It does not have the
>> capabilities to respond to this command. Suggest replacing "MUST define"
>> with "defines".
>> cf. Connolly's "must is for agents"
>> http://www.w3.org/2001/01/mp23
>
> The sentence has to be reformulated.
>
>> Checkpoint 1.2
>>
>> How wholly unrealistic.
>> Any W3C draft depends on IP, and TCP, and HTTP, ...
>> There is dependency on the laws of mathematics ...
>>
>> This checkpoint is a red herring.
>> When lower-level specs are unclear, the WG may or may not be aware of
>> that, and may or may not be able to clarify appropriate behaviour in
>> their spec, which may or may not be reflected in the test suite.
>>
>> It is partly your obsession with testing conformance of
>> implementations, rather than using the test material to ensure high
>> quality recs, that is at fault here.
>>
>> Obviously if I have an OWL implementation that uses an inadequate IP
>> stack it won't work; and from a conformance testing point of view that
>> might be relevant. From the point of view of defining OWL it is not
>> interesting.
>
> We are talking about immediate dependencies with regard to the technology. For example, HTML and Accessibility are very dependent; they are interrelated technologies.

I often feel somewhat out on a limb on this one ... I looked at the CSS test document you pointed to above - it clearly states that the tests *do not* try to exercise the other technologies but make minimal assumptions. I tend to feel that if the other technologies are needed, then the test suite should be built on the assumption that correct implementations of those technologies are being used.

In RDF Core we have a relevant set of tests to do with the interaction between RDF/XML, xml:base, and the relative URI resolution algorithm from RFC 2396. In my opinion, with which I managed to persuade the WG, this set of interactions was sufficiently important to have a fairly thorough set of tests, essentially exercising points of the other recommendations. On some other issues, other members of the group who wished to concentrate on testing RDF, and not the technologies it depends on, were more persuasive.

The xml:base test cases became the foundation of an extended discussion between the RDF community and the URI community, with a number of changes going into RFC 2396bis as a result. A particularly contentious test case within RDF Core is one which indicates that RDF implementations must use an implementation of URI resolution which fixes a bug in RFC 2396.
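As an illustration only - the base and references below are invented, not taken from the test suite, and modern Python's urllib follows RFC 3986, the revision that grew out of the 2396bis work - this is roughly the kind of relative-reference resolution behaviour those xml:base tests exercise:

    # Hypothetical example of resolving relative URI references against
    # an xml:base, roughly what the RDF Core xml:base tests probe.
    from urllib.parse import urljoin

    base = "http://example.org/dir/doc.rdf"      # an invented xml:base
    for ref in ("node#frag", "../other", ""):
        print(f"{ref!r:12} -> {urljoin(base, ref)}")
    # 'node#frag'  -> http://example.org/dir/node#frag
    # '../other'   -> http://example.org/other
    # ''           -> http://example.org/dir/doc.rdf  (same-document reference)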
I feel that these tests have hence added to quality, in that they revealed ambiguity and divergences within and between standards, which are hence being rectified.

>
>> Checkpoint 1.3.
>> ... must be ... documented
>>
>> Yawn. Requiring documentation is the instrument of mediocrity.
>> Obviously it is helpful when a test has gone wrong to know which part
>> of the spec it relates to, but that is nearly always self-explanatory
>> (the test has the foobar element in it, look at the definition of the
>> foobar element in the spec) or it is not meaningful to document it.
>> The Patel-Schneider paradox arises from a complex interaction between
>> much of the spec.
>> This checkpoint mandates a waterfall model for test development,
>> rather than an iterative rapid prototyping model, which is what is
>> actually used by WGs who publish WDs every three months as mandated by
>> W3C process.
>
> Nobody forbids you to have a test in development, to test something, to write the spec, to test it again, and to publish the test as stable with regard to the spec. Chicken and egg problem: neither one precedes the other. Both are worked on together.

I remain unconvinced as to the value of documenting the mapping between logical partitions of the test suite and sections of the specification.

>
>> Checkpoint 2.2.
>>
>> Dull, dull, dull. This won't get done, and it is stupid to pretend
>> that it will.
>
> So do you prefer to make tests without metadata, making them difficult to use by anyone except the tester who designed them and those in their immediate proximity?

No, not at all - all tests in the Semantic Web Activity have associated metadata - but we have no test assertions in the specifications at all. What is dull is going into excessive levels of detail documenting the relationship between the recommendation and the tests.

>
>> Guideline 3.
>>
>> This stuff about a test management system and test framework is really
>> quite unclear.
>>
>> It seems that you have in mind a piece of software, whereas a test
>> suite is, in my mind at least, a set of declarative statements.
>> Particularly when we are talking about document formats (which is the
>> typical case for W3C).
>>
>> Many of the guidelines about a test management system are met by the
>> OWL Test Cases simply by being a well-structured document with a
>> website behind it.
>
> Bizarre - it seems you have a test management system???
> http://www.w3.org/2002/03owlt/Makefile

No - I've little idea what that Makefile does.

>
>> For example, you can use grep to look through the single HTML file
>> version of the OWL test cases to satisfy checkpoint 3.3.
>> (This can be met in more sophisticated ways, but it is the WG's job to
>> provide the metadata, not to provide a software tool to manipulate it.)
>>
>> Checkpoint 3.4 would be better stated by requiring every test to have
>> a unique URI.
>
> ? Checkpoint 3.4. Test management system must support results. [Priority 2]
> Conformance requirements: The test management system must allow for the
> results of a test case to be associated with the test.
>
> It's about test results.

Yes - to state results you need to be able to identify the tests - that is a significant part of the problem solved. The typical report is:

  my system passed: URI1 URI2 URI3
  my system cannot attempt: URI4 URI5
  my system failed: URI6
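Purely to illustrate the point - the test URIs and outcomes below are invented, and this is not a format anyone mandates - such a report is just an outcome keyed by test URI:

    # Invented test URIs; keying results by test URI keeps pass/fail
    # statements unambiguous even when different tests share input files.
    results = {
        "http://example.org/tests/manifest#test001": "pass",
        "http://example.org/tests/manifest#test002": "cannot attempt",
        "http://example.org/tests/manifest#test003": "fail",
    }
    for test_uri, outcome in results.items():
        print(f"{outcome:15} {test_uri}")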
If the tests do not have URIs then these statements can become ambiguous: e.g. some of the files used in an RDF test may be the same as those used in a different RDF test - hence the test itself needs an identity distinct from the files that make up the test. (I realise that EARL does rather better than this, but having a URI for each test is the prerequisite.)

>
>> Checkpoint 4.1 is wrong and should be deleted. This is entirely out of
>> scope for a W3C WG.
>

cf: http://lists.w3.org/Archives/Member/w3c-i18n-wg/2003Apr/0002.html (member-only link)

On issue C035, Recommendations and W3C Policy:
[[
RESOLVED: Rec track documents can ...
]]
(Sorry, since this is a public list I am not sure what I may or may not copy from that member-only message; you will need to go and look.)

I believe it impacts all your documents. I would be interested in continuing this part of the discussion on a member-only list, e.g. w3c-archive.

>
> ?
>
>> The rationale for checkpoint 5.3 is flawed.
>> Implementers need to know whether their tests pass or fail; it is not
>> part of a test suite, certainly not at priority 1, to help them fix
>> the problems when the tests fail.
>
> It's not part of the TS, as you said and as it is written in the guidelines.
> CP 5.3: the ***Results reporting framework*** must indicate the result status of each test.

My comment was more directed at the "should also" and the third and fourth sentences of the rationale.

>
>> Checkpoint 6.1
>> Why are you obsessed with writing documents?
>
> It's like specs at W3C. They are not necessary to make the technology work, but they help a lot when other people want to understand the technology and implement it.

In WebOnt, the chair had a plan to engage vendors (in January 2003) - he might or might not have documented it - it does not matter, since that plan failed, because the timing was premature. We need to know that that plan failed and we need to keep track of the level of engagement of vendors - but the level of documentation here is really only WebOnt's business.

>> Who cares whether a plan was documented; what matters is whether
>> vendors were engaged or not.
>> They will be engaged if:
>> - the tests help them develop products
>> - the products can be sold
>>
>> Any other activity to achieve the goal here is makework.
>>
>> I am sorry that I haven't had a good word to say about this.
>> I guess I should point out that this opinionated diatribe is my own,
>> and not endorsed by HP or by any of the WGs I am in.
>
> :))) I think you do valuable work inside the OWL WG, the OWL Test Suite is impressive, and I'm pretty sure - even if you don't think it's possible - that you were already complying with many points of the guidelines.

I guess I am, although there are many I don't conform with.

> In a dialogue with a group, there are always misunderstandings, which is natural. For example, Sandro explained what I had missed in my review of OWL against the Spec Guidelines, and TOGETHER we have found solutions.
>
> Jeremy, we are in the same ship, W3C; it's why we are all trying to improve our work. As you said, it's better to encourage and drive better development than to dump on it. :)
>
> I hope the next versions of the Test Suite Guidelines will address some of your concerns, and that some of your questions have been answered.

thanks

Jeremy
Received on Thursday, 17 July 2003 12:36:55 UTC