- From: Dimitris Dimitriadis <dimitris@ontologicon.com>
- Date: Wed, 17 Oct 2001 19:51:40 +0300
- To: "Sean B. Palmer" <sean@mysterylights.com>
- Cc: <w3c-wai-er-ig@w3.org>, <www-qa@w3.org>, <www-dom-ts@w3.org>
- Message-Id: <200110171654.f9HGsVU32583@mail.24-7webhosting.com>
Sean, thanks for your posting. It raises a series of issues that are indeed
relevant to all three groups. To begin with, sorry if my reply seems a bit
misinformed; I've been away for a while and am having a tough time getting
things back to normal. Generally speaking, I am concerned that things tend
to get a bit mixed up. When we originally started thinking about what a
test case is, and how it should be represented, we had these goals in mind
[and here I refrain from including pointers to relevant postings; if anyone
wants the original source, please email me]:

1. The description language should not be biased toward particular language
bindings (especially relevant to the DOM, as it is language-neutral). We
settled for expressing each test case in XML, using mechanisms explained
below.

2. The description language should, if possible, be generated and not
authored (this perhaps only pushes the error source a bit further back and
puts specification authors under severe stress). We settled for generating
the DOM TS (Test Suite) ML directly from the DOM specification, since we
could capture all relevant information from the specification itself (in
this sense, the DOM TS framework can be seen as a framework both for
testing DOM implementations and for testing the specifications themselves).

3. In order to allow for human interaction as far as requirements expressed
in prose are concerned, we allowed for a simple set of tags that represent
semantic requirements, i.e. what this particular test should test, who
wrote it, what part of the specification it points to, and so forth, thus
also simplifying the report part of the framework.

As can clearly be seen, the DOM TS framework is on the one hand very
particular to the DOM; on the other hand, and this is something I want to
stress, its mechanisms are generic enough to cover a large part of the
points you mention below (and which I comment on there).
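To make the shape of the language concrete, here is a rough, hypothetical
sketch of what a DOM TS ML test might look like. The element and attribute
names are illustrative only; the actual vocabulary is generated from the
specification and may differ:

  <?xml version="1.0"?>
  <!-- Illustrative sketch only: names are not taken from the real DTD. -->
  <test name="firstChildOfDocumentElement">
    <metadata>
      <description>The firstChild attribute of the document element
        must return its first child node.</description>
      <subject resource="http://www.w3.org/TR/REC-DOM-Level-1/"/>
    </metadata>
    <var name="doc" type="Document"/>
    <var name="root" type="Element"/>
    <var name="child" type="Node"/>
    <load var="doc" href="staff"/>
    <documentElement var="root" obj="doc"/>
    <firstChild var="child" obj="root"/>
    <assertNotNull actual="child" id="childNotNull"/>
  </test>

Note how the semantic tags (description, subject) sit alongside the purely
operational ones; that is what lets the same source drive both test
generation and reporting.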
Pushing my DOM hat harder into place, then, my comments are inlined.

On Tuesday, October 16, 2001, at 03:55 AM, Sean B. Palmer wrote:

> [Apologies for the cross post, please trim as appropriate.]
>
> Recent discussion in the ERT WG and QA activity has focussed upon the
> formation of a Test Point Definition Language (TPDL), its position in the
> process flow of evaluation and repair (rationale), and how best to
> proceed with it, if at all. This note is an attempt to summarize recent
> discussion, to point out how crucial TPDL is to the future of work on
> EARL, and to get some kind of rough consensus on how to move forward
> quickly.
>
> As many of you will know, the ERT Working Group members (under the aegis
> of the WAI, and perhaps one day QA) have been conducting work upon the
> Evaluation And Repair Language (EARL [1]). This is a generic framework
> for expressing evaluation information - with a lean towards rating Web
> content, tools, and user agents - w.r.t. technical guidelines and other
> criteria. As such, it does not constitute a test point definition
> language of itself, but it does have to link to test points (we call
> them "test cases" in EARL) in order to make the evaluation.
>
> For example, Chris Ridpath of the University of Toronto has been working
> on a tool called "ATR" [2], which enables a human operator to rank an
> accessibility tool against how well it handles a series of
> accessibility-related test cases. We need to specifically point to the
> test cases (all of which have URI-views) in order to rank whether the
> tool passes or fails them. Note how the test case details are arranged:
> rather than linking directly to the test case, we use an ID for the test
> case, and then point from the ID to the test case file itself. This is
> because EARL allows one not only to link to test cases, but also to come
> up with a wide range of criteria, which may have a range of data
> associated with them: purpose, expected result, label, whatever.
>
> In EARL, we separated out single point criteria (e.g. a WCAG checkpoint)
> from single point test cases (e.g. a DOM test case) because they did not
> appear to be the same thing: passing a guideline is different from
> passing a test case, in that a test case is one step on from a
> checkpoint. The test case may for example be a sample piece of
> technology that may or may not invoke the checkpoint, or it may be a
> detailed model of a mechanism which may or may not embody some
> specification point. Or, perhaps, both. Separating specification points
> and test points may be an incorrect way of modelling it, but a practical
> way, for it gets us thinking about the levels of metadata attached to
> each.

[dd] We have indeed focussed on the core technology and not so much on what
we could express in simple prose, such as "The result of pressing this
should be that the picture to my right gets magnified and a floating text
is shown above it", even though all that is involved could be DOM. This is
a shortcoming, but I'm not sure that it's wrong. From my experience with
the DOM TS, it is preferable to maintain the distinction between test case
and guideline not only in the source, but also when writing the test cases,
for two reasons:

1. It is less error-prone. Writing guidelines in structured form may be
boring, but it is doable and certainly less likely to lead to debate over
semantics (introducing exponentially growing levels of metalayers).

2. It is easier to hook up to an existing evaluation framework, since the
evaluation would be generated by code and not by human intervention, which
in turn places the bigger part of the responsibility on those who write the
guidelines, where I think the responsibility belongs.

The question that is raised is of course whether the distinction can be
maintained and the job still done. Am I right in assuming that this is
where TPDL comes into place?

> In the case of the DOM TS, it is apparently a fine line between the two,
> but that's irrelevant at the moment. From the draft DOM TS language DTD
> [3], it is evident that there is a slight overlap between the test case
> class in the EARL schema, and some of the apparent properties in the DTD
> (it's difficult to tell the semantics behind a syntactic schema). For
> example, both specify a "purpose", a "description", and an "id".
> However, because the DOM TS is obviously a very specialized thing, it
> goes on to enumerate a range of DOM specific predicate relationships:
> node method, first child, local name.

[dd] As Curt Arnold pointed out, the reference is severely outdated, but it
serves very well as an example. As mentioned above, we wanted to introduce
as little explicit overhead as possible and instead opted for using logic
and computation to generate things that are immanent in the tests, reducing
both the code and the learning curve of the framework, something which I
believe should be counted as a plus (I lean toward doing as much
calculation and as little explicit representation as possible). This is
perhaps why I didn't push things that hard in bringing the DOM TS and EARL
together.
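To illustrate the indirection Sean describes (an ID for the test case,
which in turn points to the test case file), here is a hypothetical
EARL-flavoured sketch in RDF/XML. The namespace URI and property names are
placeholders of my own, not the published EARL vocabulary:

  <?xml version="1.0"?>
  <!-- Hypothetical sketch: namespace and property names are
       placeholders, not the actual EARL schema. -->
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:earl="http://example.org/earl-sketch#">
    <earl:Assertion>
      <earl:subject rdf:resource="http://example.org/tools#SomeTool"/>
      <earl:testCase rdf:resource="#tc-0042"/>
      <earl:result>passed</earl:result>
    </earl:Assertion>

    <earl:TestCase rdf:ID="tc-0042">
      <earl:purpose>Check that images carry meaningful alternative
        text.</earl:purpose>
      <earl:description>A page containing an img element whose alt
        attribute must be non-empty.</earl:description>
      <earl:file rdf:resource="http://example.org/testcases/0042.html"/>
    </earl:TestCase>
  </rdf:RDF>

The point is that the assertion refers to the test case only by its ID;
everything else (purpose, description, the test file itself) hangs off
that ID and can be extended without touching the assertions.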
> As Al Gilman pointed out [4], the optimum amount of elaboration for a
> test case is when the "view is expanded just enough to capture all
> information necessary to evaluate the test-requirement assertions".
> Further, this means that "some details behind the interface are elided".
> I do not know whether the DOM TS is being overly verbose (I doubt it),
> and I am not sure whether it, or ATR, or EARL would benefit from work on
> a suitable abstract test point definition framework based on (as Al put
> it) "state-trace" patterns. Al's message had a third aspect to it, which
> was to underline the parallels between specification and test points,
> which is not something that I am necessarily enamoured with, but it is
> nevertheless a noteworthy comment, and required reading.
>
> This is something that we all need to come together and discuss, and
> hence the cross-post. It affects all three groups to a great degree
> because work on EARL will be heavily affected by the outcome of any TPDL
> work, and it may well turn out that the DOM TS could benefit heavily by
> augmenting its work to bring it in line with any findings. QA, as the
> new arbiter of the W3C's dog food program, needs to ensure that we do
> what's best: work on a TPDL language would ideally (IMHO) be carried out
> there.

[dd] I'd agree that the DOM TS framework, as well as any other such
framework, would benefit from being brought in sync with similar efforts.
This is why I tried to get the discussion going as far back as March of
this year. In any case, what we need to be careful of is not to introduce
too much overhead: it would admittedly bring the various frameworks
together by grouping them under the same headings, but each of them would
become overburdened and too complex.

> [Skippable rant about the overall process flow.]
>
> At the recent ERT WG F2F meeting [5], we discussed the process flow of
> evaluations (which we termed the "end-to-end process"). Here is a
> summary of a graph made by Dave Pawson:
>
> 1) Requirements capture => 2) Test specification => 3) Test => 4) Test
> result => 5) Collate results => 6) Analyze results (query) => 7) Present
> results

[dd] As I wasn't present, please excuse any mistaken understanding of the
chart. For my own part, I see the DOM TS and similar frameworks as
belonging to everything up to 4, possibly 5. 6 obviously belongs to some
less machine-populated realm and needs to be taken care of accordingly (as
indicated above). There is a grey zone between 6 and 7, I think, which
deals with anything that was raised in 1, when the original requirements
were spelled out, that again needs to be spelled out and tested against the
results obtained. Thus described, any testing framework can do this, as far
as calculating is concerned. To generate anything more intelligent than
"pass 87%", though, we would need to think a bit more about what it is that
is being tested between 6 and 7. My naive understanding is that it is the
correlation between semantic requirements and results; but these would have
to be treated as semantic requirements and, in order to generate meaningful
results, codified to begin with, as I also mentioned above. The DOM TS
admittedly doesn't do a very good job at this; it is simplistic and very
straightforward. I think the gap could be filled by a language specialized
in expressing those very requirements in a way that allows various
technologies to use it. We tried (and possibly failed) in the DOM TS, but I
think it's the right way to go, in so far as pointing to the specification
goes, using path and query expressions to calculate results and so forth.
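For concreteness, a requirements entry along those lines might look roughly
as follows. The vocabulary is invented for illustration, and the query
attribute merely suggests an XPath-like expression evaluated over collated
test results (step 5) to yield a verdict for steps 6 and 7:

  <!-- Hypothetical sketch: element names and the query language are
       invented for illustration. -->
  <requirement id="req-core-13">
    <source
      href="http://www.w3.org/TR/REC-DOM-Level-1/level-one-core.html"/>
    <statement>appendChild adds the node newChild to the end of the list
      of children of this node.</statement>
    <!-- Evaluated over the collated result set: the requirement holds
         when no appendChild test reports a failure. -->
    <satisfiedWhen
      query="count(//result[@test='appendChild'][@outcome='fail']) = 0"/>
  </requirement>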
> Unfortunately, for it makes the model more difficult to grasp, there are
> a number of other threads running along with this: that of the tools
> designers, that of the instance data creators, etc.
>
> Anyway, we kinda decided that EARL fits into step 4 of the above, and
> that the TPDL will fit into step 2. Clearly, the test results must
> contain (perhaps as a link by reference) the test specification
> material. It would make much more sense to me if this process flow were
> as automated (and hence machine readable) as possible, because otherwise
> the chain breaks down. That is not to say that human intervention does
> not play its part, because the human is often the one running the test
> in the first place.
>
> As an example of how this would save data, and increase
> interoperability, at the moment in the ATR output we have the test case
> as an ID, and then link to the test case data. If the test case data
> were itself standardized, we could simply point to that. Unfortunately,
> if this is so, the work on EARL will depend greatly on the outcome of
> TPDL work, and so it is imperative that this be sorted out as soon as
> possible.

[dd] Apart from the fact that I fully agree, I'd like to see some
discussion on formalizing 1 in the chart, following the reasoning above.
What about a modular requirements language which could be used by any W3C
technology? Obviously it would only be relevant to some, but could it be an
idea to bring this to the table with the various WG chairs?

Best,
/Dimitris

> Cheers,
>
> [1] http://www.w3.org/2001/03/earl/
> [2] http://www.aprompt.ca/ATR/
> [3] http://lists.w3.org/Archives/Public/www-dom-ts/2001May/att-0067/01-methods.dtd
> [4] http://lists.w3.org/Archives/Public/www-qa/2001Oct/0036
> [5] http://www.w3.org/WAI/ER/2001/10/f2f-notes
>
> --
> Kindest Regards,
> Sean B. Palmer
> @prefix : <http://webns.net/roughterms/> .
> :Sean :hasHomepage <http://purl.org/net/sbp/> .