Re: Test Cases, TPDL, and Formalisms in EARL

Sean, thanks for your posting. It raises a series of issues that are 
indeed relevant to all three groups.

To begin with, sorry if my reply seems a bit misinformed; I've been away 
for a while and I have a tough time bringing things back to normal 
status.

Generally speaking, I am concerned that things tend to get a bit mixed 
up. When we originally started thinking about what a test case is, and 
how it should be represented, we had these goals in mind [and here I 
refrain from including pointers to relevant postings, if anyone wants 
the original source, please email me]

1. The description language should not bias particular language bindings 
(especially relevant to the DOM, as it is language-neutral). We settled 
for expressing each test case in XML, using mechanisms explained below
2. The description language should, if possible, be generated and not 
authored (this perhaps only pushes the error source a bit ruther back 
and puts specification authors under severe stress). We settled for 
generating the DOM TS (Test Suite) ML directly from the DOM 
specification, since we could capture all relevant information from the 
specification itself (in this sense, the DOM TS framework can be seen 
both as a framework for testing DOM implementations but also for testing 
the specifications themselves)
3. In order to allow for human interaction as far as requirements 
expressed in prose are conerned, we allowed for a simple set of tags 
that represent semantic requirements, ie. what this particular test 
should test, who wrote it, what part of the specification it points to 
and so forth, thus also simplifying the report part of the framework.

As can clearly be seen, the DOM TS framework is on the one hand very 
particular to the DOM, on the other, and this is something I want to 
stress, iits mechanisms as generic enough to cover  a large part of the 
points you mention below (and which I comment on there).

Pushing my DOM hat harder into place, then, my comments are inlined:

On Tuesday, October 16, 2001, at 03:55  AM, Sean B. Palmer wrote:

> [Apologies for the cross post, please trim as appropriate.]
>
> Recent discussion in the ERT WG and QA activity has focussed upon the
> formation of a Test Point Definition Language (TPDL), its position in 
> the
> process flow of evaluation and repair (rationale), and how best if at 
> all
> to proceed with it, if at all. This note is an attempt to summarize 
> recent
> discussion, to point out how crucial TPDL is to the future of work on 
> EARL,
> and to get some kind of rough consensus on how to move forward quickly.
>
> As many of you will know, the ERT Working Group members (under the 
> aegis of
> the WAI, and perhaps one day QA) have been conducting work upon the
> Evaluation And Repair Language (EARL [1]). This is a generic framework 
> for
> expressing evaluation information - with a lean towards rating Web 
> content,
> tools, and user agents - w.r.t technical guidelines and other criteria. 
> As
> such, it does not constitute a test point definition language of itself,
> but it does have to link to test points (we call them "test cases" in 
> EARL)
> in order to make the evaluation.
>
> For example, Chris Ridpath of the University of Toronto has been 
> working on
> a tool called "ATR" [2], which enables a human operator to rank an
> accessibility tool against how well it handles a series of
> accessibility-related test cases. We need to specifically point to the 
> test
> cases (all of which have URI-views) in order to rank whether the tool
> passes or fails them. Note how the test case details are arranged: 
> rather
> than linking directly to the test case, we use an ID for the test case, 
> and
> then point from the ID to the test case file itself. This is because 
> EARL
> allows one to not only link to test cases, but to also come up with a 
> wide
> range of criteria, which may have a range of data associated with them:
> purpose, expected result, label, whatever.
>
> In EARL, we separated out single point criteria (e.g. a WCAG checkpoint)
> from single point test cases (e.g. a DOM test case) because they did not
> appear to be the same thing: passing a guideline is different from 
> passing
> a test case in that a test case is one step on from a checkpoint. The 
> test
> case may for example be a sample piece of technology that may or may not
> invoke the checkpoint, or it may be a detailed model of a mechanism 
> which
> may or may not embody some specification point. Or, perhaps, both.
> Separating specification points and test points may be an incorrect way 
> of
> modelling it, but a practical way, for it gets us thinking about the 
> levels
> of metadata attached to each.
>
[dd] We have indeed focussed on the core technology and not so much on 
what we could express in single prose, such as "The result of pressing 
this should be that the picture to my right gets magnified and a 
floating text is shown above it", even though all that is involved could 
be DOM.

This is a shortcoming, but I'm not sure that it's wrong. From my 
experience with the DOM TS, it is preferrable to maintaining the 
distinction between test case and guideline not only in the source, but 
also when writing the test cases. This for two reasons:
1. It is less error-prone. Writing guidelines in structured form may be 
boring, but doable and certainly less possible to lead to debate over 
semantics (introducing exponentially growing levels of metalayers).
2. It is easier to hook up to an existing evaluation framework since the 
evaluation would be generated by code and not bu human intervention, 
which in turn places the bigger part of the responsibility on those who 
write the guideline guidelines, where I think the responsibility belongs.

The question that is raised is of course if the distinction can be 
maintained and the job still done. Am I rithg in assuming that this is 
where TDPL comes into place?

> In the case of the DOM TS, it is apparently a fine line between the two,
> but that's irrelevant at the moment. From the draft DOM TS language DTD
> [3], it is evident that there is a slight overlap between the test case
> class in the EARL schema, and some of the apparent properties in the DTD
> (it's difficult to tell the semantics behind a syntactic schemata). For
> example, both specify a "purpose", a "description", and an "id". 
> However,
> because the DOM TS is obviously a very specialized thing, it goes on to
> enumerate a range of DOM specific predicate relationships: node method,
> first child, local name.
>
[dd] As Curt Arnold pointed out, the reference is severley outdated, but 
serves very well as an example. As mentioned above, we wanted to 
introduce as little explicit overhead as possible and, instead, opted 
for using logic and computation to generate things that are immanent in 
the tests to reduce both the code and the learning curve of the 
framework, something which I believe should be counted as a plus (I lean 
toward doing as much calculation and as little explicit representation 
as possible). This is perhaps why I didn't push things that hard in 
bringing the DOM TS and EARL together.

> As Al Gilman pointed out [4], the optimum amount of elaboration for a 
> test
> case is when the "view is expanded just enough to capture all 
> information
> necessary to evaluate the test-requirement assertions". Further, this 
> means
> that "some details behind the interface are elided". I do not know 
> whether
> the DOM TS is being overly verbose (I doubt it), and I am not sure 
> whether
> it, or ATR, or EARL would benefit from work on a suitable abstract test
> point definition framework based on (as Al put it) "state-trace" 
> patterns.
> Al's message had a third aspect to it, which was to underline the 
> parallels
> between specification and test points, which is not something that I am
> necessarily enamoured with, but nevertheless is a noteworthy comment, 
> and
> required reading.
>
> This is something that we all need to come together and discuss, and 
> hence
> the cross-post. It affects all three groups do a great degree because 
> work
> on EARL will be havily affected by the outcome of any TPDL work, and it 
> may
> well turn out that the DOM TS could heavily benefit by augmenting its 
> work
> to bring it inline with any findings. QA, as the new arbiter of the 
> W3C's
> dog food program, need to ensure that we do what's best: work on a TPDL
> language would ideally (IMHO) be carried out there.
>
[dd] I'd agree that the DOM TS framework, as well as any other such 
framework, would benefit from being brought in sync with similar 
efforts. This is why I tried to get the discussion going as far back as 
March earlier this year. In any case, what we would need to be careful 
of is not to introduce too much overhead, which would admittedly bring 
the various frameworks together, by grouping them under the same 
headings, but they would for themselves be overburdened and too complex.

> [Skippable rant about the overall process flow.]
>
> At the recent ERT WG F2F meeting [5], we discussed the process flow of
> evaluations (which we termed as the "end-to-end process"). Here is a
> summary of a graph made by Dave Pawson:-
>
> 1) Requirements capture => 2) Test specification => 3) Test => 4) Test
> result => 5) Collate results => 6) Analyze results (query) => 7) Present
> results
>
>
[dd] As I wasn't present, please excuse any mistaken understanding of 
the chart.

For my own part, I see the DOM TS and similar frameworks to belong to 
everything up to 4, possibly 5. 6 obviously belongs to some less machine 
populated realm and needs to be taken care of accordingly (as indicated 
above). There is a grey zone between 6 and 7, I think, which deals with 
anything that was raised in 1, when the original requirements were 
spelled out, that again need to spelled out and tested against the 
results obtained. Thus described, any testing framework can do this, as 
far as calculating is concerned. To generate anything more intelligent 
than "pass 87%", though, we would need to think a bit more about what it 
is that is being tested between 6 and 7. My naive understanding is that 
it is correlation between semantic requirements and results, but they 
would have to be treated as semantic requirements, and in order to 
generate meaningful results, codified to begin with, as I also mentioned 
above.

The DOM TS admittedly doesn't do a very good job at this. It is 
simplistic and very straightforward. I think the gap could be filled by 
a language specialized at expressing those very requirements in a way 
that allows for various technologies to use it. We tried (and possibly 
failed) in the DOM TS, but I think it's the right way to go, in so far 
as pointing to the specification goes, using path and query expressions 
to calculate results and so forth.


> Unfortunately, for it makes the model more difficult to grasp, there 
> are a
> number of other threads running along with this, that of the tools
> designers, and of the instance data creators, etc.
>
> Anyway, we kinda decided that EARL fits into step 4 of the above, and 
> that
> the TPDL will fit into step 2. Clearly, the test results must contain
> (perhaps as a link by reference) the test specification material. It 
> would
> make much more sense to me if this proces flow were as automated (and 
> hence
> machine readable) as possible, because otherwise the chain breaks down.
> That is not to say that human intervention does not play its part, 
> because
> the human is often the one running the test in the first place.
>
> As an example of how this would save data, and increase 
> interoperability,
> at the moment in the ATR output, we have the test case as an ID, and 
> then
> link to the test case data. If the test case data were itself 
> standardized,
> we could simply point to that. Unfortunately, if this is so, the work on
> EARL will depend greatly on the outcome of TPDL work, and so it is
> imperative that this be sorted out as soon as possible.
>

[dd] Except for the fact that I fully agree, I'd like to see some 
discussion on formalizing 1 in the chart, following the reasoning above. 
What about a modular requirements language which could be used by any 
W3C technology? Obviously it would only be relevant to some, but could 
it be an idea to bring this to the table with the various WG chairs?

Best,

/Dimitris


> Cheers,
>
> [1] http://www.w3.org/2001/03/earl/
> [2] http://www.aprompt.ca/ATR/
> [3]
> http://lists.w3.org/Archives/Public/www-dom-
> ts/2001May/att-0067/01-methods.
> dtd
> [4] http://lists.w3.org/Archives/Public/www-qa/2001Oct/0036
> [5] http://www.w3.org/WAI/ER/2001/10/f2f-notes
>
> --
> Kindest Regards,
> Sean B. Palmer
> @prefix : <http://webns.net/roughterms/> .
> :Sean :hasHomepage <http://purl.org/net/sbp/> .
>

Received on Wednesday, 17 October 2001 12:56:51 UTC