Re: test suite distinctions [was: Re: Feedback on "The Matrix"]

On Fri, 1 Mar 2002 skall@nist.gov wrote:

> We should not be "rating" tools if by "rating" we mean assigning
> degrees of goodness.  As in everything we do, we should be
> determining conformance to our document(s). This is all a
> recommendation should do. This is a yes or no decision.  Thus, the
> checklist is very appropriate.

In the simplest form you advocate, determining conformance is still
rating with only two possible marks: "0" and "1", "conformant" and
"not conformant". A 3-point system is also common: "compliant",
"conditionally compliant", "not compliant". These are all ratings.

My doubt about the practical usefulness of ratings comes from my
experience determining compliance with complex protocols such as
HTTP. I do not
have in-depth knowledge of all *ML recommendations that W3C is focused
on today, but I suspect that similar problems will apply to many
non-basic Recommendations. I also suspect that W3C Recommendations
will grow in complexity as the areas of interest mature.

What problems am I talking about? Well, it is practically impossible
to say that a given product is compliant with [HTTP] specs. There are
multiple reasons for that, including:
	- ambiguities in the specs
	- conflicts between spec and real-world requirements
	- practically untestable conditions
	- virtually infinite number of testable conditions

Thus, if your checklist includes an item like "test tool MUST cover
all conditions that affect compliance with the Recommendation", it
would be impossible (in practice) to satisfy that criterion for
complex Recommendations. In other words, for any given test suite, I
can come up with an implementation that is not compliant with the
Recommendation, or will not work in the real world, or both, yet
still passes all tests.
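To make that last claim concrete, here is a toy sketch (in Python; the spec, names, and test cases are all my own illustrative inventions, not from any real W3C or HTTP test tool). The "spec" requires uppercasing *any* input, but the test suite can only exercise finitely many inputs, so an adversarial implementation can simply memorize the suite's answers:

```python
# Toy "spec": an implementation must return the uppercase form of ANY string.
# A finite test suite can only sample this infinite requirement.
TEST_SUITE = [
    ("get", "GET"),
    ("head", "HEAD"),
    ("post", "POST"),
]

def honest_impl(s):
    """A genuinely conformant implementation."""
    return s.upper()

def adversarial_impl(s):
    """Hard-codes the suite's answers; wrong on every other input."""
    memorized = dict(TEST_SUITE)
    return memorized.get(s, "")  # non-conformant outside the suite

def passes_suite(impl):
    """The test tool's verdict: did every listed case pass?"""
    return all(impl(inp) == out for inp, out in TEST_SUITE)

print(passes_suite(honest_impl))         # True
print(passes_suite(adversarial_impl))    # True -- yet not conformant
print(adversarial_impl("options"))       # "" instead of "OPTIONS"
```

Both implementations earn the same "yes" from the checklist-style verdict, which is exactly why a binary pass/fail over a necessarily finite suite cannot establish conformance with a spec that has a virtually infinite number of testable conditions.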

> > The things you mention are obvious qualities of a good test suite. Any
> > sane test suite author would try to implement them, and nobody will
> > get them right 100%. 
> 
> Huh? Many should get these 100% right (i.e., get "yes" for each item on the 
> checklist.)

Not when testing against a non-trivial Recommendation with a
non-trivial test tool.

> We don't rate implementations according to which is better, but
> only provide criteria to determine conformance (yes or no).  Why
> should this be different?

It should not be any different! However, I claim that determining
conformance of an implementation is impossible for many complex
Recommendations.


Moreover, we need to ask ourselves what is gained by testing/rating
for conformance only. Most practitioners I work with do not care much
about pure conformance. Instead, they care about interoperability and
robustness as well as some practical level of conformance. Because
the real world is not compliant, just being compliant with the specs
does not buy you much and sometimes even hurts. Unfortunately.

Only marketing departments care about "pure compliance" because they
want to put a "W3C endorsed" or "HTTP compliant" sticker on the
product and in the ad.

QA WG should help authors write better test tools and should help
users find appropriate test tools. IMO, rating (including conformance
statements) based on a meta-level checklist is not going to help much
in either case. W3C should write Recommendations and guidelines. W3C
should develop test suites and make them available. W3C should not
try to rate implementations, including test suites.


To make this discussion more productive, I suggest that you post a
draft of the meta-level checklist, so that it becomes clear whether I
am just being paranoid or whether useful generic rating is indeed
impossible.

Thanks,

Alex.

Received on Friday, 1 March 2002 10:47:14 UTC