Re: test suite distinctions [was: Re: Feedback on "The Matrix"] from Al Gilman on 2002-03-01 (www-qa@w3.org from February 2002)

From: Al Gilman <asgilman@iamdigex.net>
Date: Thu, 28 Feb 2002 22:09:46 -0500
To: Lofton Henderson <lofton@rockynet.com>
Cc: www-qa@w3.org
Message-Id: <200203010309.WAA429044@smtp2.mail.iamworld.net>
At 07:26 PM 2002-02-28 , Lofton Henderson wrote:
>>2.  An analogy with accessibility enforcement.
>>
>>The coarsely-quantized rating system of three conformance plateaux for web 
>>accessibility as promulgated in WCAG 1.0 has very little "consensus 
>>stability margin" behind it.
>
>I don't follow the point here.  Could you please elaborate?

I could, and I had.  In the next paragraph.  Read on.

>>The topic of conformance representations engenders a lot of ongoing 
>>controversy in the accessibility domain.

>"Representations" means "claims"?

I guess I would say 'no.'  "Claims" as contemplated under WCAG are indeed "representations" but not vice versa.  Representations include warranties, certifications, etc.  They include the nutrition labeling on merchant foodstuffs.  I mean the generic function, not the particular construction in the WCAG.  Representations are the public-view filtered form of what was learned in the course of the evaluation process.

The issue is the functional form of the filter, the view-specification of the public view.

It is possible to boil things down too little, and it is possible to boil them down too much.  My hunch at the moment is that WCAG-style rollup is a) too all-or-nothing and b) too coarse-grained to be the highest and best rollup plan for ratings of quality program assets such as test sets.

The WAI has valid social reasons not to break the ratings down by functional aspect.  That would tend to pit one disability group against another.  Software quality doesn't operate under this particular cloud.  So we have a freer hand, a tabula rasa, a fresh sheet of paper.

>It is the plan of the QAWG to produce several guidelines/checkpoints 
>documents in the Framework document family.  These will cover the areas of:
>
>** QA process and operational setup (first public WD, 1-feb)
>** Specifications (recommendations)
>** Technical Materials
>
>Associated WAI-like checklists will allow scoring of processes, specs, 
>materials according to checklists, and rating of the target with WAI-like 
>conformance levels (A, AA, AAA).  I'm not sure that I'm understanding the 
>suggestions in this thread.  Is it suggested that W3C should:
>
>1.) not produce such goodness-rating specs/tools?

The answer to this is yes or no, depending entirely on how broadly or narrowly one reads 'like' in "WAI-like checklists and conformance levels," and 'such' in "such goodness-rating [instruments]."

I won't claim to represent a consensus of the thread, but I do sense some sympathy with Alex's reservations.

I also feel considerable sympathy for Rob's assertion that "in the end, you need to be able to make some summary statements."

My suggestion would be that it is easier to gain consensus that measures are objective and relevant on a disaggregated basis.  The more you roll up or aggregate the results, the further you get from the bedrock of clearly objective results.  Let's see if we can get consensus on rating methods for sub-factors such as a) ease of use, b) breadth: completeness with regard to isolated points, and c) depth: coverage of feature combinations, lifelike cases, and stress tests.


Checkpoints are more portable than are checkpoint weights or priorities.  The same evaluation primitives may be better used in combination with application-specific rollups, rather than locked to fixed priorities as in the WCAG.

The particular rollup method in the WCAG is a weak precedent; other rollup concepts should be considered for the QA application along with this one.
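
A minimal sketch of the contrast drawn above, for illustration only.  The checkpoint names, factor labels, priorities, and weights are all hypothetical; none are taken from WCAG or from any QA Framework draft.  It compares a WCAG-style all-or-nothing level with per-factor scores that each application can roll up using its own weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    checkpoint: str   # hypothetical checkpoint identifier
    factor: str       # sub-factor: "ease_of_use", "breadth", or "depth"
    priority: int     # 1, 2, or 3, as in a WCAG-style checklist
    passed: bool


def wcag_style_level(results):
    """All-or-nothing rollup: level N needs every checkpoint of priority <= N to pass."""
    level = "none"
    for priority, name in ((1, "A"), (2, "AA"), (3, "AAA")):
        if all(r.passed for r in results if r.priority <= priority):
            level = name
        else:
            break
    return level


def per_factor_scores(results):
    """Disaggregated view: fraction of checkpoints passed within each sub-factor."""
    totals = {}
    for r in results:
        passed, count = totals.get(r.factor, (0, 0))
        totals[r.factor] = (passed + int(r.passed), count + 1)
    return {factor: p / c for factor, (p, c) in totals.items()}


def weighted_rollup(scores, weights):
    """Application-specific rollup: the consumer of the ratings supplies the weights."""
    return sum(scores[f] * w for f, w in weights.items()) / sum(weights.values())


# Illustrative data only: one failed priority-1 checkpoint, everything else passing.
RESULTS = [
    Result("t1", "ease_of_use", 1, True),
    Result("t2", "breadth", 1, False),
    Result("t3", "breadth", 2, True),
    Result("t4", "depth", 2, True),
    Result("t5", "depth", 3, True),
]

print(wcag_style_level(RESULTS))
print(per_factor_scores(RESULTS))
print(weighted_rollup(per_factor_scores(RESULTS),
                      {"ease_of_use": 1, "breadth": 1, "depth": 2}))
```

With this made-up data the single failed priority-1 checkpoint collapses the WCAG-style rating to no level at all, while the per-factor scores still show where the suite is strong and where it is weak.  A different application could pass different weights to the same scores without re-running any evaluation.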

What approach is anticipated for collecting field experience and rolling that into the knowledge base?

Measures of effectiveness for quality tools form an interesting metrology research area, to my knowledge.  What we will get in the short term are prognostic-wannabes of said effectiveness, and not direct measures of actual effectiveness.  If these input scores look good, the output scores should be good.  But we're guessing.  Better to define the several scales and let composite 'goodness' wait on some experience.

Al
>2.) produce them but don't, ourselves (W3C), apply them and publish results?
>3.) something else?
>
>See also embedded questions below...
>
>At 01:44 PM 2/27/2002 -0500, Al Gilman wrote:
>>At 10:13 AM 2002-02-27 , Alex Rousskov wrote:
>> >Overall, the current solution may be sufficient. It is definitely the
>> >simplest and least controversial one.
>> >
>>
>>Two pieces of evidence in support of Alex's approach:
>>
>>1.  The analogy with UDDI+WSDL.
>>
>>The directory really just tells you that a service exists; everything else 
>>is addressed in the service prospectus in a rich language.
>>
>>2.  An analogy with accessibility enforcement.
>>
>>The coarsely-quantized rating system of three conformance plateaux for web 
>>accessibility as promulgated in WCAG 1.0 has very little "consensus 
>>stability margin" behind it.
>
>I don't follow the point here.  Could you please elaborate?
>
>>The topic of conformance representations engenders a lot of ongoing 
>>controversy in the accessibility domain.
>
>"Representations" means "claims"?
>
>Regards,
>-Lofton.
>
>
>>So best not to employ distinguished icons that can be interpreted as 
>>connoting degrees of authority without prior careful review of how 
>>different people will interpret, apply, and populate them.
>>
>>The distinctions suggested fall in "potentially invidious" territory, as I 
>>see it.
>>
>>Al
>>
>> >On Wed, 27 Feb 2002, Tantek Celik wrote:
>> >
>> >> The "Test Suites" column is currently just a boolean
>> >> (hyperlinked!)  indicator of whether or not there is anything even
>> >> remotely resembling a test suite available for a particular
>> >> technology.
>> >>
>> >> While this is useful, it would help significantly if the test
>> >> suites which were actually hosted at w3.org used a "W3C" icon
>> >> instead of the "hammer and wrench" icon.
>> >
>> >IMHO, "being hosted at w3.org" adds little information about the
>> >quality or even availability of the test suite. Reflecting the state
>> >of the suite (under construction, available, production quality, with
>> >public results database, etc.) may be a good idea. In some cases,
>> >however, assigning a state may be a controversial action. Rating the
>> >quality of a suite would be even more controversial, of course.

>> >
>> >> This will help quickly call out at a glance which specs actually
>> >> have official W3C test suites, vs. which have some sort of test
>> >> suite or plan for a test suite, and which have no form of test
>> >> suite at all.
>> >
>> >"Official" has little utility in this context, IMO. Whether I can use
>> >the test suite now is far more important (to me, anyway).
>> >
>> >If W3C branding is important, perhaps there should be two columns:
>> >"W3C endorsement" and "state/availability".
>> >
>> >The situation becomes even more complex when several test suites are
>> >available and described on a separate page. In that case, one could
>> >use the state of the "best" (e.g., already available) suite to assign
>> >an icon since people are more likely to use the best tool if given a
>> >choice.
>> >
>> >Overall, the current solution may be sufficient. It is definitely the
>> >simplest and least controversial one.
>> >
>> >Alex.
>> >
>
Received on Thursday, 28 February 2002 22:09:54 UTC