Re: Comments on GEO Test

At 8:00 PM +0100 9/6/04, Jeremy Carroll wrote:
>Hi
>
>I've been browsing the GEO test work (on-going) at
>
>http://www.w3.org/International/tests/
>
>It's good to see so much work being done in this area.
>In particular, it seems that you are being very thorough in 
>identifying language-related features that should be tested, and some 
>idea of what tests can be made.
>
>I had some very general comments trying to combine what I've 
>understood from the QA WG's work with these pages.
>
>The principal problem I found was that it was too unclear who was meant 
>to do what with these tests.

Tests themselves are best multi-use and should not internally be
bound to a 'who.'  Test plans will identify who should engage in what
test activities where in the [economy, the digital ecology, the
infosphere].  The test plan will then detail the tests used and the
rollups to be performed on the results of the individual tests.  The
latter is where the normative criteria enter into the picture.

Tests are to be used the same way by producers and consumers, so that
they have an objective basis for discussing what passes between them.

>In the QA WG's framework, testing is about testing some 'class of 
>product' against some 'conformance statement'.

If that is true, it is an important bug in the QA theory.

The required parts of a test are:
  - what to do
  -- conditions to control
  -- actions to take
  - what to observe and record

In other words, 'test' is a method for a measurement transaction.

Normative criteria are outside the essential definition of the test.
Some test descriptions incorporate data reduction and pass/fail
criteria. But that is optional, not essential.

In order for our framework to support the separation of policy and
technology, it must accommodate the model I offered here.
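
To make that model concrete, here is a minimal sketch of a test
sample written that way. The file name, title, and wording are mine,
invented for illustration, and the pass/fail rollup deliberately
lives elsewhere, in the test plan:

    <!-- Conditions to control: the same markup is served to every
         user agent under test. -->
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
        "http://www.w3.org/TR/html4/strict.dtd">
    <html lang="en">
    <head>
      <title>Test sample: link to a French alternate</title>
      <link rel="alternate" hreflang="fr" href="sample-fr.html"
            title="Version française">
    </head>
    <body>
      <!-- Actions to take -->
      <p>Load this page in the user agent under test.</p>
      <!-- What to observe and record -->
      <p>Record whether, and how, the user agent exposes the French
         alternate declared in the head: a menu entry, a toolbar
         indicator, a status-line note, or nothing at all.</p>
    </body>
    </html>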

The point of a test is to establish a matter of fact. For the W3C
mission of establishing global agreements, this should be done in as
objective terms as is possible. Whether the observed response is what
should have happened is a separate conversation.

Take the newest test about links to alternates.

http://www.w3.org/International/tests/sec-link

Here the objective is not merely to assess conformance to some
specification profile but to see what browsers actually do with
markup. The objective is to observe a 3D relation among:

- browsers
- markup
- behavior, capabilities afforded at the UI
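
By way of illustration, that relation can be recorded as a grid with
one observation per (markup, browser) cell, the observed behavior
filling the cell. Everything in the sketch below, browsers included,
is an invented placeholder, not an observed result:

    <table border="1" summary="hypothetical observation grid">
      <tr><th>Markup</th><th>Browser A</th><th>Browser B</th></tr>
      <tr><td>&lt;link hreflang="fr"&gt; in the head</td>
          <td>menu entry shown</td>
          <td>no indication</td></tr>
      <tr><td>&lt;a hreflang="fr"&gt; in the body</td>
          <td>inline link rendered</td>
          <td>inline link rendered</td></tr>
    </table>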

It would be possible to be picky and say "that's an experiment, not a
test." But that forces too narrow and legalistic a concept onto the
class that most of our tests actually belong to: stimulus samples
which, used in multiple stimulus-and-observation episodes, make it
easier to compare and interpret the results that vary from episode to
episode. The point is to clarify the dependency of UI behavior on the
diversity among browsers by introducing the same structured sample of
markup instances, so that in each sub-experiment the same markup is
exposed to all the user agents. Then a variety of sub-tests or test
points expose different markup configurations, not just a single
point.

This example shows well why such narrowness would be unfortunate.

If you are going to take the trouble to put up a page exploring UA
implementation of markup features, you might as well explore for
actual implementation and not just for conformance. That means you
have two customers for your work product, not just one. Authoring
interests want to know what works in browsers today, not just what
should work per the published writ.

Even for the EO team, who are working to bring the implementations
into conformance with the published writ, one doesn't want to engage
an implementer in a dialog in ignorance of what they *do* do with
regard to some user function such as "finding and/or getting to the
alternate in my language."

The main problem with this page is that it only exercises the <link>
option and fails to compare it with parallel examples of <a> links
to other-language alternates.  It would help to structure the
experiment with parallel entities so as to make the comparison easier.
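
A sketch of what such a parallel pair might look like (the file name
is invented for illustration); both point at the same French
alternate, once via <link> in the head and once via <a> in the body:

    <head>
      <link rel="alternate" hreflang="fr" href="qa-lang-fr.html"
            title="Version française">
    </head>
    <body>
      ...
      <p><a href="qa-lang-fr.html" hreflang="fr">Version française
         de cette page</a></p>
    </body>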

Make that comparison and it should be obvious what authors should do.

Yes, this test could do better at quantifying the outcomes to record.

But no, we should not limit "tests" to experiments with binary
outcomes aligned with normative provisions in some reference, because
the world around us is going to call _repeatable stimulus examples_
'tests' and we should get used to it.

Al

PS: The blood *tests* that my doctor gave me at my recent physical
examination were to measure my LDL, HDL, and total cholesterol
*levels.* Not to assess conformance to some binary assertion.


>Since these tests seem to be about language in HTML, XHTML and CSS, 
>I suggest that:
>- there are two 'classes of product':
>    + user agent - typically a traditional web browser
>    + a web page (HTML, XHTML) or CSS stylesheet, or possibly a web 
>site, when multiple pages are being considered
>
>- the 'conformance statement' is the (implicit?) conformance 
>statement of being a user agent or being an HTML/XHTML/CSS page, but 
>then only really looking at the specifics of language related 
>conformance.
>
>To make the tests easier to use, it is important to identify who 
>might use them, and for what.
>
>I see four classes of user:
>
>- a user agent developer, trying to ensure that their work does 
>handle language features correctly
>
>- a user agent consumer trying to choose a user agent that best 
>handles the language features that are important to them
>
>- a web page author (whether human or automated) trying to ensure 
>that their work correctly marks up language related aspects of the 
>content
>
>- a web page consumer trying to understand whether their language 
>related difficulty with some site is because the site is broken or 
>their user agent is broken
>
>I envisage testing being done with some sort of test harness that 
>co-ordinates running the tests, and generating reports.
>
>I imagine these test users operating in two modes:
>
>A) testing a user agent on a large number of simple tests, which 
>should each pass or fail. Given the graphical nature of a 
>traditional user agent, it is implausible to fully automate this 
>procedure, which will require manual assistance for almost all the 
>tests.
>
>B) testing a web page, for a number of simple features, each of 
>which passes or fails. In the main such tests can be automated, such 
>as in the HTML validator or the pubrules checker.
>
>http://www.w3.org/2001/07/pubrules-form (member only link, I think)
>
>
>
>A) Testing a User Agent
>
>
>The person conducting the test needs to give the test harness manual 
>feedback as to whether each test passes or fails: e.g., a test for 
>the <link> element referencing content in an alternative language 
>will pass if 
>the user agent provides some indication that alternative content is 
>available, and fail if not.
>
>See
>http://www.w3.org/International/tests/sec-link
>
>But that page has a lot of information that confuses this simple test.
>Including some of the HTML source verbatim within the content is 
>unnecessary and confusing (there is always 'view source' for the 
>geeks)
>
>The questions at the end of the page, which are in fact the things 
>to be tested, do not indicate the 'right' answers, e.g. "Does the 
>user agent provide information about all the links in the markup?" 
>without looking at the markup, how do I know? And why should I look 
>at the markup?
>
>The question should be something like:
>
>Is a link to a page in FR displayed?
>Is a link to a page in ZH displayed?
>...
>
>Ideally, web forms could be used to collect answers to these tests, 
>in order to generate a report. This report could be generated either 
>on the server or on the client. Thus the questions would appear
>
>Is a link to a page in FR displayed? [YES] [NO]
>
>Essentially running these tests will be boring, and the test 
>framework should make this boring task as quick and easy as 
>possible. No clutter, support for collecting the results, simple 
>questions that do not require engagement of brain.
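
A minimal sketch of such a form, with an invented handler URL and
field name, and only one question shown:

    <form method="post" action="http://example.org/collect-results">
      <p>Is a link to a page in FR displayed?
         <input type="radio" name="sec-link-fr" value="yes"> YES
         <input type="radio" name="sec-link-fr" value="no"> NO</p>
      <p><input type="submit" value="Submit results"></p>
    </form>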
>
>Many groups have RDF based formats for collecting test results, and 
>there are a number of XSLT style sheets etc. that then format these 
>results in an attractive way.
>
>B) Testing a web page
>
>The pubrules checker is very helpful for checking that W3C tech 
>reports follow the W3C pubrules - each of the rules has some XSLT 
>code to check it, and the result is then displayed on the summary 
>page in green, red, or yellow - where the yellow is an undetermined 
>result requiring human thought to decide.
>
>I realise that the GEO TF does not have a lot of effort to dedicate 
>to the, essentially programming, tasks envisaged here. However, 
>clarity that the ideal long term goal is something like what I've 
>sketched (or some other overall test framework agreed by the GEO 
>TF), would, I think, allow for tests that were more focussed on 
>testing, and were easier to understand.
>
>
>Hope this is helpful
>
>Some members of the QA IG may be able to point to appropriate 
>overview material on planning for testing; it is not clear to me 
>which would be the most helpful entry point into their work (some of 
>which is listed below).
>
>Jeremy


>
>
>>* The QA Handbook *
>>
>>     http://www.w3.org/TR/2004/WD-qa-handbook-20040830/
>>     A short guide to help a WG organize its life. Chairs and 
>>Staff Contacts are most likely to be interested, though anyone 
>>can read it as well.
>>
>>
>>*The QA Framework: Specification Guidelines*
>>
>>     http://www.w3.org/TR/2004/WD-qaframe-spec-20040830/
>>     A guide to help you work through the problems you might 
>>encounter when creating a technology and writing a specification. 
>>Some of these guidelines will be obvious to you, but a few others 
>>might raise new issues you have not had the opportunity to think 
>>about. A checklist (ICS) accompanies this document to help you 
>>check whether you have overlooked a topic.
>>     http://www.w3.org/TR/2004/WD-qaframe-spec-20040830/specgl-ics
>>
>>
>>*Variability in Specification.*
>>
>>     http://www.w3.org/TR/2004/WD-spec-variability-20040830/
>>     When designing a technology, you might face very difficult 
>>topics with very deep ties to conformance and interoperability. If 
>>you need to explore further advanced topics, we recommend reading 
>>Variability in Specification.
>>
>>
>>* The QA Framework: Test Guidelines *
>>
>>     http://www.w3.org/TR/2004/WD-qaframe-test-20040820/
>>     This document has been put on hold due to the QA WG's lack of 
>>resources. If more manpower were to join the WG, we might be able 
>>to finish it.
