normative tests and "incomplete" tests

There is a possibly relevant discussion in the minutes of 5th January:
http://lists.w3.org/Archives/Public/www-qa-wg/2004Jan/0042
[[
LH  In terms of how QAWG uses the term "normative", in what sense is a test 
case itself "normative"?  Normative prescribes required behavior.
SH  "Undecided" comes in to play.  "Normativity" demonstrates what's 
entailed or not.
MS  Does "normativity" add additional requirements to what's in the spec?
SM  In theory, just illustrates the requirements, doesn't add to it.
MS  So what does it add to call tests "normative"?
]]

I feel a good example of what I understand by a test being normative is in 
this message:
http://lists.w3.org/Archives/Public/public-webont-comments/2004Jan/0005
particularly
[[
specifically the words "any fragment part is ignored."

These words are further clarified in the test xmlbase/test013 in the RDF 
Test Cases [3].
]]

The five words from the specification are, in my opinion, rather weak. Having 
reached PR without these words being clarified, we can turn to the 
relevant test (which is linked from the spec just under these five words), 
and understand those five words rather better.
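As a rough illustration of what those five words require (my own sketch, not the actual RDF Test Cases machinery; the function name is invented): when resolving a relative reference against an xml:base value, any fragment part of the base is stripped before resolution takes place.

```python
from urllib.parse import urldefrag, urljoin

def resolve_against_xml_base(xml_base, relative_ref):
    """Resolve relative_ref against xml_base, ignoring any
    fragment part of the base (a sketch of "any fragment part
    is ignored")."""
    base_without_fragment, _ = urldefrag(xml_base)
    return urljoin(base_without_fragment, relative_ref)

# The fragment "#frag" on the base has no effect on the result:
# this resolves exactly as against "http://example.org/dir/doc".
print(resolve_against_xml_base("http://example.org/dir/doc#frag", "eg"))
# -> http://example.org/dir/eg
```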

Making the test normative has two advantages:
- If there are two readings of the text "any fragment part is ignored.", then 
the reading that agrees with the test is correct and the other one is wrong.
- A normative test reflects the test-driven spec development used by RDF 
Core (IIRC, the RDF Core Tests are not subsidiary to the other documents, so 
any conflict between the tests and the other documents has no preferred 
resolution).

I think the WebOnt wording of normative tests subsidiary to the other 
documents reflects that the WebOnt test process was more of a compromise 
between different philosophies of test.

In summary: normative tests are illustrative examples into which the WG has 
put enough effort to bless them as part of the specification, rather than as 
additional informative material.

===
[[
MS  What is "extra credit"?
SH  These tests were not expected to pass.
MS  Are these requirements?  Are they "MUSTS"?
SH  What should pass depends on the type of system being built.
LH  Extra credit tests seem to be normative.  What does extra credit tests 
mean with respect to conformance?
]]

One aspect of both the RDF Core tests and the WebOnt tests which is only 
touched upon, and perhaps could have been clearer, is that they serve three 
purposes:
a) they reflect mathematical consequences of the specifications, i.e. they 
help illustrate the meaning of the other documents, and help resolve problems 
where there are two possible readings (an intended reading and an unintended 
one); the tests show relatively simple relationships that are formal 
consequences of a formal system (the RDF or OWL recommendations);
b) they are useful to implementors, to check that their systems work;
c) they are useful to reviewers, to check that the specs are implementable.

Both RDF Core and WebOnt expect that a significant percentage of their 
implementors will use a particular implementation technique known as 
"inference rules". In addition, WebOnt expects other implementors to use a 
"tableau reasoner".

A limitation of the "inference rules" approach is that, as normally used, such 
rules cannot prove "negative entailments" (although they can disprove such 
entailments). A limitation of "tableau reasoners" is that they cannot be used 
for OWL Full.
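A toy sketch of why inference rules have this limitation (my own illustration, using a single made-up transitivity rule rather than the real RDF or OWL rule sets): a forward-chaining engine derives new facts until a fixpoint, so it can confirm a positive entailment by deriving the goal; but the goal's absence from the closure demonstrates non-entailment only if the rule set is known to be complete.

```python
def saturate(facts):
    """Forward-chain a single rule, subClassOf transitivity,
    to a fixpoint and return the closure."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for (a, p, b) in list(closure):
            if p != "subClassOf":
                continue
            for (b2, p2, c) in list(closure):
                if p2 == "subClassOf" and b2 == b:
                    derived = (a, "subClassOf", c)
                    if derived not in closure:
                        closure.add(derived)
                        changed = True
    return closure

closure = saturate({("A", "subClassOf", "B"), ("B", "subClassOf", "C")})

# Positive entailment: the goal is derived, so it is entailed.
assert ("A", "subClassOf", "C") in closure

# A negative entailment ("C subClassOf A is NOT entailed") cannot be
# *proved* this way: absence from the closure only shows the goal is
# not derivable by these rules, which establishes non-entailment only
# when the rule set is known to be complete.
assert ("C", "subClassOf", "A") not in closure
```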

Thus there is a class of OWL test cases, the OWL Full negative entailments, 
which reflect mathematical consequences of OWL, but the WebOnt WG has no 
particular expectation that these consequences will be accessible to machine 
reasoning. These form most of the extra credit tests, for which many systems 
report "incomplete", indicating that they neither failed nor passed them. 
Moreover, this is all the WG was expecting, and the documents are intended to 
reflect this expectation. (The extent to which this reflection was adequate 
was vigorously contested by HP: the WG made changes, both substantive and 
editorial, and HP did not formally object. HP's focus was actually on some 
other extra credit tests.)

I think I could have presented these distinctions better in the OWL Test Cases 
if I had read and understood the QAWG's work on dimensions of variability 
better (and early enough).

To try to illustrate this idea with an HTML example (possibly incorrect): if 
we consider the fragment

<em>a b <span class="foo">c d</span> e f </em>
<span class="foo">c d </span>

the combination of a stylesheet and the CSS and HTML specs may indicate that 
the two occurrences of "c d" are rendered identically. However, the spec may 
wish to state, and a test case could be devised, that "c d" is (conceptually) 
emphasised the first time, and not the second time. This test case could be 
stated even in the absence of any expectation of the existence of software 
that could utilize such a distinction (e.g. a search engine that would 
return only emphasised occurrences of the search term).
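A hypothetical sketch of such a consumer (the class name and behaviour are my invention, built on Python's stdlib HTML parser): it records, for each text run, whether the run occurs inside an <em> element, which is exactly the conceptual distinction a renderer might not surface.

```python
from html.parser import HTMLParser

class EmphasisTracker(HTMLParser):
    """Record each text run together with whether it is
    (conceptually) emphasised, i.e. inside an <em> element."""
    def __init__(self):
        super().__init__()
        self.em_depth = 0   # nesting depth of open <em> elements
        self.runs = []      # (text, emphasised?) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "em":
            self.em_depth += 1

    def handle_endtag(self, tag):
        if tag == "em":
            self.em_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((text, self.em_depth > 0))

tracker = EmphasisTracker()
tracker.feed('<em>a b <span class="foo">c d</span> e f </em>\n'
             '<span class="foo">c d </span>')

# The two renderings of "c d" may look identical, but only the
# first is (conceptually) emphasised:
assert ("c d", True) in tracker.runs
assert ("c d", False) in tracker.runs
```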


Jeremy 
(personal comment, like all my input to the QA work; unlike the vigorous 
contest referred to above, where I was often in a somewhat awkward position, 
representing an opinion that often seemed more extreme than my own)

PS
Hmm, a bit more, speaking perhaps as HP rep on WebOnt: HP was well aware of 
the distinction between (c) and (a), and that some of the tests were more 
suited to (a) than (c). We did make a considered judgement during the CR phase 
that the quantity of tests being passed was sufficient that the test suite as 
a whole, at least for us, adequately served goal (c). While we abstained on 
the advance-to-PR vote, the concerns we had were addressed before the actual 
PR documents were published. I think the OWL test results page, through the 
distinction between "Pass" and "Incomplete", does help the CR and PR reviewers 
to understand this distinction and to understand the limitations on the 
results. I guess the point of this PS is: yes, the distinction complicates 
matters, but a motivated reviewer can understand them and make an informed 
judgement as to whether to support or oppose the advance to recommendation.

Received on Monday, 19 January 2004 05:01:59 UTC