Testing REDUCED

Lee Feigenbaum wrote:

> 2. REDUCED tests
> 
> As part of our PR transition call, we agreed to create REDUCED tests for 
> the test suite, publish them, and solicit implementation results during 
> PR. I imagine that one syntax test and one or two evaluation tests will 
> suffice. The syntax test is straightforward, but the evaluation tests 
> require one of three possibilities that I see:
> 
>   + flag the test in the manifest as qt:laxCardinality (or other term). 
> This indicates that an implementation passes the test if it has the 
> right results, but the cardinality of each result can be between 1 and 
> the cardinality in the given result set (inclusive)
>   + associate the test in the manifest with multiple result files. This 
> indicates that an implementation passes the test if its results match 
> any of the given result files
>   + create multiple tests. Expect that an implementation passes exactly 
> one of them.
> 
> The third possibility is nice because implementors do not need to change 
> their test harness at all. (We just need logic in the goo that generates 
> the implementation report to look out for these tests). It's not nice 
> because it means that no one will ever get 100% on the approved tests 
> (nor would we expect them to).
> 
> The first and second solutions require implementors to adapt their test 
> harnesses a bit. For the first one, a test harness needs to detect the 
> qt:laxCardinality predicate and adapt their result-checking procedure 
> accordingly. For the second one, the result-checking procedure stays the 
> same, but the implementation needs to know to check against multiple 
> result files and pass the test if they match any of those result files.
> 
> Thoughts?

The second and third do not work very well unless the test is heavily
constrained (to having only zero, one or two occurrences in the underlying
data, say).  A quite reasonable design of REDUCED is to keep a fixed-size
window of previous results and eliminate duplicates within the window.  Just
because the common choice currently is a window of one, we should not assume
that the window is always one.  Any small integer for the window size is
reasonable, and some engines might naturally generate some kind of interlaced
duplicates; choosing the window to be the periodicity would then be a good
choice.
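
To make the window idea concrete, here is a rough sketch in Python.  It is
not taken from any engine; the name reduced_window and its window parameter
are made up for illustration, and rows are assumed to be hashable values
such as strings or tuples.

```python
from collections import deque


def reduced_window(rows, window=1):
    """Yield rows, dropping any row equal to one of the last `window` rows kept.

    Hypothetical illustration only: `window` is the assumed fixed window size.
    """
    recent = deque(maxlen=window)  # the oldest kept row falls out automatically
    for row in rows:
        if row in recent:
            continue  # duplicate within the window: eliminate it
        recent.append(row)
        yield row


interlaced = ["a", "b", "a", "b", "a", "b"]
window_one = list(reduced_window(interlaced, window=1))
window_two = list(reduced_window(interlaced, window=2))
```

With interlaced duplicates, a window of one keeps all six rows while a window
of two keeps only "a" and "b", so the cardinality of a legal answer depends on
a choice the test author cannot know in advance.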

At least one impl just makes REDUCED==DISTINCT.

I also think having a test suite that you can't pass is to be avoided where
possible.  OK - we have such a thing now, but at every point where it happens the
cautious implementer is going to have to manually check that one possibility
works for them.  Let's minimise that.

Opt 1 (laxCardinality) does require that the test implementation knows about 
it but that is one change that works now for every run and is better IMHO than 
having to check tests for "correct failures" (and who would check every run?:-)
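
For what it is worth, the harness change for option 1 could be quite small.
The following is only a sketch under assumptions: passes_lax_cardinality is a
made-up name, result rows are assumed to be hashable (tuples of bindings,
say), and result order is ignored.

```python
from collections import Counter


def passes_lax_cardinality(actual, expected):
    """Check results under the proposed qt:laxCardinality rule.

    Each distinct expected row must occur between 1 and its expected
    cardinality (inclusive), and no unexpected rows may appear.
    """
    actual_counts = Counter(actual)
    expected_counts = Counter(expected)
    if set(actual_counts) != set(expected_counts):
        return False  # a row is missing, or an extra row is present
    return all(
        1 <= actual_counts[row] <= expected_counts[row]
        for row in expected_counts
    )


# The expected result set is what the query would give without REDUCED.
expected = [("x",), ("x",), ("x",), ("y",)]
distinct_like = [("x",), ("y",)]         # engine treating REDUCED as DISTINCT
too_many = [("x",)] * 4 + [("y",)]       # more copies than the data allows
```

Here distinct_like passes and too_many fails, and any window size in between
passes as well, which is the property the multiple-result-file options cannot
easily offer.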

 Andy

Received on Tuesday, 6 November 2007 09:51:18 UTC