proposal for test suite reorganization

After exchanging ideas with Andy earlier and passing a draft by Steve
and Lee, I'd like to outline a process for reorganization of the test
suite here.

The summary of the process is: "start over" :)

Seriously though, what I'd like to propose is the following: we create a
new directory ('tests/data-reorganized' or something like that) and
start copying existing tests to this new directory. When copying we take
a careful look at the relevance of each test to the current state of the
spec, and if necessary seek DAWG approval for the test (as much as
possible, I'd like the reorganized set to _only_ contain approved tests).
We split the test sets in two basic categories: syntax (parsing) tests
and query evaluation tests, and further divide sets along issues as is
currently the case (tests for value testing, tests for optional, etc.).

I would also like to introduce a directory naming convention for test
sets: all lowercase, using hyphenation to separate words/numbers. So
"SyntaxFull" would become 'syntax-full' (assuming we actually keep that
particular name). I don't have my heart set on this particular format by
the way, but almost any convention will do as long as we all use it
consistently.
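
To make the convention concrete, here is a minimal sketch in Python of
how an existing name could be converted. The function name is made up
for illustration, and the rule it applies (a hyphen wherever a lowercase
letter or digit is followed by an uppercase letter) is only one possible
reading of the convention, not something we have agreed on.

```python
import re


def to_directory_name(name: str) -> str:
    """Convert a test set name such as 'SyntaxFull' to 'syntax-full'.

    Hypothetical helper: the exact splitting rule is an assumption.
    """
    # Insert a hyphen where a lowercase letter or digit is directly
    # followed by an uppercase letter (a CamelCase word boundary).
    hyphenated = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", name)
    # Treat runs of underscores and whitespace as word separators too.
    hyphenated = re.sub(r"[_\s]+", "-", hyphenated)
    return hyphenated.lower()
```

Names that already follow the convention pass through unchanged, so the
helper could be run over all directories without special-casing.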

Additionally I would like to introduce two 'super-manifests', that is,
manifests that contain references to other manifests. One of these
super-manifests will contain references to all approved test sets for
syntax parsing, the other to all approved test sets for query
evaluation. This way, query engine developers have a clear, single point
of entry (or actually, two points of entry) for using the test suite,
without having to sort through all directories to figure out which tests
are relevant and approved.

To facilitate this we also need to extend the test manifest vocabulary
slightly.

First of all, we introduce a property 'dawgt:imports' the value of which
is a Collection of manifest references (this is to be used in the
'super-manifests'). We use a collection here to be able to preserve
order in the execution of manifests (which implementations are of course
free to ignore, but which is useful to be able to specify).
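
As a rough illustration of why the ordering matters, the sketch below
walks a super-manifest and collects tests in import order. It is Python
over plain dictionaries, not real RDF processing: the dictionary layout,
the manifest names and the test names are all invented for this example,
and only the idea of an ordered 'imports' list comes from the proposal.

```python
from typing import Dict, List, Optional, Set

# Hypothetical stand-in for parsed manifests: each manifest has an
# ordered list of imported manifests and a list of its own test names.
Manifest = Dict[str, List[str]]

EXAMPLE_MANIFESTS: Dict[str, Manifest] = {
    "manifest-evaluation": {
        "imports": ["optional/manifest", "value-testing/manifest"],
        "tests": [],
    },
    "optional/manifest": {"imports": [], "tests": ["opt-1", "opt-2"]},
    "value-testing/manifest": {"imports": [], "tests": ["val-1"]},
}


def collect_tests(
    name: str,
    manifests: Dict[str, Manifest],
    seen: Optional[Set[str]] = None,
) -> List[str]:
    """Return all tests reachable from `name`, following imports in order."""
    if seen is None:
        seen = set()
    if name in seen:
        # Already visited: skip it to avoid loops and duplicate tests.
        return []
    seen.add(name)
    manifest = manifests[name]
    # A manifest's own tests come first, then those of its imports,
    # in the order the imports are listed.
    tests = list(manifest.get("tests", []))
    for imported in manifest.get("imports", []):
        tests.extend(collect_tests(imported, manifests, seen))
    return tests
```

An implementation that does not care about order could of course treat
the imports as a set instead; the list simply keeps that choice open.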

Also, we need to more clearly mark the type of a particular test case.
My first idea was to introduce separate classes for syntax tests and
evaluation tests, but I noticed that the current vocabulary schema
already contains classes 'PositiveSyntaxTest' and 'NegativeSyntaxTest'.
These should be used consistently in the actual suite though, which I
think is currently not the case.

I also have a question at this point: do we only consider positive and
negative testing for syntax, or is it conceivable that we want to record
positive/negative evaluation tests as well? If so, I'd propose to
slightly modify the vocabulary at this point and introduce two
orthogonal typing sets for test cases: "SyntaxTest/EvaluationTest" and
"PositiveTest/NegativeTest".
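
To show what two orthogonal typing sets would mean for a test suite
reader, here is a small Python sketch. The class and function names are
invented for illustration; only the four type names and the two existing
classes 'PositiveSyntaxTest' and 'NegativeSyntaxTest' are taken from the
discussion above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    SYNTAX = "SyntaxTest"
    EVALUATION = "EvaluationTest"


class Expectation(Enum):
    POSITIVE = "PositiveTest"
    NEGATIVE = "NegativeTest"


@dataclass(frozen=True)
class ManifestEntry:
    """Hypothetical in-memory form of one test case from a manifest."""
    name: str
    kind: Kind
    expectation: Expectation


def legacy_class_name(entry: ManifestEntry) -> Optional[str]:
    """Map an entry to the existing combined class, if there is one.

    Only the two syntax classes exist in the current schema, so
    evaluation tests have no combined equivalent and yield None.
    """
    if entry.kind is not Kind.SYNTAX:
        return None
    if entry.expectation is Expectation.POSITIVE:
        return "PositiveSyntaxTest"
    return "NegativeSyntaxTest"
```

With both types stated explicitly on each test case, a reader can pick
its tests with two simple comparisons and needs no inference at all.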

Although we could conceivably do with a less explicit vocabulary, I'd
like to be as explicit as possible, so that no reasoning is needed to
process the manifests properly. Let's make the threshold for developers
as low as possible.

The advantage of this approach is that we are free to reorganize in
the best possible way without burdening developers with having to adapt
their test suite readers all the time: they can simply continue to use
the 'old' suite until the reorganized suite is sufficiently stable and
then make the switch in one go. It may cause some pain but at least
it'll be only once ;)

Regarding who gets to do all this: I'd be happy to start work on this
and set up the basic structure but the actual moving/copying is not a
one-man job I think.

Your feedback is most welcome.

Cheers,

Jeen
-- 
Dr. Jeen Broekstra                                          Den Dolech 2
Information Systems Group                                        HG 7.76
Department of Mathematics and Computer Science              P.O. Box 513
Technische Universiteit Eindhoven                      5600 MB Eindhoven
tel. +31 (0)40 247 36 86                                 The Netherlands

Received on Thursday, 14 December 2006 10:03:25 UTC