Re: proposal for test suite reorganization from Seaborne, Andy on 2006-12-15 (public-rdf-dawg@w3.org from October to December 2006)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Fri, 15 Dec 2006 16:30:13 +0000
To: Jeen Broekstra <j.broekstra@tue.nl>
Cc: dawg mailing list <public-rdf-dawg@w3.org>
Message-ID: <4582CD95.4010004@hp.com>
Jeen Broekstra wrote:
> After exchanging ideas with Andy earlier and passing a draft by Steve
> and Lee, I'd like to outline a process for reorganization of the test
> suite here.
> 
> The summary of the process is: "start over" :)

:-)

> 
> Seriously though, what I'd like to propose is the following: we create a
> new directory ('tests/data-reorganized' or something like that) and
> start copying existing tests to this new directory. When copying we take
> a careful look at relevance of the test to the current state of the
> spec, and if necessary seek DAWG approval for the test (so as much as
> possible I'd like the reorganized set to _only_ contain approved tests).

+1

> We split the test sets in two basic categories: syntax (parsing) tests
> and query evaluation tests, and further divide sets along issues as is
> currently the case (tests for value testing, tests for optional, etc.).
> 
> I would also like to introduce a directory naming convention for test
> sets: all lowercase, using hyphenation to separate words/numbers. So
> "SyntaxFull" would become 'syntax-full' (assuming we actually keep that
> particular name). I don't have my heart set on this particular format by
> the way, but almost any convention will do as long as we all use it
> consistently.
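
A minimal sketch of the naming convention described above, in case it helps
when renaming directories in bulk. The function name and the boundary rules
(a hyphen between a lowercase letter or digit and a following capital, and
between letters and digits) are assumptions made for illustration, not an
agreed rule.

```python
import re


def to_test_dir_name(name: str) -> str:
    """Convert a mixed-case test set name to lowercase, hyphen-separated form.

    Hypothetical helper: the boundary rules are one possible reading of
    "all lowercase, using hyphenation to separate words/numbers".
    """
    # Hyphen between a lowercase letter or digit and a following capital.
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", name)
    # Hyphen between a letter and a following digit, and the reverse.
    s = re.sub(r"(?<=[A-Za-z])(?=[0-9])", "-", s)
    s = re.sub(r"(?<=[0-9])(?=[A-Za-z])", "-", s)
    # Treat underscores and whitespace as separators, then collapse repeats.
    s = re.sub(r"[\s_]+", "-", s)
    s = re.sub(r"-{2,}", "-", s)
    return s.strip("-").lower()


print(to_test_dir_name("SyntaxFull"))  # prints "syntax-full"
```

Names that already follow the convention pass through unchanged. Runs of
capitals are not split by this sketch, so names containing acronyms would
need a manual check.
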
> 
> Additionally I would like to introduce two 'super-manifests', that is,
> manifests that contain references to other manifests. One of these
> super-manifests will contain references to all approved test sets for
> syntax parsing, the other to all approved test sets for query
> evaluation. This way, query engine developers have a clear, single point
> of entry (or actually, two points of entry) for using the test suite,
> without having to sort through all directories to figure out which tests
> are relevant and approved.
> 
> To facilitate this we also need to extend the test manifest vocabulary
> slightly.
> 
> First of all, we introduce a property 'dawgt:imports' the value of which
> is a Collection of manifest references (this is to be used in the
> 'super-manifests'). We use a collection here to be able to preserve
> order in the execution of manifests (which implementations are of course
> free to ignore, but which is useful to be able to specify).
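
To make the ordered-imports idea concrete, here is a rough sketch of how a
test harness might walk a super-manifest. It models manifests as plain
dictionaries rather than RDF; the "imports" key stands in for the proposed
'dawgt:imports' collection, and the manifest and test names are invented for
the example.

```python
from typing import Dict, List, Optional, Set

# Hypothetical in-memory model: each manifest has an ordered "imports" list
# (standing in for the proposed dawgt:imports collection) and/or a list of
# its own test "entries". All names below are made up for illustration.
MANIFESTS: Dict[str, dict] = {
    "manifest-syntax": {"imports": ["syntax-basic", "syntax-full"]},
    "syntax-basic": {"entries": ["basic-1", "basic-2"]},
    "syntax-full": {"entries": ["full-1"]},
}


def collect_tests(manifest_id: str,
                  manifests: Dict[str, dict],
                  seen: Optional[Set[str]] = None) -> List[str]:
    """Return the tests reachable from manifest_id, in listed order."""
    seen = set() if seen is None else seen
    if manifest_id in seen:
        # Skip manifests already visited (guards against import cycles).
        return []
    seen.add(manifest_id)
    manifest = manifests[manifest_id]
    tests = list(manifest.get("entries", []))
    # Imports are followed in the order the collection lists them.
    for imported in manifest.get("imports", []):
        tests.extend(collect_tests(imported, manifests, seen))
    return tests


print(collect_tests("manifest-syntax", MANIFESTS))
# prints ['basic-1', 'basic-2', 'full-1']
```

An implementation that does not care about order could treat the imports as
a set instead; the list only matters if the stated order is to be honoured.
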
> 
> Also, we need to more clearly mark the type of a particular test case.
> My first idea was to introduce separate classes for syntax test and
> evaluation tests but noticed that the current vocabulary schema already
> contains classes 'PositiveSyntaxTest' and 'NegativeSyntaxTest'. These
> should be consistently used in the actual suite though, which is
> currently not the case I think.
> 
> I also have a question at this point: do we only consider positive and
> negative testing for syntax, or is it conceivable that we want to record
> positive/negative evaluation tests as well? If so, I'd propose to
> slightly modify the vocabulary at this point and introduce two
> orthogonal typing sets for test cases: "SyntaxTest/EvaluationTest" and
> "PositiveTest/NegativeTest".

I can't think of what a negative evaluation test would be.  There are two 
sub-cases I can think of:

case 1: The query executes and definitely does not give a particular set of 
answers.  Tricky - the actual answers may be those unwanted ones + some junk 
which would mean it didn't give results.  Usually negative evaluation tests 
test that certain things can be found, not test the whole lot for exact 
(non)equality.  Having a portable way to test for this kind of negative test 
looks quite hard in the general case (subgraph isomorphism?).  Are there 
particular cases you think we should be interested in?

The other case I can think of is a query that executes and fails to produce 
any answers (it crashes, in other words).  Again, a negative test here isn't 
much of a test as the reason will be implementation dependent.

If those are the only two cases, then I'd say we need not worry about negative 
evaluation tests.
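
As a rough illustration of case 1, the sketch below compares result sets as
order-insensitive multisets. It assumes solutions are simple variable-to-value
mappings with no blank nodes (with blank nodes the comparison would need some
form of graph matching), and all function names are hypothetical.

```python
from collections import Counter


def as_multiset(solutions):
    """Order-insensitive multiset of solutions (dicts of variable -> value)."""
    return Counter(frozenset(s.items()) for s in solutions)


def positive_eval_passes(actual, expected):
    # Positive evaluation test: the results must match exactly.
    return as_multiset(actual) == as_multiset(expected)


def naive_negative_eval_passes(actual, unwanted):
    # Naive negative test: the results merely differ from the unwanted set.
    return as_multiset(actual) != as_multiset(unwanted)


def contains_all(actual, unwanted):
    # True if every unwanted solution still appears in the actual results.
    return not (as_multiset(unwanted) - as_multiset(actual))


unwanted = [{"x": "a"}, {"x": "b"}]
actual = unwanted + [{"x": "junk"}]

# The naive negative test passes even though every unwanted answer is
# present, because one extra solution is enough to make the sets differ.
print(naive_negative_eval_passes(actual, unwanted))  # prints True
print(contains_all(actual, unwanted))                # prints True
```
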

> 
> Although conceivably we could do with less explicit vocabulary I'd like
> to be as explicit as possible, to make sure there is no need to do
> reasoning for proper processing of the manifests. Let's make the
> threshold for developers as low as possible.
> 
> The advantages of this approach are that we are free to reorganize in
> the best possible way without burdening developers with having to adapt
> their test suite readers all the time: they can simply continue to use
> the 'old' suite until the reorganized suite is sufficiently stable and
> then make the switch in one go. It may cause some pain but at least
> it'll be only once ;)
> 
> Regarding who gets to do all this: I'd be happy to start work on this
> and set up the basic structure but the actual moving/copying is not a
> one-man job I think.
> 
> Your feedback is most welcome.

Looks good to me.  I think we have most of the basic technology in place;
it's the approving of tests that needs to be done most.

Do you have any sense of how many tests we will need to approve?

Hopefully, they can be done in batches once we have the basic structures
in place and people can easily run manifests with their systems.

Someone might well ask about using EARL.  I thought that was more to report 
tests than record them, although Eric saw more possibility to incorporate EARL 
into the testing framework.

	Andy

> 
> Cheers,
> 
> Jeen
Received on Friday, 15 December 2006 16:30:32 UTC