Re: Test case review

On May 10, 2011, at 2:15 PM, James Graham wrote:

> No, I understood. I still don't understand why we care. As far as I know 
> the only use for saving old test results is regression tracking. Why do we 
> need to delete the results of specific tests rather than discard the whole 
> results set?

Ok, the use case is storing historical snapshots (as well as a general philosophy of not throwing away data).

Let me give a concrete example. The CSS WG developed the CSS 2.1 test suite to the point that we felt it was good enough to transition to PR (this is our RC6 version of the test suite). Since then, we've found issues with tests and areas where testing coverage of the spec could be improved. There are also known issues with CSS 2.1 that were deferred to errata, and there may be future testing changes to help clarify issues.

To that end, we still have active development on the CSS 2.1 test suite, and we consider it a living thing to be developed for as long as CSS 2.1 is relevant. We perform nightly builds of the suite, and they're automatically imported into the harness.

If you look at our harness ( http://test.csswg.org/harness/ ), you'll see that it's still showing results for the RC6 version of the CSS 2.1 test suite. This has value, especially for those members of the AC reviewing CSS 2.1 for REC.

There will also likely be other points in the future where we will capture a snapshot of the test suite and its results (like when we publish errata).

Yes, there are tests that were found to be wrong, and their old results will never be seen again. The cost of keeping them is trivial (less than 50 bytes each or so); frankly, it takes more effort to remove them. If at some point the result database grows unwieldy, I may very well purge ancient, useless result data, but at this point, who's to say what we might need down the road?

> There is a big difference if you think in terms of commits rather than in 
> terms of files. If I merge a series of commits into another branch I can 
> be sure that I got all the relevant changes and no more. Since, in my 
> system, a single review request would be an atomic change e.g. the series 
> of commits needed to add one testsuite, taking all the commits for a 
> specific review and merging them onto the approved branch would give you a 
> good assurance that you got the bits that had been reviewed but no more or 
> less.

The problem still lies in deciding *which* series of commits. There may be other commits, to shared test assets, that aren't obviously related to the series under review. So we either have a single monolithic collection of tests and assets (which we already know doesn't work), or we manage the relationships between the components explicitly, or we accept a system that sometimes breaks tests.
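For concreteness, the per-review merge model in the quoted proposal might look roughly like this (a minimal sketch; the repo path, branch names, and file names are all made up for illustration):

```shell
# Sketch of "one review request = one series of commits" merged onto an
# approved branch. All names here are hypothetical.
set -e
rm -rf /tmp/csswg-merge-demo && mkdir /tmp/csswg-merge-demo
cd /tmp/csswg-merge-demo
git init -q
git config user.email demo@example.com
git config user.name "Demo"
echo base > README && git add README && git commit -qm "initial import"
git branch approved                       # reviewed-and-approved tests live here
# One review request: a topic branch holding a related series of commits
git checkout -qb review/css21-margins
echo '<!-- test 1 -->' > margin-001.html && git add . && git commit -qm "add margin-001"
echo '<!-- test 2 -->' > margin-002.html && git add . && git commit -qm "add margin-002"
# Approving the review = merging exactly that series, and no more
git checkout -q approved
git merge -q --no-ff -m "approve: css21-margins" review/css21-margins
```

The gap is the step before that merge: a test in the series may depend on a shared support file that was committed on some other branch, and nothing in the merge itself surfaces that dependency.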

Believe me, there are reasons for the things we've been doing with the CSS test suite. We're not arbitrarily making up rules for how we think it should work, we're trying to come up with solutions that prevent repeating past mistakes.

Received on Tuesday, 10 May 2011 21:53:36 UTC