Re: Test case review

On May 11, 2011, at 12:29 AM, James Graham wrote:

> On Tue, 10 May 2011, Linss, Peter wrote:
> 
>> 
>> On May 10, 2011, at 2:15 PM, James Graham wrote:
>> 
>>> No, I understood. I still don't understand why we care. As far as I know
>>> the only use for saving old test results is regression tracking. Why do we
>>> need to delete the results of specific tests rather than discard the whole
>>> results set?
>> 
>> Ok, the use case is for storing historical snapshots (as well as a 
>> general philosophy of not throwing away data).
>> 
>> Let me give a concrete example. The CSS WG developed the CSS 2.1 test 
>> suite to the point that we felt it was good enough to transition to PR 
>> (this is our RC6 version of the test suite). Since then, we've found 
>> issues with tests and we've found areas where testing coverage of the 
>> spec could be improved. There are also known issues with CSS 2.1 that 
>> were deferred to errata and there may be future testing changes to help 
>> clarify issues.
> 
> [...]
> 
> So if you want to keep all the data, then you don't need to automatically 
> discard results for tests that have changed. I don't really follow your 
> argument here.

We don't discard results for tests that have changed; we disregard them in the context of that test suite. Test cases can be used in multiple suites, and there may be other test suites that still use the older version.
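To make the distinction concrete, here's a rough sketch in Python (hypothetical names, not our actual code) of the idea: results are keyed by the revision of the test they were run against, and each suite records which revision of each test it includes, so older results stay in storage but simply fall out of that suite's view.

    from collections import namedtuple

    # One stored result, keyed by the revision of the test it was run against.
    Result = namedtuple("Result", ["test", "revision", "engine", "status"])

    class SuiteView:
        def __init__(self, name, test_revisions):
            # test_revisions: {test_id: revision of that test in this suite}
            self.name = name
            self.test_revisions = test_revisions

        def relevant_results(self, all_results):
            # Keep everything, but only report results recorded against
            # the revision this suite actually contains.
            for r in all_results:
                if self.test_revisions.get(r.test) == r.revision:
                    yield r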

> 
> An argument that would make sense is "we want to make it easy to do diff 
> runs of the testsuite so that when it is updated we can only run the 
> changed tests rather than all of them". I can see why this would seem good 
> for the CSS 2.1 testsuite because it is a huge undertaking to run the 
> whole thing.

We do that as well, and the harness has a nice algorithm that prioritizes the test sequence based on where results are still needed for a particular engine.
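Roughly (this is a sketch of the idea, not the harness's actual code), the queue is ordered so that tests lacking results for the engine under test come first, then by how few results exist overall:

    # Hypothetical sketch: prioritize tests by where results are needed.
    def prioritize(tests, result_counts, engine):
        # result_counts: {(test_id, engine_name): number of stored results}
        def key(test):
            here = result_counts.get((test, engine), 0)
            total = sum(n for (t, _), n in result_counts.items() if t == test)
            return (here, total)  # fewest results for this engine come first
        return sorted(tests, key=key)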

> However, going forward we should regard testsuites that 
> require significant manual work to run as unacceptable, because we should 
> focus on making tests that vendors can run on a day-to-day basis. Once 
> people are running all the tests multiple times per day as a matter of 
> course, doing one more run because the tests changed is not hard.

It is our goal to avoid manual tests where possible. That doesn't mean that manual tests won't exist.

> 
>>> There is a big difference if you think in terms of commits rather than in
>>> terms of files. If I merge a series of commits into another branch I can
>>> be sure that I got all the relevant changes and no more. Since, in my
>>> system, a single review request would be an atomic change (e.g., the series
>>> of commits needed to add one testsuite), taking all the commits for a
>>> specific review and merging them onto the approved branch would give you a
>>> good assurance that you got the bits that had been reviewed but no more or
>>> less.
>> 
>> The problem still lies in deciding which series of commits. There may still be 
>> other commits to test assets that aren't obviously related to other 
>> commits. We either have a single monolithic collection of tests and 
>> assets (which we already know doesn't work), or we need to manage the 
>> relationship between the components, or we have a system that sometimes 
>> breaks tests.
> 
> So, fundamentally all I am proposing is that we use the version control 
> system in a mode it is explicitly designed to support, with an unstable 
> branch and a stable branch. This is a rather common setup and in my 
> experience it doesn't cause huge problems of the type you describe.
> 
> In general I much prefer getting full leverage out of our existing tools 
> rather than spending time designing and implementing complex bespoke 
> solutions that may or may not work better. At the very least it seems 
> prudent to try the cheap approach first before abandoning it for the 
> expensive one.

A couple of points here. First, our test management system was designed around our existing workflow, and our workflow, to some extent, has been based on svn. Now that we're considering switching to hg, I'm already rethinking how aspects of the workflow might work differently in the hg world. So yes, from a philosophical point of view, I also prefer leveraging the available tools to the extent that it makes sense.

One of the issues that I'm dealing with here, however, is that many of our test contributors are simply not engineers: they don't use a vcs (let alone a dvcs) on a daily basis and don't fully grok how those systems work. Forcing them to learn these systems is an unacceptable barrier to entry. I'd rather deal with tool issues on my end than lose their oh-so-valuable contributions. I also don't know how many other potential contributors are out there, and I don't want a complicated set of tools to scare them away before they even try.

Second, we already need to track changes to dependent files for version detection, so this code is going to exist anyway.
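As a sketch of what I mean (a hypothetical helper, not the code we'll ship): a test's effective version is derived from the test file plus every file it depends on, so a change to a shared image or stylesheet bumps the version of every test that uses it.

    import hashlib

    def effective_version(test_path, dependency_paths):
        # Hash the test together with all of its dependent files; any
        # change to any of them yields a new effective version.
        h = hashlib.sha1()
        for path in [test_path] + sorted(dependency_paths):
            h.update(path.encode("utf-8"))
            with open(path, "rb") as f:
                h.update(f.read())
        return h.hexdigest()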

Received on Wednesday, 11 May 2011 16:02:59 UTC