
Re: Vendor harnesses

From: Linss, Peter <peter.linss@hp.com>
Date: Wed, 11 May 2011 08:42:29 -0700
Cc: "public-test-infra@w3.org" <public-test-infra@w3.org>
Message-Id: <22D85FCD-59CE-4F0D-8DA5-0D0D87C69BB4@hp.com>
To: James Graham <jgraham@opera.com>

On May 10, 2011, at 11:59 PM, James Graham wrote:

> On Tue, 10 May 2011, Linss, Peter wrote:
>>> Why? I am generally not interested in the W3C collecting results and
>>> haven't really understood why other people are. Test results are only
>>> really useful for QA; you can see where you have bugs that you need to fix
>>> and ensure that you don't regress things that used to work. But that's a
>>> very vendor specific thing; it's not something that W3C has to do. When
>>> people try to use tests for non-QA purposes like stating that one browser
>>> is more awesome than another it leads to bad incentives for people
>>> submitting tests.
>> It's called transitioning from CR to PR. The working groups need test 
>> result data in order to advance specs. That's the only reason the CSS wg 
>> spent years building a test suite in the first place, without the result 
>> data the tests are useless to the wg. Frankly the only reason I spent so 
>> much of my own time building the harness was to track testing coverage 
>> and to generate the implementation report for CSS 2.1.
> The failure of the CSS2.1 testsuite wasn't that it was so hard to get 
> people to create implementation reports; that was merely a symptom. The 
> failure was that the tests weren't being run on a day-to-day basis by 
> browser vendors long before attempting to transition to PR. That's the 
> problem that we need to solve. Once you have people using the tests for 
> real work, W3C Process stuff falls out as a happy side effect.

Sorry, try taking a look at the CSS 2.1 implementation report. We have result data from 8 different vendors, and the general public, covering 24 different user agents. That's what it took to exit CR in the real world.

Collecting that data, collating that data, being able to tell when you've met exit criteria, and being able to determine what you need to exit CR does not simply "fall out" as a side effect of other work. It's work in and of itself. We already have a tool that does all that quite nicely.

The other aspect you have to consider is that we have many thousands of existing tests that require a human to evaluate them. Yes, we'd like to replace those with automatable tests, but they're not going away any time soon. As you get more experience developing test suites, you're going to find that while having every single test be automatable is a laudable goal, it's not entirely practical in the real world. Writing tests is not a trivial task and you take what you can get.

The bottom line here is that the CSS wg has years of experience in this area. We needed to collect result data, doing so was incredibly useful, and you're not going to convince me that we don't need it or that all that work will magically happen by itself at some indeterminate point in the future. Please try to understand that other groups may have needs that you don't see or agree with; that doesn't make them any less real.

If other groups don't have a need to collect result data, they don't have to. We do, and we already have the tool that does it. I don't see any point in debating this further.
Received on Wednesday, 11 May 2011 15:42:52 UTC
