Re: HTML5 Test Development Estimation

Cross-posting to test-infra as this should be of interest to subscribers there.

Thanks, Kevin, for posting this here. Please see my comments inline.


On Wednesday, July 31, 2013 at 12:35 AM, Kevin Kershaw wrote:

> My name is Kevin Kershaw and I work as an engineering manager at CableLabs in Denver, CO. If you’re not currently aware of us, CableLabs develops standards, software, and test tools as well as doing basic R&D and testing activities on behalf of Cable TV operators – primarily in North America. We have a small group of engineers here who are interested in providing additional test coverage for HTML5 features to the W3C; in particular, we want to develop tests for features around media presentation that are of critical concern for the operator community. Our engineers have spent some time looking into the set of existing HTML5 tests and the associated test harness and we’re at the point now where I’d like to get some advice from the broader HTML5 test community.
Other than the mailing lists you're already aware of, you might be interested in looking at public-test-infra@ and hopping on #testing on irc.w3.org.
>  
> First off, our team is specifically looking at building tests for designated subsections of HTML5 section 4.8. We originally identified the video, audio, track, and media elements in our scope but added the source element and Dimension Attributes because of the tight coupling we see between these. Also, we’ve excluded some Media Element subsections (e.g. MediaControl) for our initial work. We started a “bottom-up” analysis of the target sections, working to identify what looked to us to be “testable” requirements in the spec. The subsections of the spec itself divide up pretty nicely by individual paragraphs. That is, each paragraph usually lists one or more potential test conditions. We did some basic tabulation of requirements within each paragraph to come up with a count of potential tests. I’ve included in this email the spreadsheet we constructed to assist this process. That sheet is pretty self-explanatory but if you have questions, I’m more than happy to answer. Our analysis was done by several different engineers, each of whom had slightly different ideas about how to count “tests”, but the goal here was to produce an approximation, not a perfectly accurate list.

It's great that someone took the time for this bottom-up approach, which will help validate our initial assumptions.
>  
>  
> One interesting result of this work was that the number of tests we came up with differed substantially from the estimates prepared recently by the W3C team for their “Open Web Platform Test Suite - Coverage Analysis and Cost Estimate”. We identified about 517 tests for the subsections that were in scope for us. This compares to about 2611 tests expected by the W3C team for those same sections – a difference factor of about 5. In no way do I mean to suggest that our estimates are better than those of the W3C team. We just took a different approach and came up with a different number. But, from the perspective of applying resources to developing the needed tests, this represents a pretty significant cost difference, and since I’m looking at funding from within CableLabs to get this work done, my management would like to know the likely cost of the effort. They really want to know if we’re developing on the order of 500 tests or 2500 tests, and I’m looking to validate my assumptions and analysis results.

First of all, thanks for stepping up and offering significant help to this effort; this is much appreciated.

The process you describe above seems sound, and I was at first quite surprised by the large difference between the outputs of the two methodologies. That is, until I looked at the time you estimate an engineer will take to write a test: 8 hours. We've accounted for 1 hour to write a test and 15 minutes to review it.

Back of the envelope, that's roughly 517 tests at 8 hours each, or about 4,100 hours on your side, versus 2,611 tests at 1.25 hours each, or about 3,300 hours on ours. This of course completely overturns the ratio and makes your approach more pessimistic than ours, probably landing closer to our worst-case estimates than to our best-case ones. Which makes sense, as this is a rather complicated part of the spec.

That said, this raises the question (which you hint at above): what exactly is a test? Looking at your document, it seems that what you're counting is more of a "test page" than a test (I don't want to get too hung up on the naming part, just trying to agree on a common vocabulary here so we can move forward).
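
For concreteness, here's a purely illustrative sketch of what I mean (the element, media URL, and assertions are made up for the example, and the script paths assume the usual testharness.js resource layout), showing how a single test page typically bundles several tests:

<!DOCTYPE html>
<title>video element: illustrative test page</title>
<script src="/resources/testharness.js"></script>
<script src="/resources/testharnessreport.js"></script>
<video id="v" src="/media/movie_5.ogv"></video>
<script>
// First test: a boolean attribute's default value.
test(function() {
  var v = document.getElementById("v");
  assert_false(v.loop, "loop should default to false when the attribute is absent");
}, "loop IDL attribute defaults to false");

// Second test, in the same file: an event-based check.
async_test(function(t) {
  var v = document.getElementById("v");
  v.addEventListener("loadedmetadata", t.step_func(function() {
    assert_true(v.duration > 0, "duration should be known once metadata has loaded");
    t.done();
  }), false);
}, "loadedmetadata fires and duration becomes available");
</script>

In that vocabulary, the file above would count as one test page but two tests.
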
> I’ve attached a workbook that contains the “bottoms-up” spreadsheet we used to count up the anticipated number of tests and another small worksheet where I compared our test counts with those generated earlier by the W3C team. Here’s a short list of assumptions we made as we worked through the problem:
>  
>  
> · We excluded tests of the IDL from both the W3C and CableLabs estimates under the assumption that the IDLHarness will generate IDL tests automatically.
> · We accounted for some tests around algorithms but believe that many algorithm steps, especially intermediate steps, do not require separate tests.
> · We subtracted out the number of existing, approved tests in the GitHub repository that were associated with our target sections in order to come up with a count of “remaining” tests to be developed.
> · We assumed that a suitable test harness and driver will be available to run the set of developed tests. I understand there’s significant work to be done on that infrastructure but that’s not part of this little exercise.
>  
> My request to the community at this point is to ask for your thoughts on the validity of our estimation approach and whether we’ve missed some substantial aspect of the problem in our analysis.
My overall feeling is that you're in the right ballpark estimation-wise. You might want to factor in review costs. Note that reviews would need to be done in the open, on GitHub, if they are going to be carried out by another CableLabs engineer.
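
On the IDL bullet above, here's a rough illustration of why those tests come nearly for free once idlharness.js is in place. The IDL fragment is deliberately abridged and the paths assume the usual resource layout, so treat this as a sketch rather than a real test file:

<!DOCTYPE html>
<title>HTMLMediaElement IDL tests (abridged sketch)</title>
<script src="/resources/testharness.js"></script>
<script src="/resources/testharnessreport.js"></script>
<script src="/resources/idlharness.js"></script>
<script>
var idl_array = new IdlArray();

// In a real test the full IDL blocks would be copied from the spec;
// only a couple of members are shown here.
idl_array.add_idls(
  "interface HTMLMediaElement {\n" +
  "  readonly attribute unsigned short readyState;\n" +
  "  attribute boolean loop;\n" +
  "};");

// Point each interface at a live object so its members are tested against it.
idl_array.add_objects({
  HTMLMediaElement: ["document.createElement('video')"]
});

// Generates one testharness.js test per IDL member.
idl_array.test();
</script>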

I think it would be useful to look at authoring tests for a single testable requirement, measuring the time taken to do so, and sharing the test and the data here.

LMK if you have more questions.

Best,

--tobie

Received on Thursday, 1 August 2013 02:08:41 UTC