- From: James Graham <james@hoppipolla.co.uk>
- Date: Thu, 01 Aug 2013 12:15:18 +0100
- To: Tobie Langel <tobie@w3.org>
- Cc: Kevin Kershaw <K.Kershaw@cablelabs.com>, public-test-infra <public-test-infra@w3.org>, <public-html-media@w3.org>, "'public-html-testsuite@w3.org'" <public-html-testsuite@w3.org>, Takashi Hayakawa <T.Hayakawa@cablelabs.com>, Brian Otte <B.Otte@cablelabs.com>, Nishant Shah <N.Shah@cablelabs.com>
(apologies to people on public-html-testsuite who now have this reply
many times.)

On 2013-08-01 03:08, Tobie Langel wrote:
> On Wednesday, July 31, 2013 at 12:35 AM, Kevin Kershaw wrote:
>> First off, our team is specifically looking at building tests for
>> designated subsections of HTML5 section 4.8. We originally identified
>> video, audio, track, and media elements in our scope but added the
>> source element and Dimension Attributes because of the tight coupling
>> we see between these. Also, we’ve excluded some Media Element
>> subsections (e.g. MediaControl) for our initial work. We started a
>> “bottom-up” analysis of the target sections, working to identify what
>> looked to us to be “testable” requirements in the spec. The
>> subsections of the spec itself divide up pretty nicely by individual
>> paragraphs. That is, each paragraph usually lists one or more
>> potential test conditions. We did some basic tabulation of
>> requirements within each paragraph to come up with a count of
>> potential tests. I’ve included the spreadsheet we constructed to
>> assist this process in this email. That sheet is pretty
>> self-explanatory but if you have questions, I’m more than happy to
>> answer. Our analysis was done by several different engineers, each of
>> whom had slightly different ideas about how to count “tests” but the
>> goal here was to produce an approximation, not a perfectly accurate
>> list.
>
> It's great someone took the time for this bottom-up approach which
> will help validate our initial assumptions.

I don't know if it's intentional, but this spreadsheet hasn't been
forwarded to public-test-infra.

For the purposes of comparison, there is already a submission of a
large number of media tests written by Simon Pieters at Opera [1]. I'm
not sure if it covers exactly the same section of the spec that you are
interested in, but it contains 672 files, which is > 672 tests (some
files contain more than one test, and there seem to be relatively few
support files). It does contain some testing of the IDL sections, but
this makes sense as a) it likely predates idlharness.js and b)
idlharness.js can't test everything.

One useful thing that you could do would be to review these tests as a
first step. This would have several positive effects: it would allow
you to compare your estimation methodology to an existing submission,
it would reduce the number of tests that you have to write, and it
would provide a good guide to the style of tests written by someone
with a great deal of experience testing browser products and using the
W3C infrastructure.

If you do decide to go ahead with this review, I would strongly suggest
that you consider using the critic tool [2], since the submission is
rather large, and critic has a number of features that will make this
easier, notably the ability to mark which files have been reviewed and
which issues have been addressed. On the other hand, if you prefer to
use the GitHub UI, that is OK as well. To review media tests using
critic you will need to set up a filter marking yourself as reviewer
for "html/semantics/embedded-content-0/media-elements/". If you need
help with critic (or indeed anything else), please ask me on #testing.

> The process you describe above seems sound, and I was at first quite
> surprised by the important difference between the output of the two
> methodologies. That is, until I looked at the estimated time you
> consider an engineer is going to take to write a test: 8 hours. We've
> accounted for 1h to write a test and 15 minutes to review it.

I would be interested to know what kind of thing you are thinking of
when you talk about a "test". Is it a normal javascript/reftest of the
kind that we are used to running on desktop browsers? I know that
sometimes tests for consumer devices are more complex to write because
there are extra requirements about running on production hardware, etc.
Perhaps this kind of difference could account for the very different
estimates of time per test?
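For concreteness, this is roughly the kind of thing I mean by a "normal
javascript test" (an untested sketch using testharness.js; the media
file name and the exact assertions are placeholders, not a proposed
test):

  <!DOCTYPE html>
  <meta charset="utf-8">
  <title>video fires loadedmetadata and exposes duration</title>
  <script src="/resources/testharness.js"></script>
  <script src="/resources/testharnessreport.js"></script>
  <div id="log"></div>
  <script>
  var t = async_test("loadedmetadata fires and duration becomes available");
  t.step(function() {
    var v = document.createElement("video");
    v.src = "test.webm";  // placeholder; a real test would use a shared support file
    v.addEventListener("loadedmetadata", t.step_func(function() {
      assert_greater_than(v.duration, 0, "duration after loadedmetadata");
      t.done();
    }), false);
    v.addEventListener("error", t.step_func(function() {
      assert_unreached("media resource failed to load");
    }), false);
  });
  </script>

If your tests need to run on production hardware the shape may well be
quite different, which is exactly the sort of thing I'd like to
understand.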
>> · We excluded tests of the IDL from both the W3C and CableLabs
>> estimates under the assumption that the IDLHarness will generate IDL
>> tests automatically.

idlharness.js can't autogenerate all interesting things. But it might
be a reasonable first approximation.

>> · We accounted for some tests around algorithms but believe that many
>> algorithm steps, especially intermediate steps, do not require
>> separate tests.

Algorithms are black boxes; the requirement is that the UA behaviour is
black-box indistinguishable from the algorithm. But this makes it very
difficult to tell how many tests are needed; for example the HTML
parser section of the spec is basically "parsers must act as if they
follow the following algorithm: [huge multi-step state machine]". This
requires thousands of tests.

There are also tests that one can write that correspond to the ordering
of steps in algorithms rather than the explicit steps themselves. For
example, if one has an algorithm that first measures the length of some
input list and then does something for each index up to the initial
measurement, one can write tests to ensure that the measurement happens
before the list is accessed, that the expected thing happens if the
list is mutated to be longer, or shorter, than the initial measurement
during iteration, and so on.
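To make that concrete, a test for that kind of ordering requirement
might look something like the following sketch. processList() here is a
made-up stand-in for whatever algorithm is under test; it isn't any
specific part of section 4.8:

  <!DOCTYPE html>
  <meta charset="utf-8">
  <title>hypothetical processList() measures the list length up front</title>
  <script src="/resources/testharness.js"></script>
  <script src="/resources/testharnessreport.js"></script>
  <div id="log"></div>
  <script>
  // Toy implementation of such an algorithm, just so the sketch runs;
  // in a real test the browser supplies the behaviour being tested.
  function processList(list, callback) {
    var n = list.length;          // length is measured once, up front
    for (var i = 0; i < n; i++) {
      callback(list[i]);
    }
  }

  test(function() {
    var list = ["a", "b", "c"];
    var visited = [];
    processList(list, function(item) {
      visited.push(item);
      // Grow the list while the algorithm is iterating; anything appended
      // after the initial length measurement must not be visited.
      list.push("extra");
    });
    assert_array_equals(visited, ["a", "b", "c"],
                        "items appended during iteration are ignored");
  }, "length is measured before iteration starts");
  </script>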
>> · We subtracted out the number of existing, approved tests in the
>> GitHub repository that were associated with our target sections in
>> order to come up with a count of “remaining” tests to be developed.

In this case there are a large number of unapproved tests, as discussed
above.

>> · We assumed that a suitable test harness and driver will be
>> available to run the set of developed tests. I understand there’s
>> significant work to be done on that infrastructure but that’s not part
>> of this little exercise.

Yeah. In particular browser vendors typically have their own automation
harnesses.

[1] https://github.com/w3c/web-platform-tests/pull/93
[2] https://critic.hoppipolla.co.uk/r/74

Received on Thursday, 1 August 2013 11:15:47 UTC