- From: Robin Berjon <robin@w3.org>
- Date: Mon, 11 Feb 2013 16:47:39 +0100
- To: "'public-html-testsuite@w3.org'" <public-html-testsuite@w3.org>
- CC: public-test-infra <public-test-infra@w3.org>
Hi all,

A couple of weeks ago we had a meeting about testing. One of the things that came out of it was that it would be helpful to get a feel for the coverage level that we have for specs, and for larger specs to have that coverage per section, along with other measures to contrast the number of tests with.

I've now done this analysis for the HTML and Canvas specs (I would have done Microdata too, but it doesn't seem to have approved tests yet). You can see it here, but be warned that you might not understand it without reading the notes below:

http://w3c-test.org/html-testsuite/master/tools/coverage/

I'm copying public-test-infra in case anyone wants to do the same for other specs; I'd be happy to collaborate. If people think it would be useful to provide such data on a regular basis, we can certainly automate it. Note that for this purpose having the data in one big repo would help.

Some notes:

• I used the master specs, which means that this data is actually for 5.1 rather than 5.0. I can of course run the same to target the 5.0 CR (and will); it makes no difference to the script.

• I'm not claiming that all the metrics shown are useful. I'm including them because they were reasonably easy to extract (the hard part here is actually figuring out what's a section in the spec's body). Mike suggested that "number of examples" could also be used, which I think is an idea worth exploring.

• The metrics work this way (a rough sketch of how they could be computed is appended at the end of this message):
  - Number of words: I'm basically splitting on a simplistic idea of a word boundary. I don't think it matters because we're not doing NLP.
  - RFC 2119: I'm looking for both "must" and "should", and giving them equal weight. It could be argued that one could disregard "should", but it could equally be argued that any manner of optionality actually requires more testing.
  - Algorithm steps: I'm counting "ol li". I think this is actually one of the most useful metrics.
  - IDL items: I remove empty lines, comments, and lines that just close a structure (e.g. "};"), and then just count the remaining lines. I could do something more complex based on a parser, but I don't think it would give different results.

• Some parts are weird: I essentially remove every section that is marked as "non-normative". In some cases (e.g. the introduction) all subsections of a section are non-normative, but the section itself isn't marked that way. I'll fix my algorithm to further remove sections that are left with just a title. I'll also special-case things like references and acknowledgements that aren't marked as non-normative but should be removed.

• The non-normative removal is rather simple too. Any section flagged as non-normative, along with examples, IDL fragments (restated from the complete IDL), and "DOMintro" material, gets removed.

• I index specifications at a maximum section depth of 3 (this matches the directory depth used in the test suite). The first form on the page allows you to get a higher-level view.

• I picked *completely* arbitrary thresholds for deciding whether the various metrics are flagged as good or bad. You can change them in the form.

• Canvas is looking good even with relatively stringent settings. HTML less so :)

--
Robin Berjon - http://berjon.com/ - @robinberjon
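
P.S. In case it helps anyone wanting to replicate this for another spec, here is a rough, illustrative sketch of how the per-section metrics above could be extracted. It is not the actual coverage script: it assumes each section has already been pulled out as an HTML fragment, that BeautifulSoup (bs4) is available, and that non-normative material and IDL blocks are marked with the class names and selectors used below, which are assumptions rather than facts about the spec's markup.

    # Illustrative sketch only -- not the real coverage tool.
    import re
    from bs4 import BeautifulSoup

    # Assumed class names for material to strip before counting.
    NON_NORMATIVE_CLASSES = ("non-normative", "example", "domintro")

    def section_metrics(section_html):
        soup = BeautifulSoup(section_html, "html.parser")

        # Drop non-normative material before counting anything.
        for cls in NON_NORMATIVE_CLASSES:
            for el in soup.find_all(class_=cls):
                el.decompose()

        text = soup.get_text(" ")

        # Words: a simplistic word-boundary split is enough; no NLP needed.
        words = len(re.findall(r"\w+", text))

        # RFC 2119: count "must" and "should" with equal weight.
        rfc2119 = len(re.findall(r"\b(must|should)\b", text, re.IGNORECASE))

        # Algorithm steps: list items inside ordered lists ("ol li").
        steps = len(soup.select("ol li"))

        # IDL items: non-empty lines in IDL blocks, skipping comments and
        # lines that merely close a structure (e.g. "};").
        idl_items = 0
        for pre in soup.select("pre.idl"):  # "pre.idl" is an assumed selector
            for line in pre.get_text().splitlines():
                line = line.strip()
                if not line or line.startswith("//") or re.fullmatch(r"[})\];]+", line):
                    continue
                idl_items += 1

        return {"words": words, "rfc2119": rfc2119, "steps": steps, "idl": idl_items}

Running something like section_metrics() over each depth-3 section and contrasting the result with the number of approved tests for that section would give the kind of ratios the report flags against the (arbitrary) thresholds.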