- From: Robin Berjon <robin@w3.org>
- Date: Mon, 11 Feb 2013 16:47:39 +0100
- To: "'public-html-testsuite@w3.org'" <public-html-testsuite@w3.org>
- CC: public-test-infra <public-test-infra@w3.org>
Hi all,
a couple of weeks ago we had a meeting about testing. One of the things
that came out of it was that it would be helpful to get a feel for
the coverage level that we have for specs and, for larger specs, to
have that coverage broken down per section, along with other measures
against which to contrast the number of tests.
I've now done this analysis for the HTML and Canvas specs (I would have
done Microdata too, but it doesn't seem to have approved tests yet).
You can see it here, but be warned that you might not understand it
without reading the notes below:
http://w3c-test.org/html-testsuite/master/tools/coverage/
I'm copying public-test-infra in case anyone wants to do the same for
other specs; I'd be happy to collaborate. If people think it would be
useful to provide such data on a regular basis, we can certainly
automate it. Note that for this purpose having the data in one big repo
would help.
Some notes:
• I used the master specs, which means that this data is actually for
5.1 rather than 5.0. I can of course run the same analysis targeting
the 5.0 CR (and will). It makes no difference to the script.
• I'm not claiming that all the metrics shown are useful. I'm including
them because they were reasonably easy to extract (the hard part here is
actually figuring out what's a section in the spec's body). Mike
suggested that "number of examples" could also be used, which I think is
an idea worth exploring.
• The metrics work this way (roughly sketched in code after the notes):
- number of words: I'm basically splitting on a simplistic idea of
word boundary. I don't think it matters because we're not doing NLP.
- RFC2119: I'm looking for both must and should, and giving them
equal weight. It could be argued that one could disregard should, but it
could equally be argued that any manner of optionality actually requires
more testing.
- algorithm steps: I'm counting "ol li". I think this is actually one
of the most useful metrics.
- IDL item: I remove empty lines, comments, lines that just close a
structure (e.g. };) and then just count the lines. I could do something
more complex based on a parser, but I don't think it would give
different results.
• Some parts are weird: I essentially remove every section that is
marked as "non-normative". In some cases (e.g. the introduction) all
subsections of a section are non-normative, but the section itself isn't
marked that way. I'll fix my algorithm to further remove sections that
are left with just a title. I'll also special-case things like
references and acknowledgements that aren't marked as NN but should be
removed.
• The non-normative removal is rather simple too. Anything flagged as
non-normative, along with examples, IDL fragments (restated from the
complete thing) and "DOMintro" material, gets removed (roughly as
sketched after the notes).
• I index specifications at a maximum section depth of 3 (this matches
the directory depth used in the test suite). The first form on the page
allows you to get a higher-level view (the roll-up is illustrated
after the notes).
• I picked *completely* arbitrary thresholds for deciding whether the
various metrics are flagged good or bad. You can change them in the
form (the flagging is sketched after the notes).
• Canvas is looking good even with relatively stringent settings. HTML
less so :)
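To make the metrics concrete, the counting boils down to roughly this
(simplified Python, not the exact code; the use of BeautifulSoup and
the names are just for illustration):

    import re
    from bs4 import BeautifulSoup

    def section_metrics(section_html, idl_text=""):
        soup = BeautifulSoup(section_html, "html.parser")
        text = soup.get_text()

        # number of words: a simplistic notion of word boundary, no NLP
        words = len(re.findall(r"\b\w+\b", text))

        # RFC2119: "must" and "should", counted with equal weight
        rfc2119 = len(re.findall(r"\b(?:must|should)\b", text, re.I))

        # algorithm steps: count "ol li"
        steps = len(soup.select("ol li"))

        # IDL items: drop empty lines, comments and lines that just
        # close a structure (e.g. "};"), then count what remains
        idl_items = [
            line for line in (l.strip() for l in idl_text.splitlines())
            if line and not line.startswith("//")
            and not re.fullmatch(r"[]});]+", line)
        ]

        return {"words": words, "rfc2119": rfc2119,
                "steps": steps, "idl": len(idl_items)}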
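The filtering pass has roughly this shape (again simplified; the class
names are stand-ins for whatever the markup actually uses):

    from bs4 import BeautifulSoup

    # stand-in class names for the material to drop
    DROP_CLASSES = ["non-normative", "example", "idl", "domintro"]

    def strip_non_normative(section_html):
        soup = BeautifulSoup(section_html, "html.parser")
        for el in soup.select(", ".join("." + c for c in DROP_CLASSES)):
            el.extract()  # detach; re-extracting nested matches is harmless
        # a second pass would drop sections left with only a title,
        # as mentioned above
        return str(soup)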
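The depth-3 roll-up is just truncation of the section number, e.g.:

    def collapse(section_number, max_depth=3):
        # "4.10.5.1.2" and "4.10.5" both land in the "4.10.5" bucket
        return ".".join(section_number.split(".")[:max_depth])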
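And the good/bad flag is nothing clever; in spirit it's a per-metric
comparison of the number of tests against the threshold from the form
(the ratio below is a placeholder, not the page's default):

    def flag(test_count, metric_value, tests_per_unit=0.5):
        # "good" when the section has at least tests_per_unit tests
        # per unit of the metric (word, must/should, step, IDL item)
        enough = test_count >= metric_value * tests_per_unit
        return "good" if enough else "bad"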
--
Robin Berjon - http://berjon.com/ - @robinberjon