
Re: Coverage analysis

From: Rebecca Hauck <rhauck@adobe.com>
Date: Tue, 26 Feb 2013 11:31:22 -0800
To: "Linss, Peter" <peter.linss@hp.com>, Robin Berjon <robin@w3.org>
CC: "public-test-infra@w3.org" <public-test-infra@w3.org>, "public-html-testsuite@w3.org" <public-html-testsuite@w3.org>, "public-css-testsuite@w3.org" <public-css-testsuite@w3.org>
Message-ID: <CD524585.2C341%rhauck@adobe.com>

On 2/11/13 2:26 PM, "Linss, Peter" <peter.linss@hp.com> wrote:

>On Feb 11, 2013, at 12:04 PM, Robin Berjon wrote:
>> {snip}
>> Here, we couldn't use the same script for all specs because it has to
>>understand specific conventions for things that are examples,
>>non-normative, etc. In many cases, we could probably handle this without
>>PhantomJS. For the HTML spec (and all specs derived directly from that
>>source) we're looking at such a markup nightmare (the sections aren't
>>marked up as such, you essentially have to resort to DOM Ranges to
>>extract content usefully) that PhantomJS really is the only option.
>> I think there's no reason to despair though. If the HTML spec could be
>>analysed, then others will be easier. For instance, all the required
>>information is clearly marked up. We should be able to have a small
>>number of spec "styles" that we can set up and use.
>> The output from that is spec-data-*.json. Assuming we can solve the
>>above issue (which we can, it's just a bit of work), this too can be
>>automated and dumped to a DB.
>Take a look at the spec parser I wrote for Shepherd[1]. It was designed
>to find all the anchors (to match test rel=help links against) but it
>also finds all the sections, identifies non-normative sections, and
>classifies anchor types ('section', 'dfn', 'abbr', 'other'). It finds all
>the sections in the HTML5 spec just fine (AFAICT), along with SVG and all
>the CSS specs I've thrown at it so far. It stores all the data in a DB
>(independent from Shepherd) and Shepherd has a JSON api for getting the
>spec data:
>Sections only:
>All Anchors:
>It should be fairly straightforward to extend this to gather the
>additional data you're scraping from the specs. My thinking was that we
>should host a common DB for all the spec data for the other tools to use.
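The anchor classification Peter describes could be sketched roughly like this (the node shape and the heading heuristic are illustrative assumptions on my part, not Shepherd's actual rules):

```javascript
// Rough sketch: map an anchor-bearing element to one of the types
// Peter mentions ('section', 'dfn', 'abbr', 'other'). The {tag: ...}
// node shape is a simplified stand-in for a real DOM node; the real
// Shepherd parser's rules are surely more involved.
function classifyAnchor(node) {
  if (node.tag === 'section' || /^h[1-6]$/.test(node.tag)) return 'section';
  if (node.tag === 'dfn') return 'dfn';
  if (node.tag === 'abbr') return 'abbr';
  return 'other';
}
```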


I generated some reports for a couple of CSS specs (Transforms and
Backgrounds & Borders) and used some of the CSS infrastructure.  I still
used Robin's spec parser, but pulled the number of tests from our existing
DB (from the annotations on the specs).

A few changes to Robin's scripts were needed to support this:

- (minor) Parse the ToC whether it's an ol or a ul; CSS specs use the
latter, the HTML spec the former
- Bypass the test-per-section.json and pull the test counts directly from
the spec when it has these annotations.
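The ol-vs-ul fix amounts to walking the ToC list regardless of which list element the spec uses. A minimal sketch (the {tag, href, children} node shape is a simplified stand-in for DOM nodes, not the actual structure Robin's scripts use):

```javascript
// Sketch: collect section links from a ToC tree whether the list is
// an 'ol' (HTML spec convention) or a 'ul' (CSS spec convention).
// Node shape {tag, href, children} is illustrative only.
function collectSections(node, out) {
  out = out || [];
  if (node.tag === 'a' && node.href) out.push(node.href);
  if (node.tag === 'ol' || node.tag === 'ul' || node.tag === 'li') {
    (node.children || []).forEach(function (child) {
      collectSections(child, out);
    });
  }
  return out;
}
```

Since nested lists recurse through their parent li, sub-sections come out in document order either way.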

Both methods can peacefully coexist now, but we may want to discuss
converging them (e.g., if the other WGs wanted to adopt annotate.js).

I sent a pull request with these changes:

In this pull request, I've added a separate css-index.html for the CSS
reports with the accompanying json files. I realize the html-testsuite
repo is probably not the logical home for these, but perhaps we can
discuss a more centralized place for all things coverage in the Testing
Task Force.

I had trouble building gh-pages on my fork of the entire html-testsuite,
so I temporarily have the CSS reports here until the pull request is
merged or they're moved elsewhere:


There are a few flaws in these reports (for example, the Backgrounds &
Borders report probably doesn't need to include changes, acknowledgements,
and indices).  I'll write up a list of ideas and things I noticed as I dug
deeper into this. Again, perhaps a good thing to address in the Task Force.

Robin, thanks again for getting this going! It was well documented and
straightforward to adapt, which is always appreciated. Now that we have
something functioning, it should be easy to tweak and refine it to make
this more useful over time. I'm happy to help move it forward!

Received on Tuesday, 26 February 2013 19:32:51 UTC
