Notes on test suite reorganisation from Robin Berjon on 2012-12-11 (public-html-testsuite@w3.org from December 2012)

From: Robin Berjon <robin@w3.org>
Date: Tue, 11 Dec 2012 13:03:41 +0100
To: public-html-testsuite@w3.org
Message-ID: <50C7211D.9080909@w3.org>

Hi all!

We've been talking about test suite reorganisation for a while and I 
thought it might be helpful if I put a concrete proposal out there so 
that we can all see what sticks and what doesn't, and hopefully come to 
some conclusion shortly.

Unfortunately I won't be able to make it to the meeting today due to 
speaking at a meetup at the same time (notably about getting people 
involved in testing — cue irony music); so I'll try to make these notes 
self-sufficient.

Before I dig into it, a few things to know:

• I made all the changes in a dedicated branch. Nothing is broken, 
nothing has been destroyed.

• This is just a proposal and of course everything is open to changes. I 
haven't done any of the complicated parts (such as actually moving the 
tests around) so I really am not committed to the layout I built.

You can see the results in the temp/robin branch on GitHub:

     https://github.com/w3c/html-testsuite/tree/temp/robin

I called this branch "temp" to make it clear that I reserve the right to 
delete it. So don't build anything atop it please, or you might end up 
with a broken repo. If you wish to make similar proposals, I encourage 
you to use the same scheme.

What I have done includes:

• I've moved the tests/harness and tests/reporting directories to the 
root to get them out of the way. I'm unsure what to do with those 
eventually; we need to figure out where best to place them.

• Similar thinking needs to be applied to the common, images, fonts, 
etc. directories. If they are shared across all sub-suites, they'll 
need to be at the root (or straight under tests); if they are specific 
to sub-suites, then they probably should go deeper inside the tree.

• Inside tests, I made five directories: html5, html51, canvas2d, 
canvas2d2, microdata to reflect the various specs. Technically there 
might be a microdata2 as well, but there doesn't seem to be much motion 
there for now so that can wait.

• For each of those sub-suites, I used the relevant specification to 
generate a directory tree. The rules I used for that are simple: the 
name of each subdirectory comes from the ID of the relevant section in 
the spec. I know that James was worried that those would not be very 
readable, but looking at the result I find it rather easy to 
understand (YMMV, feedback welcome). The only sanitisation the IDs 
seem to have required is replacing / with _. I'm interested in knowing 
whether the result works on all file systems (I get no errors on OS X; 
I'm thinking of Windows in particular as a likely source of divergence 
here). Overall though, the IDs are quite regular.
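
To make the naming rule and the Windows question concrete, here is a 
minimal Python sketch. It is not the script I used; the function names 
are made up for illustration, and the list of Windows restrictions is 
my understanding of the usual ones rather than an exhaustive check.

```python
# Characters that Windows does not allow in a file or directory name.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')
# Device names that Windows reserves, with or without an extension.
WINDOWS_RESERVED = {"CON", "PRN", "AUX", "NUL"} | {
    f"{dev}{n}" for dev in ("COM", "LPT") for n in range(1, 10)
}


def section_id_to_dirname(section_id: str) -> str:
    """Apply the single sanitisation described above: replace / with _."""
    return section_id.replace("/", "_")


def windows_problems(name: str) -> list[str]:
    """Return the reasons a name could fail on Windows (empty if none)."""
    problems = []
    bad = sorted(set(name) & WINDOWS_FORBIDDEN)
    if bad:
        problems.append("forbidden characters: " + "".join(bad))
    if name.split(".")[0].upper() in WINDOWS_RESERVED:
        problems.append("reserved device name")
    if name.endswith((" ", ".")):
        problems.append("trailing space or dot")
    return problems
```

Running windows_problems over every generated directory name would 
flag candidates for divergence without needing a Windows machine.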

Producing directories to the full depth of the HTML5 spec would in some 
cases lead to a rather deep hierarchy, so after a quick chat on IRC I 
stopped at three levels. When there were subsections and I stopped, I 
generated a small contains.json file there that captures the subtree. 
I'm unsure whether it will be useful (I guess it could be used for a 
simpler mapping to the ToC, perhaps in tools like PLH's), but we're 
getting it for free anyway. You'll note that there are .gitkeep files in every 
directory. You can ignore them: they're there because git does not take 
empty directories into account, and that's the conventional file to 
include to make sure the tree is there (they can be nuked as content is 
added).
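
As a rough illustration of the depth rule, the sketch below builds 
such a tree in Python. The input shape (a list of dicts with "id" and 
"children" keys) and the content written to contains.json are 
assumptions made for the example, not the format of the real files in 
the branch.

```python
import json
from pathlib import Path

MAX_DEPTH = 3  # the proposal stops at three levels of directories


def build_tree(root: Path, sections: list[dict], depth: int = 1) -> None:
    """Create one directory per section, down to MAX_DEPTH levels.

    Each section is {"id": str, "children": [section, ...]}; this shape
    is hypothetical and only serves the example.
    """
    for section in sections:
        directory = root / section["id"].replace("/", "_")
        directory.mkdir(parents=True, exist_ok=True)
        # git does not track empty directories, so add a placeholder.
        (directory / ".gitkeep").touch()
        children = section.get("children", [])
        if not children:
            continue
        if depth < MAX_DEPTH:
            build_tree(directory, children, depth + 1)
        else:
            # Too deep to mirror as directories: record the subtree instead.
            with open(directory / "contains.json", "w", encoding="utf-8") as fh:
                json.dump(children, fh, indent=2)
```

Because existing directories are left alone, a function like this can 
be run again after the spec gains sections without disturbing tests 
that are already in place.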

Note that even after content is added, we can still use an automated 
process to add sections to the tree if and when needed.

I think that covers everything about the directory structure; comments 
are very welcome, as always.

Another big topic is how to handle submissions and approved tests. There 
are several options:

     A) Use approved and submissions/Foo subdirectories
     B) Use pull requests
     C) Use a file that lists what's approved and what isn't

I think we should rule option (A) out outright. Tests should be moved 
around as little as possible, ideally never.

Option (B) is interesting, but my concern is getting a view of the 
entire set of submissions + approved tests. We *could* use the GH API to 
obtain the full list of pending PRs and extract the content accordingly, 
but that introduces reliance on the GH API beyond just git (which may 
not be a problem given http://gitlabhq.com/).
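
For reference, listing pending PRs through the GitHub API could look 
roughly like the Python sketch below. It assumes the public REST 
endpoint for listing pull requests and the repository from the branch 
URL above; the function names are made up, and paging and 
authentication are left out.

```python
import json
import urllib.request

# Assumption: the repository from the branch URL given earlier.
PULLS_URL = "https://api.github.com/repos/w3c/html-testsuite/pulls?state=open"


def summarise_pulls(payload: list[dict]) -> list[tuple[int, str, str]]:
    """Reduce pull request objects to (number, head branch, head commit)."""
    return [(pr["number"], pr["head"]["ref"], pr["head"]["sha"]) for pr in payload]


def fetch_pending(url: str = PULLS_URL) -> list[tuple[int, str, str]]:
    """Fetch open pull requests; needs network access, first page only."""
    with urllib.request.urlopen(url) as response:
        return summarise_pulls(json.load(response))
```

Each head commit could then be fetched with plain git to extract the 
submitted tests, which is where the reliance on more than git comes in.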

Option (C) is the simplest, though it runs the risk of someone 
forgetting to update the file listing the approved tests (this could 
however be made more obvious by listing content on both sides clearly, 
and possibly spamming this group with "pending submissions" every week).

Overall I have a slight preference for (C), but I could be convinced to 
go with (B), especially if integration with epic (or whatever) is 
particularly good there.

If we do go with (C), I would however suggest that we use JSON as the 
format rather than text, even if it's just for a dumb array of strings. 
The reason here is that I've seen how our text-based manifests get used, 
and in those and their processors I've seen:

• Unicode errors;
• BOM problems;
• EOL Win vs Unix problems;
• Lack of EOL on the last record which caused that record to be ignored; 
or conversely EOL on the last record that caused an empty record after 
it to be read.

In other words, pretty much every single classic error in handling text 
that can be made, will be made. You can screw up JSON too, but with 
libraries in any language doing it right for you, if that happens you 
probably should get shot.
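
As a small illustration in Python, assuming the manifest is nothing 
more than a JSON array of test paths (the function names and the paths 
in the usage note are hypothetical):

```python
import json


def load_approved(text: str) -> set[str]:
    """Parse a manifest that is a JSON array of test paths."""
    entries = json.loads(text)
    if not isinstance(entries, list) or not all(isinstance(e, str) for e in entries):
        raise ValueError("manifest must be a JSON array of strings")
    return set(entries)


def split_tests(all_tests: list[str], approved: set[str]) -> tuple[list[str], list[str]]:
    """Return (approved, pending) so that both sides can be listed clearly."""
    ok = sorted(t for t in all_tests if t in approved)
    pending = sorted(t for t in all_tests if t not in approved)
    return ok, pending
```

The parser treats CR, LF and spaces between tokens as insignificant, 
so a manifest saved with Windows line endings, or with or without a 
final newline, loads to the same set. For example, 
load_approved('["a.html"]\r\n') and load_approved('["a.html"]') both 
return a set containing only a.html.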

Anyway, that's it for today's brain dump!

-- 
Robin Berjon - http://berjon.com/ - @robinberjon
Received on Tuesday, 11 December 2012 12:03:54 UTC