- From: Philip Taylor <pjt47@cam.ac.uk>
- Date: Tue, 22 Sep 2009 16:39:45 +0100
- To: Philippe Le Hegaret <plh@w3.org>
- CC: public-html@w3.org
Philippe Le Hegaret wrote:
> A few of us got together recently with the idea of improving the state
> of Web browser testing at W3C. Since this Group is discussing the
> creation of an effort for the purpose of testing the HTML specification,
> this is relevant here as well:
> [...]

(I assume you meant to link to http://omocha.w3.org/ somewhere in this email.)

My first impression is that this sounds great! It seems to be focusing on what I see as perhaps the most important goal (improving interoperability between implementations), and perhaps the most important challenge (scaling the process to cope with the complexity of modern specs and the necessary depth of testing).

Apologies for some long rambling thoughts:

I like automation - if there are going to be hundreds or thousands of test cases, I expect the overall effort will be minimised if each test case is as simple as possible to write, review and run, even if that requires a great deal of automation tool support. It also means that once the tools are developed, adding a new test case is very cheap, so people have fewer excuses not to write tests.

When writing some HTML5 canvas tests a while ago (<http://philip.html5.org/tests/canvas/suite/tests/>; I don't have much experience with writing other tests, so my perspective is biased towards this), the approach I took was to eliminate almost all boilerplate from the hand-written input for each test, and to move the complexity into a Python tool that converts them into executable code. So there's a single hand-written file, about ten thousand lines long, containing test case specifications like:

- name: 2d.drawImage.3arg
  testing:
  - 2d.drawImage.defaultsource
  - 2d.drawImage.defaultdest
  images:
  - red.png
  - green.png
  code: |
    ctx.drawImage(document.getElementById('green.png'), 0, 0);
    ctx.drawImage(document.getElementById('red.png'), -100, 0);
    ctx.drawImage(document.getElementById('red.png'), 100, 0);
    ctx.drawImage(document.getElementById('red.png'), 0, -50);
    ctx.drawImage(document.getElementById('red.png'), 0, 50);
    @assert pixel 0,0 ==~ 0,255,0,255;
    @assert pixel 99,0 ==~ 0,255,0,255;
    @assert pixel 0,49 ==~ 0,255,0,255;
    @assert pixel 99,49 ==~ 0,255,0,255;
  expected: green

using YAML syntax, giving: a name (in an arbitrary but useful hierarchy); a list of named spec sentences (described in a separate file) that this test is testing conformance to; a list of images to load before running the test; some JS code to execute, with some special syntax meaning "the pixel at 0,0 must be approximately (+/- 2) equal to rgba(0,255,0,255)"; and then a specification of the expected output image, either the common keyword "green" or else some Python+Pycairo code that generates the image.

(Reftest-style precise comparison is unsuitable for the canvas tests: browsers have freedom in antialiasing, rounding, etc., so their output won't precisely match the expected output image - hence the approximate tests of a small set of pixels.)
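(For what it's worth, the "==~" assertion compiles down to something along these lines - this is a paraphrase of the kind of check the tool generates rather than its literal output, and the helper name is invented:)

// Roughly what "@assert pixel 0,0 ==~ 0,255,0,255;" turns into: read back
// the single pixel and require each channel to match within +/- 2.
function assertPixelApprox(canvas, x, y, r, g, b, a) {
    var data = canvas.getContext('2d').getImageData(x, y, 1, 1).data;
    var expected = [r, g, b, a];
    for (var i = 0; i < 4; i++) {
        if (Math.abs(data[i] - expected[i]) > 2)
            throw new Error('Pixel at (' + x + ',' + y + ') is rgba(' +
                data[0] + ',' + data[1] + ',' + data[2] + ',' + data[3] +
                '), expected approximately rgba(' + expected.join(',') + ')');
    }
}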
Then there's a thousand lines of Python and JS to transform these into executable tests (each becomes a standalone file containing lots of information about the test, plus another visually-simpler version for giving an overview of lots of tests simultaneously, another visually-simplest version for easy pass/fail verification, and also a Mozilla Mochitest version), to quickly collect results from browsers (it detects results automatically where possible, and otherwise asks the user to press 'y'/'n' depending on whether two images look similar or different), and then to combine the results into <http://philip.html5.org/tests/canvas/suite/tests/results.html>.

I think this approach has been quite effective so far. There's enough commonality between canvas tests that they can all fit into this framework without stretching it too much; it made it easy for me to write hundreds of tests and to scan through the source file and update tests when the spec changed; and the Python tool was easily adapted(/hacked) to generate test files in a new format for Mozilla's automated testing system. Much of the tool code is of no value for anything except canvas tests, but that's okay, because it's sufficiently valuable for canvas tests.

One difficulty is that I'm the only person who can update tests, and there's no process for review. Ideally it would be easy for other people to make and deploy changes without any involvement from me. Making use of a standardised centralised test suite system would be great, because I'm too lazy to write any of that myself - but the people editing tests should be editing the YAML source file, not any kind of boilerplateful processed output, so I guess the test suite system would have to incorporate the canvas-specific Python processing tool. That sounds potentially complex and nasty, but I don't see any other way to achieve the goal of maximally simplifying test case development, so maybe it's inevitable.

That is probably the most serious design decision I see for the test suite system: should it have a single standard test case format for every test in the whole universe, or hundreds of different formats with their own processing tools that output a common format, or hundreds of different formats with no common format and each with their own test-runners, or something in between? And if it's anything other than the first option, how will the processing tools be written, executed and maintained?

(With my current approach, there's also the difficulty that nobody but me understands the test format or the processing code, since they're somewhat idiosyncratic, but hopefully that could be resolved with some simplification and documentation...)
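(To make the "common format" option slightly more concrete: it could be as thin as requiring every generated test - whatever per-suite tool produced it - to report its results through a tiny shared object that a generic runner knows how to read. The names below are invented for illustration, not an existing harness:)

// Invented sketch of a minimal "common format" contract: each generated
// test page reports into this object, so a single runner can drive tests
// that came from many different source formats and processing tools.
window.TestReport = {
    results: [],
    finished: false,
    result: function (name, status, message) {
        // status is 'pass', 'fail' or 'manual' (needs human verification)
        this.results.push({ name: name, status: status, message: message || '' });
    },
    done: function () {
        // A runner page embedding the test in an iframe can poll 'finished'
        // and then read back 'results'.
        this.finished = true;
    }
};

The per-format processing tools would then just have to emit pages that report through that interface, and everything downstream (runners, results collection) could be shared.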
A few random comments about / potential additions (if people agree) to <http://omocha.w3.org/wiki/wishes>:

Is avoidance of test duplication a goal? E.g. if two people independently developed test suites for the same section of the spec, would it be best to just stick all the test cases into the official test suite (which is easy to do, and ensures as many requirements as possible are tested, though some will be tested twice - and every test needs to be reviewed and maintained), or is it best to carefully merge them so every test case is distinct and necessary? The same situation occurs if, e.g., a canvas test suite tests that videos can be drawn onto a canvas, and a video test suite tests that a video can be drawn onto a canvas. Duplicates aren't useful; the question is whether they are harmful, to an extent that makes de-duplication worthwhile. I have no data or experience to know the best balance, but it seems like something there should eventually be a clear policy on.

It should probably always be possible to point people at the URL of a single test, which executes the test in their browser and lets them know whether it passed - that's very useful when submitting bug reports or discussing bugs. Ideally no test should rely on an external test-runner (though it should have one of those too).

The "under review" -> {"approved", "rejected"} approach doesn't quite seem adequate, because specs will change (while in CR, or while in Rec with errata) and approved tests might become invalid, but it would be wasteful to send every single test back to "under review" just because of a single change in the spec. Maybe there need to be some extra states - "approved, and probably still valid: the spec changed but it shouldn't affect this" and "approved, but possibly invalid: the spec changed in a way that might affect this, and nobody has reviewed it again carefully yet" - or similar, for things that just need a quick check before being considered "approved" again.

Performance is not stated as a desire (except as a consequence of parallelism), but it probably should be. E.g. originally the canvas tests were imported into Mozilla's automated system as a load of individual files, but they were merged into a single giant file containing all the tests to minimise the page-loading overhead. The faster the tests are, the more likely they are to be run, so it seems an important concern.

Some tests are not strictly either script-verified or human-verified. E.g. most of the canvas tests use getImageData to automatically verify the output, but in some cases that might not work (a browser might not implement getImageData at all, or might have bugs that prevent it working in certain obscure situations). It doesn't seem helpful to penalise browsers for getImageData bugs when the test is meant to be testing something totally unrelated, so the tests dynamically fall back on human verification. It would be nice to retain support for that, instead of requiring tests to be predefined as either automatic or manual.
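(Concretely, the fallback has roughly this shape - a sketch rather than the actual canvas-test code; runAutomaticAssertions and showImagesForManualComparison are invented helper names:)

// Sketch: verify the output automatically where getImageData works, and
// fall back to human verification where it doesn't, rather than failing
// the test because of an unrelated getImageData bug.
function canReadPixelsBack(canvas) {
    try {
        var px = canvas.getContext('2d').getImageData(0, 0, 1, 1);
        return px && px.data && px.data.length === 4;
    } catch (e) {
        return false;
    }
}

if (canReadPixelsBack(canvas)) {
    runAutomaticAssertions(canvas);        // e.g. the assertPixelApprox calls
} else {
    showImagesForManualComparison(canvas); // ask the user to press 'y'/'n'
}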
> Regards,
>
> Philippe

--
Philip Taylor
pjt47@cam.ac.uk

Received on Tuesday, 22 September 2009 15:40:27 UTC