- From: James Graham <james@hoppipolla.co.uk>
- Date: Tue, 27 Aug 2013 11:51:05 +0100
- To: Dirk Pranke <dpranke@chromium.org>
- CC: "public-test-infra@w3.org" <public-test-infra@w3.org>
On 23/08/13 18:48, Dirk Pranke wrote:

> The way Blink and WebKit's test harness works (for their own checked-in tests, not the W3C's), we walk a directory looking for files that have specific file extensions and aren't in directories with particular names. All (and only) the matches are tests; end of story. References can be found either by filename convention or (in a few rare cases not really used today) by parsing a reftest manifest. I think we really only support the manifest format for feature completeness and under the belief that we would need it sooner or later when importing the W3C's tests. It was (and remains) a fairly controversial practice compared to using filename conventions.
>
> We handle the timeout problem as follows: First, we expect tests to be fast (sub-second), and we don't expect them to time out regularly (since running tests that time out doesn't really give you much of a signal and wastes a lot of time).
>
> Second, we pick a default timeout. In Blink, this is 6 seconds, which works well for 99% (literally) of the tests on a variety of hardware (this number could probably be a couple of seconds higher or lower without much impact), but the number is adjusted based on the build configuration (debug builds are slower) and platform (Android is slower). Third, we have a separate manifest-ish file for marking a subset of tests as Slow, and they get a 30s timeout. In WebKit, we have a much longer default timeout (30s) and don't use Slow markers at all.
>
> There is no build step, and no parsing of tests on the fly at test run time (except as part of the actual test execution, of course). It works well, and any delay caused by scanning for files or dealing with timeouts is a small (1-3%) part of the total test run.

It is worth noting that there are a few differences between running a testsuite that is specifically designed for one browser and a testsuite that is intended for use across multiple products.

With a specifically-designed testsuite it is usually expected that all tests will pass. This is somewhat reasonable, as one only writes tests for one's own browser that correspond to implemented features in that browser. Even then, it is quite common to need some extra information in the tests to mark known failures corresponding to bugs that haven't been fixed yet.

When one is importing tests, it isn't reasonable to expect all the tests to pass, or for them all to be for features that you have actually implemented. For a certain class of test, the only way of detecting a failure is to wait for a timeout. For example, if you didn't implement setTimeout and a test tried to check that setting a timer worked, you would have to wait until the harness timeout for the test to fail. If that timeout is set to a very high value (say 30s for all tests), waiting for it makes the testsuite prohibitively slow. Therefore being able to choose an appropriate timeout for each test seems much more important for imported testsuites, since they are much more likely to hit the slow cases than implementation-specific testsuites.
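To make that last point concrete, here is a rough sketch (Python; the manifest contents, paths, and numbers are invented purely for illustration and don't describe any existing runner) of what picking a per-test timeout, rather than using a single harness-wide value, might look like:

    # Illustrative only: an invented "slow tests" record plus a default
    # timeout, roughly in the spirit of the Blink scheme described above.
    DEFAULT_TIMEOUT = 6   # seconds; enough for the vast majority of fast tests
    SLOW_TIMEOUT = 30     # seconds; only for tests explicitly marked as slow

    # Hypothetical external record of tests known to need more time.
    SLOW_TESTS = {
        "dom/events/huge-event-tree.html",
        "css3-transitions/many-properties.html",
    }

    def timeout_for(test_path, debug_build=False):
        """Pick the timeout (in seconds) for a single test."""
        timeout = SLOW_TIMEOUT if test_path in SLOW_TESTS else DEFAULT_TIMEOUT
        if debug_build:
            # Debug builds run slower, so scale the timeout rather than
            # marking more tests as slow (the factor here is made up).
            timeout *= 3
        return timeout

    print(timeout_for("dom/historical.html"))                    # 6
    print(timeout_for("css3-transitions/many-properties.html"))  # 30

The point is simply that a test which can only fail by timing out then costs whatever its own metadata says it should, rather than 30s every single time.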
Since tests are expected to fail, and since updating tests should be easy, test runners designed around the assumption "all tests must pass" need additional data stating which tests are, in fact, known failures. Putting this data inside the test files themselves is not very sane, as it is hard to read and write these files automatically and doing so is likely to lead to merge conflicts.

Such data will therefore have to go in some sort of external manifest, and keeping it up to date for a specific implementation probably implies an elaborate "build step": a special test run that records the failures in a known-good build and updates the manifest somehow. I'm not clear on all the details here, and indeed this seems like one of the principal challenges in running W3C tests on vendor infrastructure, since the process I just described is both complex to implement and racy. If this is taken as a requirement, avoiding the part of the update process where you update metadata for files that changed since your last import seems like a relatively small win. If, on the other hand, you have some process in mind that avoids the need for a complex synchronization of the expected failures, I would be intrigued to hear it.
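To be a bit more concrete about the "build step" I described above, a rough sketch (Python; the manifest name, result strings, and fake runner are invented for illustration only, not taken from any real harness) might look like this:

    import json

    def update_expectations(test_paths, run_test, manifest_path="expected-failures.json"):
        """Re-run the suite against a known-good build and record every
        non-passing result, so later runs can tell regressions apart from
        known failures. `run_test` stands in for whatever actually drives
        the browser and returns "PASS", "FAIL" or "TIMEOUT"."""
        expected = {}
        for test in test_paths:
            result = run_test(test)
            if result != "PASS":
                expected[test] = result
        # This is where the raciness comes in: if the tests or the
        # implementation change between this run and the next import,
        # the recorded expectations are stale by the time they land.
        with open(manifest_path, "w") as f:
            json.dump(expected, f, indent=2, sort_keys=True)
        return expected

    # Tiny fake runner so the sketch is self-contained.
    fake_results = {
        "dom/historical.html": "PASS",
        "workers/shared-worker-basic.html": "FAIL",
        "timers/set-timeout-basic.html": "TIMEOUT",
    }
    print(update_expectations(fake_results, fake_results.get))

Even in this toy form the problem is visible: the recorded expectations are only as good as the build and tree state they were generated against.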
Received on Tuesday, 27 August 2013 10:52:47 UTC