- From: James Graham <james@hoppipolla.co.uk>
- Date: Mon, 16 Sep 2013 15:00:05 +0100
- To: "public-test-infra@w3.org" <public-test-infra@w3.org>
This is a subject that has come up before, but I need a solution now, so I think we should decide on a way forward.

Sometimes it's necessary to time a test out because it failed in some unexpected way that took a long time. This is a stronger requirement for cross-browser tests than for specific browsers' own test harnesses because the tests are "untrusted"; they may never have been run in a certain browser and so may rely on e.g. some event that hasn't been implemented yet.

Simple fixed timeouts don't really work, though, because not all hardware is created equal; something running in 1000ms on a modern desktop might take much longer on a mobile device or an emulator. Or it might not; essentially there are two reasons that tests can take a long time:

a) They do something expensive in hardware (the CPU, the IO devices, etc.) where the performance depends strongly on the type of hardware.

b) They do something intrinsically slow, like waiting 20s for an event-source to produce an event, to ensure that the connection is not dropped in that time.

Traditionally it is assumed that all tests are slow for type a) reasons. This is the conservative assumption, since any extra delay added to deal with slower running of these tests will at worst make type b) tests slower to run, not broken.

There are a variety of ways to deal with type a) speed variations, but they mostly boil down to having some kind of per-device multiplier that you apply to the default timeout to get the effective timeout on a particular device type, possibly with overrides on a per-test, per-device level. I think this is a fine approach and one that we need to ensure can be supported.

In previous discussions there has been some contention about how to set the timeout in the first place, and how much author control there should be. At the moment there is full author control (i.e. a precise timeout in ms) via the setup() function in testharness.js.
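The per-device multiplier idea with per-test overrides could look something like the following runner-side sketch. The multiplier values, device names, test paths, and the override table are all illustrative assumptions, not part of any existing harness:

```python
# Sketch of a per-device timeout multiplier with optional per-test,
# per-device overrides. All names and values here are hypothetical.

DEFAULT_TIMEOUT_MS = 10000

# Assumed slowdown factors relative to desktop hardware.
DEVICE_MULTIPLIERS = {
    "desktop": 1,
    "mobile": 3,
    "emulator": 5,
}

# Assumed per-test overrides keyed by (test path, device type).
OVERRIDES = {
    ("eventsource/keep-alive.html", "emulator"): 120000,
}

def effective_timeout(test, device, base_ms=DEFAULT_TIMEOUT_MS):
    """Return the timeout (in ms) the runner should enforce for this test."""
    override = OVERRIDES.get((test, device))
    if override is not None:
        return override
    # Unknown device types fall back to the desktop multiplier of 1.
    return base_ms * DEVICE_MULTIPLIERS.get(device, 1)

print(effective_timeout("dom/nodes/Node-cloneNode.html", "mobile"))  # 30000
```

The point of the lookup order is that a per-test, per-device override wins outright, while everything else gets the uniform multiplier; type b) tests would presumably be handled via overrides rather than multipliers, since their duration does not scale with hardware speed.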
This approach has one critical disadvantage: it is hard for external tools to read the timeout. Since a test might hang the whole browser, it certainly needs to be possible for the test runner to kill the whole process after a suitable delay, which must be at least as long as the test timeout. To fix this issue and, for the moment, this issue alone, I have made a patch to move timeout specification from setup() to a <meta> element:

<meta name="timeout" content="test-timeout-in-ms">

There is a review for this at [1].

There is the further question of how much control authors should have over the timeouts of tests. Opinions on this so far have varied from "none at all" through "normal or slow" to "full control". I now have a little empirical data from the repository so far to guide us here. [2] shows preliminary results of running testharness tests in Gecko with all tests fixed to a 20s timeout. This shows around 100 timeouts from around 4000 top-level test files. A number of these are missing features, and a number more are due to the lack of PHP support in the server that I'm using. However, we also see legitimate tests that are just slow to run; all of those that show some child test results before getting "timeout" are examples of this.

In my opinion, the fact that we have both tests that time out due to implementation bugs and tests that time out due to slowness is enough to scupper the zero-timeout approach. It's not really clear whether a longer, dual-timeout approach would work, e.g. 5s for most tests (on desktop hardware) but <meta name=timeout content=long> to increase the timeout to 60s or longer for those tests that need it. It at least seems plausible that this could cover a lot of cases, although I suspect that we will still find edge cases where we want even longer or more finely controlled timeouts for certain tests. I also don't know if the performance impact of waiting 60s for a test that typically should finish in 6s is prohibitive.

Does anyone have any input here?
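To make the external-tool angle concrete, here is a sketch of how a runner might extract the timeout from a test file's markup under the proposed dual-timeout scheme. The 5s/60s figures follow the numbers floated above; the regex-based helper and constant names are my own assumptions, not code from the patch:

```python
# Hypothetical runner-side parsing of <meta name="timeout" content="...">.
import re

NORMAL_TIMEOUT_MS = 5000   # assumed default for most tests on desktop
LONG_TIMEOUT_MS = 60000    # opted into via <meta name=timeout content=long>

META_RE = re.compile(
    r'<meta\s+name=["\']?timeout["\']?\s+content=["\']?([^"\'>\s]+)',
    re.IGNORECASE)

def timeout_from_source(html):
    """Extract the test timeout (in ms) from a test file's source text."""
    m = META_RE.search(html)
    if m is None:
        return NORMAL_TIMEOUT_MS
    value = m.group(1)
    if value == "long":
        return LONG_TIMEOUT_MS
    return int(value)  # an explicit timeout in ms, as in the patch under review

print(timeout_from_source('<meta name="timeout" content="long">'))  # 60000
```

The key property is that the timeout is readable by static inspection of the file, without executing any script: exactly what setup() in testharness.js cannot offer, since its argument is only known once the test is already running.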
[1] https://critic.hoppipolla.co.uk/r/320
[2] http://hoppipolla.co.uk/410/results.html
Received on Monday, 16 September 2013 14:00:38 UTC