Re: Knowing which tests are in the repository

On Tue, Aug 27, 2013 at 3:51 AM, James Graham <james@hoppipolla.co.uk> wrote:

> On 23/08/13 18:48, Dirk Pranke wrote:
>
>> The way Blink and WebKit's test harness works (for their own checked-in
>> tests, not the W3C's), we walk a directory looking for files that have
>> specific file extensions and aren't in directories with particular
>> names. All (and only) the matches are tests; end of story. References
>> can be found either by filename convention or (in a few rare cases not
>> really used today) by parsing a reftest manifest. I think we really only
>> support the manifest format for feature completeness and under the
>> belief that we would need it sooner or later when importing the W3C's
>> tests. It was (and remains) a fairly controversial practice compared to
>> using filename conventions.
>>
>> We handle the timeout problem as follows: First, we expect tests to be
>> fast (sub-second), and we don't expect them to time out regularly (since
>> running tests that time out doesn't really give you much of a signal and
>> wastes a lot of time).
>>
>> Second, we pick a default timeout. In Blink, this is 6 seconds, which
>> works well for 99% (literally) of the tests on a variety of hardware
>> (this number could probably be a couple of seconds higher or lower
>> without much impact), but the number is adjusted based on the build
>> configuration (debug builds are slower) and the platform (Android is
>> slower). Third, we have a separate manifest-ish file for marking a subset
>> of tests as Slow, and those get a 30s timeout. In WebKit, we have a much
>> longer default timeout (30s) and don't use Slow markers at all.
>>
>> There is no build step, and no parsing of tests on the fly at test run
>> time (except as part of the actual test execution, of course). It works
>> well, and any delay caused by scanning for files or dealing with
>> timeouts is a small (1-3%) part of the total test run.
>>
>
> It is worth noting that there are a few differences between running a
> testsuite that is specifically designed for one browser and a testsuite
> that is intended for use across multiple products.
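
(For concreteness, the discovery step I described above boils down to
something like the rough Python sketch below. This isn't the actual harness
code; the extensions, skipped directory names, and helper names are just
illustrative.)

import os

TEST_EXTENSIONS = {'.html', '.xhtml', '.svg'}           # illustrative only
SKIPPED_DIR_NAMES = {'resources', 'support', 'script-tests'}  # illustrative

def find_tests(layout_test_root):
    # Walk the tree; every file with a test extension that isn't under a
    # skipped directory is a test. No manifest needed.
    tests = []
    for dirpath, dirnames, filenames in os.walk(layout_test_root):
        # Prune directories whose contents are never tests.
        dirnames[:] = [d for d in dirnames if d not in SKIPPED_DIR_NAMES]
        for name in filenames:
            if os.path.splitext(name)[1] in TEST_EXTENSIONS:
                tests.append(os.path.join(dirpath, name))
    return sorted(tests)

def reference_for(test_path):
    # References are found by filename convention (e.g. foo.html and
    # foo-expected.html), except in the rare cases that use a reftest
    # manifest instead.
    root, ext = os.path.splitext(test_path)
    candidate = root + '-expected' + ext
    return candidate if os.path.exists(candidate) else None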


These are good points. In WebKit's case, there really isn't "one browser",
since the different ports are effectively different implementations with
different sets of functionality (features may be compiled in or not), and so
even inside WebKit it is common for many tests not to pass on some ports. The
WebKit/Blink infrastructure addresses this in two ways:

1) We use the same manifest-ish file to indicate whether tests are expected
to fail or not. You can list directories as well as files, so that if you
don't support webaudio, for example, skipping it is a single line (there is a
rough sketch of this just after the list). (There are other hooks for such
things, including compile-time and runtime feature detection that can result
in skipping tests, but those are less important.)

2) In addition, the WebKit approach to testing is really designed more for
regression testing than conformance testing. This means that we are as much
(if not more) concerned with detecting changes in behavior as with detecting
whether or not a test succeeds. If a test fails but produces output, we can
capture that output and compare subsequent runs against it to see if the
test is running "as expected" (see the second sketch below). This is
important for testharness-based tests where you might have 20 assertions and
it matters whether 18 of them are passing or only 10 are (rather than
all-or-nothing).
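
To make (1) a bit more concrete, here is a rough Python sketch of how a
harness might consume such a file. The syntax below only loosely resembles
the real TestExpectations format, and the function names are made up; the
6s/30s numbers are the ones mentioned in the timeout discussion above.

DEFAULT_TIMEOUT_S = 6   # Blink's default; WebKit uses ~30s and no Slow marks
SLOW_TIMEOUT_S = 30

EXPECTATIONS_TEXT = """
# One line skips an entire directory if the port doesn't build the feature.
webaudio [ Skip ]
# Individual tests can be marked as Slow or expected-to-fail.
fast/canvas/huge-canvas.html [ Slow ]
fast/forms/broken-on-this-port.html [ Failure ]
"""

def parse_expectations(text):
    # Return a list of (path_prefix, set_of_expectations) entries.
    entries = []
    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()
        if not line:
            continue
        path, _, tags = line.partition('[')
        entries.append((path.strip(), set(tags.rstrip(']').split())))
    return entries

def expectations_for(test_path, entries):
    # Directories match by prefix, so skipping 'webaudio' is one line.
    result = set()
    for prefix, tags in entries:
        if test_path == prefix or test_path.startswith(prefix.rstrip('/') + '/'):
            result |= tags
    return result

def timeout_for(test_path, entries):
    tags = expectations_for(test_path, entries)
    return SLOW_TIMEOUT_S if 'Slow' in tags else DEFAULT_TIMEOUT_S

entries = parse_expectations(EXPECTATIONS_TEXT)
assert 'Skip' in expectations_for('webaudio/oscillator-basic.html', entries)
assert timeout_for('fast/canvas/huge-canvas.html', entries) == 30
assert timeout_for('fast/forms/simple.html', entries) == 6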

In neither case do we track things by modifying the tests themselves; I
agree with you that doing so seems like it would be a bad idea.
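
And to illustrate (2), a second rough sketch of the baseline-comparison
idea; the file-naming convention and helper names here are made up for
illustration rather than being the harness's actual ones.

import os

def expected_output_path(test_path):
    # e.g. fast/js/array-sort.html maps to fast/js/array-sort-expected.txt
    root, _ = os.path.splitext(test_path)
    return root + '-expected.txt'

def run_matches_baseline(test_path, actual_output):
    # True if the (possibly failing) output is exactly what we expect.
    # For a testharness-style test with 20 assertions, the baseline records
    # which subtests PASS and which FAIL, so a regression from 18 passing
    # subtests to 10 shows up as a diff even though the test "failed" both
    # times.
    baseline = expected_output_path(test_path)
    if not os.path.exists(baseline):
        # No baseline checked in: treat any failure text as unexpected.
        return 'FAIL' not in actual_output
    with open(baseline) as f:
        return f.read() == actual_output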


>  Keeping [ a list of expected failures ] up to date for a specific
> implementation probably implies an elaborate "build step" (a special
> test run that records the failures in a known-good build and updates the
> manifest somehow; I'm not clear on all the details here and indeed this
> seems like one of the principal challenges in running W3C tests on vendor
> infrastructure since the process I just described is both complex to
> implement and racy). If this is taken as a requirement, avoiding the part
> of the update process where you update metadata for files that changed
> since your last import seems like a relatively small win. If, on the other
> hand, you have some process in mind that avoids the need for a complex
> synchronization of the expected failures, I would be intrigued to hear it.
>

There is definitely a challenge in tracking two moving repos and keeping
them in sync, and, as you say, if you have a phase where you need to track
and update a list of expected failures, you also have the option to do other
things in that phase (and I don't know how to get rid of that phase
altogether either, given the requirements we both acknowledge).

That said, I think there are still many advantages to keeping that phase as
simple and limited as possible. The less work that phase does, the more
understandable the overall system is and the fewer things we have to keep
in sync.

-- Dirk

Received on Tuesday, 27 August 2013 18:44:31 UTC