Re: Review of tests upstreamed by implementors

WebKit's experiences (with our own tests, not the W3C tests) mirror James'
and Ms2ger's.

A few more comments interleaved ...

On Thu, Mar 21, 2013 at 12:18 PM, Ms2ger <> wrote:

> FWIW, a few notes on my experience running W3C tests in Mozilla's
> automation:
> On 03/21/2013 02:11 PM, James Graham wrote:
>> Assuming that implementors actually want to import and run the tests,
>> there are a number of practical issues that they face. The first is
>> simply that they must sync the external repository with the one in which
>> they keep their tests. That's pretty trivial if you run git and pretty
>> much a headache if you don't. So for most vendors at the moment it's a
>> headache.
> I've written a script to pull the tests into our HG repository; this is
> pretty trivial for us too.
To answer Robin's question about how hard this would be with svn (or
something else) instead of Git or Hg: obviously the problem can be solved;
it's just harder and more annoying. Chromium actually has a separate
toolset that can pull checkouts from multiple repositories (across both svn
and git), but it is not used by the other WebKit ports and I wouldn't
necessarily recommend it for this situation.
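For ports stuck on svn, the import step itself doesn't need to be SCM-aware
at all. Something like the following Python sketch (names invented, details
simplified) just mirrors a checkout into the vendor tree and drops the VCS
metadata, so it works the same regardless of which tool did the checkout:

```python
import os
import shutil

VCS_DIRS = {".git", ".hg", ".svn"}

def import_tests(external_checkout, vendor_subtree):
    """Mirror an external test checkout into the vendor tree,
    dropping VCS metadata so the import is repeatable from any SCM."""
    if os.path.exists(vendor_subtree):
        shutil.rmtree(vendor_subtree)
    os.makedirs(vendor_subtree)
    for root, dirs, files in os.walk(external_checkout):
        # Prune VCS metadata directories in place so os.walk skips them.
        dirs[:] = [d for d in dirs if d not in VCS_DIRS]
        rel = os.path.relpath(root, external_checkout)
        dest = vendor_subtree if rel == "." else os.path.join(vendor_subtree, rel)
        os.makedirs(dest, exist_ok=True)
        for f in files:
            shutil.copy2(os.path.join(root, f), os.path.join(dest, f))
```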

>  Once you have imported the tests, it must be possible to tell which
>> files you should actually load and what system should be used to get
>> results (i.e., given a file, is it a reftest, is it a testharness test, is it
>> a manual test, is it a support file? Is the url to load the file
>> actually the url to the test or is there a query/fragment part? Is there
>> an optional query/fragment part that changes the details of the test?).
>> There have been several suggestions for solutions to these problems, but
>> there is no actual solution at the moment.
> The solution we use is the MANIFEST files checked into the repositories;
> there's documentation at [2].
We generally follow naming conventions for this, rather than a manifest
file. A script that can bridge this is probably fine.
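To illustrate, a naming-convention classifier can be a few lines of Python.
The suffixes and directory names below are made up for illustration; they
are not WebKit's or the W3C's actual rules:

```python
def classify_test(path):
    """Guess a test file's type from its path alone, using
    hypothetical naming conventions (illustrative, not real rules)."""
    if "/support/" in path or "/resources/" in path:
        return "support"
    name = path.rsplit("/", 1)[-1]
    if "-manual." in name:
        return "manual"
    if "-ref." in name or "-expected." in name:
        return "reference"
    if name.endswith((".html", ".htm", ".xht", ".svg")):
        return "test"
    return "support"
```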

>  Many vendors' systems are designed around the assumption that "all tests
>> must pass" and, for the rare cases where tests don't pass, one is
>> expected to manually annotate the test as failing. This is problematic
>> if you suddenly import 10,000 tests for a feature that you haven't
>> implemented yet. Or even 100 tests of which 27 fail. I don't have a good
>> solution for this other than "don't design your test system like that"
>> (which is rather late). I presume the answer will look something like a
>> means of auto-marking tests as expected-fail on their first run after
>> import.
> Tools save us here as well. It's not yet as easy as I'd like, but it
> involves not all that much more than running the tests and running a script
> on the output.
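A sketch of what that post-run script could look like, in Python; the
expectations format and outcome names here are invented for illustration,
as real harnesses each have their own:

```python
def generate_expectations(results):
    """Given {test_name: 'PASS'|'FAIL'|'TIMEOUT'} from the first run of
    newly imported tests, emit an expectation line for everything that
    did not pass, so the suite is green immediately after import."""
    lines = []
    for test, outcome in sorted(results.items()):
        if outcome != "PASS":
            lines.append("%s [ %s ]" % (test, outcome.title()))
    return "\n".join(lines)
```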


>  We also have the problem that many of the tests simply won't run in
>> vendors' systems. Tests that require an extra server to be set up (e.g.
>> websockets tests) are a particular problem, but they are rare. More
>> problematic is that many people can't run tests that depend on
>> Apache+PHP (because they run all the servers on the individual test node
>> and don't have Apache+PHP in that environment). Unless everyone is happy
>> to deploy something as heavyweight as Apache+PHP, we may need to
>> standardise on a different solution for tests that require custom
>> server-side logic. Based on previous discussions, this would likely be a
>> custom Python-based server, with special features for testing (I believe
>> Chrome/WebKit already have something like this?).
> I don't expect Apache+PHP to work for Mozilla; a custom Python server
> would probably be workable.

WebKit uses either Apache or lighttpd (on Chromium Windows), and we have a
mixture of Perl, Python, and PHP server-side scripts that get executed.

Someone (Tobie?) suggested that maybe we should be using server-side
JavaScript a la Node. One problem with this is that Node depends on V8, and
for fairly obvious reasons that might be unappealing to some other vendors.

PHP has the advantage that it is very simple and (by far) the most
prevalent server-side scripting language. It has the significant
disadvantage that you can pretty much *only* run it under a server like
Apache or IIS. Python would be a fine compromise, as there are lots of HTTP
servers capable of running Python scripts via engines of varying
heavyweight-ness :).
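As a sketch of what such a Python-based test server might look like (the
endpoint and handler below are invented for illustration), the standard
library alone is enough to express the kind of per-request logic tests
currently reach for PHP to get:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class TestHandler(BaseHTTPRequestHandler):
    """Toy server-side logic: /echo-ua reflects the request's
    User-Agent header back in the response body."""
    def do_GET(self):
        if self.path == "/echo-ua":
            body = (self.headers.get("User-Agent") or "").encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

def serve(port=0):
    # port=0 asks the OS for an ephemeral port, handy on shared test nodes.
    return HTTPServer(("127.0.0.1", port), TestHandler)
```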

>  One final issue (that I can think of right now ;) is that it must be
>> possible for everyone to *run* the tests and get results out. This
>> should in theory be rather easy since one can implement a custom
>> testharnessreport.js for javascript tests, and people already know how
>> to run reftests. But sometimes the details of people's testing systems
>> are very specialised in strange ways so this can be a larger barrier
>> than you might assume.
> I didn't really hit problems here.
> The one other issue we have is tests timing out; I haven't been able to
> figure out yet if that's due to test, test harness or implementation bugs.
WebKit's current infrastructure is optimized for running all of the tests
on a single machine, concurrently across multiple processes. This can
introduce a significant amount of instability due to resource contention
and load on the machine, so it's something to watch out for in tests. We do
not currently have the infrastructure to trivially distribute tests across
multiple machines as part of a single test run (it hasn't really been
needed).
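For what it's worth, the single-machine, multi-process model (plus the
per-test timeouts Ms2ger mentions) can be sketched roughly like this; the
helper names are hypothetical and a real runner would launch the test
driver binary instead of the placeholder:

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeout

def run_one(test):
    # Placeholder for launching the real test driver on one test.
    return "PASS"

def run_shard(tests, jobs=4, timeout=30):
    """Run a shard of tests concurrently on one machine, recording a
    TIMEOUT outcome for any test that exceeds its per-test budget."""
    results = {}
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = {t: pool.submit(run_one, t) for t in tests}
        for t, f in futures.items():
            try:
                results[t] = f.result(timeout=timeout)
            except FuturesTimeout:
                results[t] = "TIMEOUT"
    return results
```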

I'm not sure exactly what is meant by *run* the tests and get results out;
I'd probably want to understand the requirements and/or use cases here better.

In particular, in WebKit, we use a customized test driver executable to run
our tests (not the stock/shipping web browsers), so getting something to
work in that framework is probably a requirement for us; tests that
somehow require a full web browser would be harder to support and might be
a non-starter (but that is certainly open to discussion).

-- Dirk

> Ms2ger
>  [1] Avoiding_intermittent_oranges <>
> [2] test-runner/src/3d9052f852abf69f8c449aebfe203834f1cfca55/manifests.txt?at=default <>

Received on Thursday, 21 March 2013 22:24:24 UTC