Re: Review of tests upstreamed by implementors

WebKit's experiences (with our own tests, not the W3C tests) mirror James'
and Ms2ger's.

A few more comments interleaved ...

On Thu, Mar 21, 2013 at 12:18 PM, Ms2ger <> wrote:

> FWIW, a few notes on my experience running W3C tests in Mozilla's
> automation:
> On 03/21/2013 02:11 PM, James Graham wrote:
>> Assuming that implementors actually want to import and run the tests,
>> there are a number of practical issues that they face. The first is
>> simply that they must sync the external repository with the one in which
>> they keep their tests. That's pretty trivial if you run git and pretty
>> much a headache if you don't. So for most vendors at the moment it's a
>> headache.
> I've written a script to pull the tests into our HG repository; this is
> pretty trivial for us too.
To answer Robin's question about how hard this would be with svn (or
something else) instead of Git or Hg: obviously the problem can be solved;
it's just harder and more annoying. Chromium actually has a separate
toolset that can pull checkouts from multiple repositories (across both svn
and git), but it is not used by the other WebKit ports and I wouldn't
necessarily recommend it for this situation.
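For ports stuck on svn, the import step itself doesn't need to be SCM-aware
at all. Something like the following Python sketch (names invented, details
simplified) just mirrors a checkout into the vendor tree and drops the VCS
metadata, so it works the same regardless of which tool did the checkout:

```python
import os
import shutil

VCS_DIRS = {".git", ".hg", ".svn"}

def import_tests(external_checkout, vendor_subtree):
    """Mirror an external test checkout into the vendor tree,
    dropping VCS metadata so the import is repeatable from any SCM."""
    if os.path.exists(vendor_subtree):
        shutil.rmtree(vendor_subtree)
    os.makedirs(vendor_subtree)
    for root, dirs, files in os.walk(external_checkout):
        # Prune VCS metadata directories in place so os.walk skips them.
        dirs[:] = [d for d in dirs if d not in VCS_DIRS]
        rel = os.path.relpath(root, external_checkout)
        dest = vendor_subtree if rel == "." else os.path.join(vendor_subtree, rel)
        os.makedirs(dest, exist_ok=True)
        for f in files:
            shutil.copy2(os.path.join(root, f), os.path.join(dest, f))
```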

>  Once you have imported the tests, it must be possible to tell which
>> files you should actually load and what system should be used to get
>> results (i.e., given a file, is it a reftest, is it a testharness test, is it
>> a manual test, is it a support file? Is the url to load the file
>> actually the url to the test or is there a query/fragment part? Is there
>> an optional query/fragment part that changes the details of the test?).
>> There have been several suggestions for solutions to these problems, but
>> there is no actual solution at the moment.
> The solution we use is the MANIFEST files checked into the repositories;
> there's documentation at [2].
We generally follow naming conventions for this, rather than a manifest
file. A script that can bridge this is probably fine.
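To illustrate, a naming-convention classifier can be a few lines of Python.
The suffixes and directory names below are made up for illustration; they
are not WebKit's or the W3C's actual rules:

```python
def classify_test(path):
    """Guess a test file's type from its path alone, using
    hypothetical naming conventions (illustrative, not real rules)."""
    if "/support/" in path or "/resources/" in path:
        return "support"
    name = path.rsplit("/", 1)[-1]
    if "-manual." in name:
        return "manual"
    if "-ref." in name or "-expected." in name:
        return "reference"
    if name.endswith((".html", ".htm", ".xht", ".svg")):
        return "test"
    return "support"
```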

>  Many vendors' systems are designed around the assumption that "all tests
>> must pass" and, for the rare cases where tests don't pass, one is
>> expected to manually annotate the test as failing. This is problematic
>> if you suddenly import 10,000 tests for a feature that you haven't
>> implemented yet. Or even 100 tests of which 27 fail. I don't have a good
>> solution for this other than "don't design your test system like that"
>> (which is rather late). I presume the answer will look something like a
>> means of auto-marking tests as expected-fail on their first run after
>> import.
> Tools save us here as well. It's not yet as easy as I'd like, but it
> involves not all that much more than running the tests and running a script
> on the output.
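A sketch of what that post-run script could look like, in Python; the
expectations format and outcome names here are invented for illustration,
as real harnesses each have their own:

```python
def generate_expectations(results):
    """Given {test_name: 'PASS'|'FAIL'|'TIMEOUT'} from the first run of
    newly imported tests, emit an expectation line for everything that
    did not pass, so the suite is green immediately after import."""
    lines = []
    for test, outcome in sorted(results.items()):
        if outcome != "PASS":
            lines.append("%s [ %s ]" % (test, outcome.title()))
    return "\n".join(lines)
```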


>  We also have the problem that many of the tests simply won't run in
>> vendors' systems. Tests that require an extra server to be set up (e.g.
>> websockets tests) are a particular problem, but they are rare. More
>> problematic is that many people can't run tests that depend on
>> Apache+PHP (because they run all the servers on the individual test node
>> and don't have Apache+PHP in that environment). Unless everyone is happy
>> to deploy something as heavyweight as Apache+PHP, we may need to
>> standardise on a different solution for tests that require custom
>> server-side logic. Based on previous discussions, this would likely be a
>> custom Python-based server, with special features for testing (I believe
>> Chrome/WebKit already have something like this?).
> I don't expect Apache+PHP to work for Mozilla; a custom Python server
> would probably be workable.

WebKit uses either Apache or lighttpd (on Chromium Windows), and we have a
mixture of Perl, Python, and PHP server-side scripts that get executed.

Someone (Tobie?) suggested that maybe we should be using server-side
JavaScript a la Node. One problem with this is that Node depends on V8, and
for fairly obvious reasons that might be unappealing to some other vendors.

PHP has the advantage that it is very simple and (by far) the most
prevalent server-side scripting language. It has the significant
disadvantage that you can pretty much *only* run it under a server like
Apache or IIS. Python would be a fine compromise, as there are lots of HTTP
servers capable of running Python scripts via engines of varying
heavyweight-ness :).
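As a sketch of what such a Python-based test server might look like (the
endpoint and handler below are invented for illustration), the standard
library alone is enough to express the kind of per-request logic tests
currently reach for PHP to get:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class TestHandler(BaseHTTPRequestHandler):
    """Toy server-side logic: /echo-ua reflects the request's
    User-Agent header back in the response body."""
    def do_GET(self):
        if self.path == "/echo-ua":
            body = (self.headers.get("User-Agent") or "").encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

def serve(port=0):
    # port=0 asks the OS for an ephemeral port, handy on shared test nodes.
    return HTTPServer(("127.0.0.1", port), TestHandler)
```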

>  One final issue (that I can think of right now ;) is that it must be
>> possible for everyone to *run* the tests and get results out. This
>> should in theory be rather easy since one can implement a custom
>> testharnessreport.js for javascript tests, and people already know how
>> to run reftests. But sometimes the details of people's testing systems
>> are very specialised in strange ways so this can be a larger barrier
>> than you might assume.
> I didn't really hit problems here.
> The one other issue we have is tests timing out; I haven't been able to
> figure out yet if that's due to test, test harness or implementation bugs.
WebKit's current infrastructure is optimized for running all of the tests
on a single machine, concurrently across multiple processes. This can
introduce a significant amount of instability due to resource contention
and load on the machine, so it's something to watch out for in tests. We do
not currently have the infrastructure to trivially distribute tests across
multiple machines as part of a single test run (it hasn't really been
needed).
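For what it's worth, the single-machine, multi-process model (plus the
per-test timeouts Ms2ger mentions) can be sketched roughly like this; the
helper names are hypothetical and a real runner would launch the test
driver binary instead of the placeholder:

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeout

def run_one(test):
    # Placeholder for launching the real test driver on one test.
    return "PASS"

def run_shard(tests, jobs=4, timeout=30):
    """Run a shard of tests concurrently on one machine, recording a
    TIMEOUT outcome for any test that exceeds its per-test budget."""
    results = {}
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = {t: pool.submit(run_one, t) for t in tests}
        for t, f in futures.items():
            try:
                results[t] = f.result(timeout=timeout)
            except FuturesTimeout:
                results[t] = "TIMEOUT"
    return results
```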

I'm not sure exactly what is meant by *run* the tests and get results out;
I'd probably want to understand the requirements and/or use cases here better.

In particular, in WebKit, we use a customized test driver executable to run
our tests (not the stock/shipping web browsers), so getting something to
work in that framework is probably a requirement for us; tests that
somehow require a full web browser would be harder to support and might be
a non-starter (but that is certainly open to discussion).

-- Dirk

> Ms2ger
>  [1] Avoiding_intermittent_oranges <>
> [2] test-runner/src/3d9052f852abf69f8c449aebfe203834f1cfca55/manifests.txt?at=default <>

Received on Thursday, 21 March 2013 22:24:24 UTC