Re: Knowing which tests are in the repository

On 16/08/13 01:15, Dirk Pranke wrote:
> On Thu, Aug 15, 2013 at 3:30 PM, Tobie Langel <tobie@w3.org> wrote:
>
>
>
>     On Thursday, August 15, 2013 at 10:26 PM, James Graham wrote:
>
>      > On 15/08/13 20:42, Tobie Langel wrote:
>      > > On Thursday, August 15, 2013 at 5:27 PM, James Graham wrote:
>      > > > and unusually long timeouts for testharness tests.
>      > >
>      > > Doesn't testharness already let you handle timeouts?
>      >
>      > Sure. But the general problem is this:
>      >
>      > I am writing a test runner for these tests. In order for this to
>     work I
>      > need to know at least which files are tests. But for a good
>      > implementation I also need various other pieces of data about the
>     tests.
>      > For example I need to know the timeouts because I want the runner to
>      > have its own timeout in case the test causes the browser to hang or
>      > crash. This obviously needs to be at least as long as the test
>     timeout.
>      > Therefore the timeout in the test is relevant, but the fact that
>      > testharness.js will timeout itself isn't really relevant.
>
>     Can't you parse the test to get that data out?
>
>     Else couldn't we set a convention to add this info in a meta tag?
>
>
> I don't yet have enough experience running a wide variety of the tests
> in the repo, but what range of test speeds is considered acceptable? Is
> there some reason you really need to be able to customize test harness
> timeouts per-test?

The idea is to have a rather short default timeout and the ability to 
increase it only where required. This means that even in worst-case 
scenarios (e.g. a bad patch that causes a hang in testharness.js) you 
aren't waiting an unreasonably long time for the test run to finish.
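
To make that concrete, here is a minimal sketch (Python; the 
"timeout": "long" manifest key and the specific numbers are 
assumptions, not an agreed convention) of how a runner might derive 
its external watchdog timeout from a short default plus per-test 
overrides:

    DEFAULT_TIMEOUT = 10  # seconds; short default so a hang can't stall the run
    LONG_TIMEOUT = 60     # opted into per test, e.g. "timeout": "long"

    def runner_timeout(metadata, padding=5):
        """External watchdog timeout for one test: always somewhat longer
        than the in-page testharness.js timeout, so a hung or crashed
        browser still gets killed by the runner."""
        if metadata.get("timeout") == "long":
            test_timeout = LONG_TIMEOUT
        else:
            test_timeout = DEFAULT_TIMEOUT
        return test_timeout + padding

    print(runner_timeout({}))                   # 15
    print(runner_timeout({"timeout": "long"}))  # 65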

>      > Now, it is true that for some of the information I could parse it
>     out of
>      > the files. But if I do that I am certainly going to store it in some
>      > intermediate format so that I don't have to redo the expensive
>     parsing
>      > step for each test run.
>
>     Is it really going to be that expensive?
>
>
> Parsing each test to extract fields, assuming you're doing this in a
> separate step from actually running the test, is indeed expensive. For
> example, if your test harness is parsing the test in python, that can
> easily be slower than actually running the test in the browser, meaning
> that we could end up doubling test execution time. At least, this is
> what I'm seeing in my efforts to get Blink running the tests.

I would expect something similar. HTML is far too difficult to parse to 
use as a real-time format for the metadata.
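
For what it's worth, the kind of one-off extraction step I have in 
mind looks roughly like the sketch below (Python; MANIFEST.json, the 
<meta name="timeout"> convention and the exact manifest shape are 
assumptions for illustration, not a settled format):

    import glob
    import json
    from html.parser import HTMLParser

    class MetaExtractor(HTMLParser):
        """Collect the metadata a runner needs: does the file use
        testharness.js, and does it declare a longer timeout?  (A real
        implementation would want a proper HTML5 parser such as html5lib;
        html.parser is used here only to keep the sketch self-contained.)"""
        def __init__(self):
            super().__init__()
            self.is_testharness = False
            self.timeout = None

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "script" and "testharness.js" in (attrs.get("src") or ""):
                self.is_testharness = True
            # <meta name="timeout" content="long"> is an assumed convention
            if tag == "meta" and attrs.get("name") == "timeout":
                self.timeout = attrs.get("content")

    def build_manifest(paths):
        manifest = {}
        for path in paths:
            extractor = MetaExtractor()
            with open(path, encoding="utf-8", errors="replace") as f:
                extractor.feed(f.read())
            if extractor.is_testharness:
                manifest[path] = {"type": "testharness",
                                  "timeout": extractor.timeout}
        return manifest

    # The expensive parse happens once, here; each test run just reads the JSON.
    with open("MANIFEST.json", "w") as f:
        json.dump(build_manifest(glob.glob("**/*.html", recursive=True)), f,
                  indent=2)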

>      > So at some level I am going to end up with a
>      > manifest format. Since everyone else trying to run the tests in bulk
>      > will also need this it makes sense to share the format and the files.
>
>     Possibly.
>      > Moreover, the filename-only solution doesn't work. You don't load
>     files,
>      > you load URLs. A particular file can have different behavior
>     depending
>      > on non-path parts of the URL. For example the parser tests work
>      > differently depending on the query string that you pass in.
>
>     Pointers to these tests? This seems a rather isolated case for which
>     there might be a dedicated solution that would work as well.
>
>
> I would also like to see this. Historically in Blink we can accomplish
> such tests using iframes to resources or other script-based mechanisms;
> we've never needed a way to specify query strings as part of a test input.

I already pointed out that the parser tests are examples of this. 
Depending on the value of the run_type query parameter they write the 
input data using a data URL, using document.write, using 
document.write one character at a time, or using innerHTML. Now we 
could of course have four extra files per test, each loading the test 
with a query string in an iframe (which itself runs the actual tests 
in further iframes). However, requiring this kind of per-instance 
manual work is going to strongly discourage people from writing this 
kind of more thorough test. I can attest from experience that having 
these tests work in more than one way was immensely valuable during 
implementation; the document.write variants exercised codepaths that 
were untouched by the data-URL-based version, and were highly 
effective in finding bugs that we might otherwise have missed (or, at 
least, only seen after doing expensive reduction of broken sites). 
Therefore I am totally against anything that would discourage this 
kind of extra-mile testing in the future.
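
What a manifest buys you here is that one file can expand into 
several test URLs without any extra boilerplate files. A rough sketch 
(Python; the variant names and paths are illustrative rather than the 
exact run_type values used by the parser tests):

    # One source file can expand into several test URLs; a runner iterates
    # over URLs, not files.  The variant names echo the parser tests'
    # run_type mechanism but are illustrative only.
    PARSER_VARIANTS = ["uri", "write", "write-single", "innerHTML"]

    def urls_for(path, entry):
        if entry.get("variants") == "parser":
            return ["/%s?run_type=%s" % (path, v) for v in PARSER_VARIANTS]
        return ["/" + path]

    entry = {"type": "testharness", "variants": "parser"}
    for url in urls_for("html/syntax/parsing/example.html", entry):
        print(url)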

> In general, I think it's much easier to understand the layout of the
> repository if we follow naming conventions, rather than requiring
> lookups through a manifest. To that end, Blink and WebKit have always
> preferred the -expected or -ref convention for references over the
> "rel=match" metadata approach, but I think that ship has sailed anyway.

That was always a bad idea. One nice thing about reftests is that you 
can reuse a single ref for many tests. This is not only less work for 
the tester, but means that you can optimise your test runner to take 
only one screenshot per ref URL. Requiring strict naming conventions 
defeats this.
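
As a sketch of that optimisation (Python; the screenshot function is 
a stand-in for whatever actually drives the browser, and the shape of 
the test list is assumed):

    def run_reftests(tests, screenshot):
        """tests is an iterable of (test_url, ref_url) pairs; screenshot(url)
        is assumed to render a URL and return comparable image data.  Many
        tests can point at one ref, so cache screenshots per ref URL."""
        ref_cache = {}
        results = {}
        for test_url, ref_url in tests:
            if ref_url not in ref_cache:
                ref_cache[ref_url] = screenshot(ref_url)
            results[test_url] = (screenshot(test_url) == ref_cache[ref_url])
        return results

    # e.g. ten tests sharing a single ref need 11 screenshots, not 20;
    # with -expected naming you are forced into one ref per test.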

> So, I'd strongly encourage naming conventions even if we can't get 100%
> accuracy that way and do end up needing a manifest for some things.

I think relying on naming conventions has a superficial appeal, but has 
many drawbacks when considered in more detail.
