Re: Knowing which tests are in the repository

On 22/08/13 18:21, Dirk Pranke wrote:
> On Thu, Aug 22, 2013 at 9:55 AM, James Graham wrote:
>         Strictly speaking, one could say that -manual is unneeded, but
>         since I'd
>         prefer to stomp out as many manual tests as possible, I'm fine
>         w/ making
>         their names be uglier (and I do also like the clarity the naming
>         provides).
>
>
>     I don't see how else you would distinguish manual tests and helper
>     files.
>
>
> As per above, I'm not quite sure what all is a "helper file" to you. If
> you're talking about subresources in a page, I'd prefer that they be in
> dedicated directories called "resources" (or some such name) by
> themselves rather than mixed in with the tests. Are there other sorts of
> files (that might also have the same file extensions as tests)?

No, this is principally what I mean by helper files. I suppose it's 
possible that requiring a dedicated subdirectory could work, but again 
it seems quite burdensome for authors, particularly in situations where 
the resource path can't contain a /, e.g. (a real example) new 
Worker(null), which requires a file called "null" in the same directory 
as the test. Obviously you *can* work around this by running the whole 
test in an iframe or a new window, but that is much harder to do and 
means that the test author has to structure the entire test around this 
special case. It is quite likely that they will reach that case, realise 
how hard the test would be to write, and not bother.
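
For concreteness, this is roughly the shape of test I have in mind (a 
sketch only; the contents are illustrative rather than copied from an 
actual submission). The null argument is stringified and resolved as a 
relative URL, so the worker script has to be a sibling file literally 
named "null":

  // Sketch of a testharness.js test exercising new Worker(null).
  // "null" is resolved as a relative URL, so a helper file literally
  // named "null" (containing e.g. postMessage("loaded");) has to sit
  // next to this test; it can't be moved into a resources/ directory.
  async_test(function(t) {
    var worker = new Worker(null);  // fetches ./null
    worker.onmessage = t.step_func(function(e) {
      assert_equals(e.data, "loaded");
      t.done();
    });
  }, "new Worker(null) loads the sibling file named 'null'");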

Also, the wins from this seem rather small; you still have to parse the 
metadata out of all the test files, which is too slow to do in real time 
and will require some sort of preprocessing into a manifest for use in 
automation. Once that is an offline process it seems less important that 
it be super-quick. However, maybe I will change my mind here once I 
actually implement manifest generation that parses the HTML :)
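
For what it's worth, the sort of preprocessing I have in mind looks 
roughly like the sketch below (illustrative only; a real implementation 
should use a proper HTML parser rather than regexes, and the 
classification rules would need to be rather more careful):

  // Rough sketch of offline manifest generation: walk the tree once,
  // classify each file, and write a JSON manifest for the automation
  // to read, so no parsing has to happen at run time.
  var fs = require("fs");
  var path = require("path");

  function classify(filePath) {
    var src = fs.readFileSync(filePath, "utf8");
    if (/<script[^>]*src=[^>]*testharness\.js/.test(src)) return "testharness";
    if (/<link[^>]*rel=["']?(match|mismatch)/.test(src)) return "reftest";
    return "helper";
  }

  function buildManifest(root) {
    var manifest = {};
    (function walk(dir) {
      fs.readdirSync(dir).forEach(function(name) {
        var full = path.join(dir, name);
        if (fs.statSync(full).isDirectory()) {
          walk(full);
        } else if (/\.x?html?$/.test(name)) {
          manifest[path.relative(root, full)] = classify(full);
        }
      });
    })(root);
    return manifest;
  }

  fs.writeFileSync("MANIFEST.json",
                   JSON.stringify(buildManifest(process.argv[2] || "."), null, 2));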

>         Is it too much to ask that we have similar names for either
>         testharness
>         tests or reftests so that you can distinguish which a test is
>         without
>         having to open the file? /me holds out a faint hope ...
>
>
>     I think it's too much effort to require that all testharness.js
>     tests have something specific in the filename. Reftests have to be
>     parsed to work out the reference anyway.
>
>
> Well, yeah, but that way you could at least not have to parse the
> testharness tests looking for references. Given that we have 10x the
> number of testharness tests as reftests in the web-platform-repo, this
> isn't a small thing.

But I think we need to parse the files to extract other metadata anyway. 
I still maintain that we need to solve the timeout problem.

> Okay, thanks for clarifying.
>
>     Since we already have cases where it is really required, and the
>     people who require it are typically advanced test authors, this
>     seems quite acceptable.
>
>
> We do? I haven't noticed any such cases yet, but it's quite likely I've
> missed them and I'd appreciate pointers.

Well in theory the following tests are all different:
http://w3c-test.org/web-platform-tests/master/html/syntax/parsing/test_tests1.html?run_type=uri
http://w3c-test.org/web-platform-tests/master/html/syntax/parsing/test_tests1.html?run_type=write
http://w3c-test.org/web-platform-tests/master/html/syntax/parsing/test_tests1.html?run_type=write_single
http://w3c-test.org/web-platform-tests/master/html/syntax/parsing/test_tests1.html?run_type=innerHTML

It actually appears that there's a bug in the submitted version so that 
run_type="write" is hardcoded. I will fix that.
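
The mechanism is nothing more than the file dispatching on its own query 
string, roughly as in the sketch below (simplified; the real file 
obviously does a lot more in each branch):

  // One file, four logically distinct tests, selected by the query string;
  // the identity of a test is therefore URL + query, not just the path.
  function queryParam(name) {
    var m = new RegExp("[?&]" + name + "=([^&]*)").exec(location.search);
    return m ? decodeURIComponent(m[1]) : null;
  }

  // This is the line that is currently buggy in the submitted version,
  // which hardcodes "write" instead of reading the parameter.
  var run_type = queryParam("run_type") || "write";

  switch (run_type) {
    case "uri":          /* load each input as an iframe's src URL       */ break;
    case "write":        /* document.write() each input in a single call */ break;
    case "write_single": /* document.write() one character at a time     */ break;
    case "innerHTML":    /* assign each input to a node's innerHTML      */ break;
  }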

Another case is something like 
http://w3c-test.org/web-platform-tests/master/dom/ranges/Range-compareBoundaryPoints.html

This generates a large number of tests from data and as such can be 
quite slow to run. One straightforward way to deal with the slowness 
would be to chunk up the tests based on query parameters. This seems 
simpler than requiring one file per test.
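
A chunking scheme can be as simple as slicing the generated test list on 
a couple of query parameters, along these lines (parameter names and the 
data are invented for illustration):

  // Sketch: Range-compareBoundaryPoints.html?chunk=2&chunks=10 would run
  // only the second tenth of the generated tests, all from the same file.
  function intParam(name, fallback) {
    var m = new RegExp("[?&]" + name + "=(\\d+)").exec(location.search);
    return m ? parseInt(m[1], 10) : fallback;
  }

  var chunks = intParam("chunks", 1);  // total number of slices
  var chunk  = intParam("chunk", 1);   // which slice this load runs, 1-based

  // Stand-in for the large cross-product the real file builds from data.
  var testData = [];
  for (var i = 0; i < 4000; i++) {
    testData.push({name: "comparison " + i, value: i % 5 - 2});
  }

  testData.forEach(function(data, i) {
    if (i % chunks !== chunk - 1) return;  // belongs to a different chunk
    test(function() {
      assert_true(data.value >= -2 && data.value <= 2);
    }, data.name);
  });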

>         As far as timeouts go, I'm still not sold on specifying them at
>         all, or
>         at least specifying them regularly as part of the test input.
>         I'd rather
>         have a rule along the lines of "no input file should take more
>         than X
>         seconds to run" (obviously, details would qualify the class of
>         hardware
>         and browser used as a baseline for that). I'd suggest X be on
>         the order
>         of 1-2 seconds for a contemporary desktop production browser on
>         contemporary hardware. I would be fine w/ this being a
>         recommendation
>         rather than a requirement, though.
>
>
>     Well, there are a lot of issues here. Obviously very-long-running
>     tests can be problematic. On the other hand, splitting up tests
>     where they could be combined creates a lot of overhead during
>     execution. More importantly, some tests simply require long running
>     times. It isn't uncommon to have tests that delay resource loads to
>     ensure a particular order of events, or similar. Tests like these
>     intrinsically take more than a few seconds to run and so need a
>     longer timeout.
>
>     I don't think we can simply dodge this issue.
>
>
> I'm not trying to dodge the issue. I don't think Blink has any tests
> that intrinsically require seconds to run to schedule and load
> resources, though we do have some tests that do take seconds to run
> (usually because they're doing too much in one test, and sometimes
> because they're doing something computationally very expensive). I would
> be curious to see examples of tests that were intrinsically slow (and
> considered well-written) in the CSS repos. It's always good to have
> concrete examples to talk about.

I don't know about CSS, but there are certainly examples in 
web-platform-tests,

http://w3c-test.org/web-platform-tests/master/old-tests/submission/Opera/script_scheduling/082.html

to take one. There are also tests that load large resources, or 
intentionally feed large amounts of data over a WebSocket, or do other 
things that are naturally slow, often for I/O reasons rather than 
computational ones. There are also tests that are simply slow because 
they are large, such as the aforementioned Range tests.
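
To give a feel for why such tests are intrinsically slow, here is a 
sketch along the lines of the script-scheduling case (everything here is 
illustrative, including the hypothetical resources/slow.py helper, which 
is assumed to wait for the requested delay before responding):

  // The point of the test is the ordering around a deliberately delayed
  // load, so several seconds of wall-clock time are part of the design.
  async_test(function(t) {
    var events = [];
    var script = document.createElement("script");
    script.src = "resources/slow.py?delay=3000";  // hypothetical slow resource
    script.onload = t.step_func(function() {
      events.push("slow script");
      assert_array_equals(events, ["inline script", "slow script"]);
      t.done();
    });
    document.head.appendChild(script);
    events.push("inline script");
  }, "a deliberately delayed script load completes after inline execution");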

I know that timeouts are annoying and have a number of issues, but I 
don't think that anyone has proposed a serious alternative that doesn't 
amount to "don't write tests that are slow", which isn't always possible.

> I'm not sure about your assertion that splitting up tests creates "a lot
> of overhead". Do you mean in test execution time, or configuration /
> test management overhead?

Principally in execution time. To take an extreme example, running one 
of the reflection testcases that produces tens of thousands of results 
in a few seconds would be untenable if each result had to be a separate 
page load. To give a more practical example, as soon as I mentioned W3C 
tests to Mozilla's test infrastructure maintainers, I got complaints 
about the old DOM tests that are one test per file and hence have a lot 
of overhead.
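
To put some shape on the numbers: the reflection tests work roughly like 
the sketch below (heavily simplified), where a small data table expands 
into thousands of results sharing one page load; one file per result 
would multiply the per-page overhead by the same factor:

  // Simplified sketch of a reflection-style file: element x attribute x
  // value expands combinatorially, so thousands of results come out of a
  // single page load.
  var elements = ["a", "area", "base"];              // illustrative subset
  var attributes = [{content: "href", idl: "href"},
                    {content: "target", idl: "target"}];
  var values = ["", "foo", "\u0000", "  spaces  "];  // illustrative subset

  elements.forEach(function(tag) {
    attributes.forEach(function(attr) {
      values.forEach(function(value) {
        test(function() {
          var el = document.createElement(tag);
          el.setAttribute(attr.content, value);
          assert_equals(typeof el[attr.idl], "string",
                        "reflected IDL attribute should be a DOMString");
        }, tag + "." + attr.idl + " reflecting " + JSON.stringify(value));
      });
    });
  });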

> Certainly, creating and executing each standalone test page has a
> certain amount of overhead (in Blink, this is on the order of a few
> milliseconds to ten on a desktop machine, not large but it does add up
> over thousands of tests). On the other hand, bundling a large number of
> individual assertions into a single testable unit has its own problems,
> so we almost always want a tradeoff in practice anyway.

Of course it is a tradeoff. There are also tradeoffs in implementation 
strategy; for example, opening each top-level test in a new browsing 
context is likely to help avoid random failures caused by state leaking 
between tests, but will also make running the tests slower compared to 
navigating a single top-level browsing context.

Received on Friday, 23 August 2013 10:48:00 UTC