Re: Knowing which tests are in the repository

On Fri, Aug 16, 2013 at 2:51 AM, James Graham <james@hoppipolla.co.uk> wrote:

> On 16/08/13 01:15, Dirk Pranke wrote:
>
>> On Thu, Aug 15, 2013 at 3:30 PM, Tobie Langel <tobie@w3.org> wrote:
>>
>>     On Thursday, August 15, 2013 at 10:26 PM, James Graham wrote:
>>
>>      > On 15/08/13 20:42, Tobie Langel wrote:
>>      > > On Thursday, August 15, 2013 at 5:27 PM, James Graham wrote:
>>      > > > and unusually long timeouts for testharness tests.
>>      > >
>>      > > Doesn't testharness already let you handle timeouts?
>>      >
>>      > Sure. But the general problem is this:
>>      >
>>      > I am writing a test runner for these tests. In order for this
>>      > to work I need to know at least which files are tests. But for
>>      > a good implementation I also need various other pieces of data
>>      > about the tests. For example I need to know the timeouts
>>      > because I want the runner to have its own timeout in case the
>>      > test causes the browser to hang or crash. This obviously needs
>>      > to be at least as long as the test timeout. Therefore the
>>      > timeout in the test is relevant, but the fact that
>>      > testharness.js will timeout itself isn't really relevant.
>>
>>     Can't you parse the test to get that data out?
>>
>>     Else couldn't we set a convention to add this info in a meta tag?
>>
>>
>> I don't yet have enough experience running a wide variety of the tests
>> in the repo, but what range of test speeds is considered acceptable? Is
>> there some reason you really need to be able to customize test harness
>> timeouts per-test?
>>
>
> The idea is to have a rather short default timeout and the ability to
> increase it only where required. This means that even in worst case
> scenarios (e.g. a bad patch that causes a hang in testharness.js) you
> aren't waiting an unreasonably long time for the testrun to finish.


I understand the idea; as I said, Blink's test harness supports something
similar. What I was trying to ask, however, is something else: do we tend
to require tests to run in a particular amount of time? e.g., 10ms, 100ms,
10s, ... ?
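
For concreteness, the shape of what I mean by "something similar" is roughly
this (a throwaway Python sketch, not actual Blink code; the numbers are
invented):

    import subprocess

    DEFAULT_TIMEOUT = 10      # seconds; invented default
    LONG_TIMEOUT = 60         # invented "long" value
    HARNESS_MULTIPLIER = 2    # the runner waits longer than the in-page harness

    def run_test(browser_cmd, url, long_timeout=False):
        test_timeout = LONG_TIMEOUT if long_timeout else DEFAULT_TIMEOUT
        try:
            # Kill the browser if it exceeds the outer timeout, e.g. because
            # the page hung or crashed before testharness.js could report.
            return subprocess.run(browser_cmd + [url],
                                  timeout=test_timeout * HARNESS_MULTIPLIER)
        except subprocess.TimeoutExpired:
            return None  # record a TIMEOUT result rather than waiting forever

The point is just that the outer timeout has to be derived from the per-test
one, so the runner needs to know the latter up front.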


>>      > Now, it is true that for some of the information I could parse
>>      > it out of the files. But if I do that I am certainly going to
>>      > store it in some intermediate format so that I don't have to
>>      > redo the expensive parsing step for each test run.
>>
>>     Is it really going to be that expensive?
>>
>>
>> Parsing each test to extract fields, assuming you're doing this in a
>> separate step from actually running the test, is indeed expensive. For
>> example, if your test harness is parsing the test in python, that can
>> easily be slower than actually running the test in the browser, meaning
>> that we could end up doubling test execution time. At least, this is
>> what I'm seeing in my efforts to get Blink running the tests.
>>
>
> I would expect something similar. HTML is far too difficult to parse to
> use as a real-time format for the metadata.


Of course, since we control the HTML we could require something simpler,
such that grep would work, but even having to read every file is
somewhat expensive.
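
To make that concrete, the sort of caching I'd expect any runner to end up
doing looks something like this (a rough Python sketch; the meta-tag name
and the manifest layout are invented, along the lines of Tobie's suggestion):

    import json
    import os
    import re

    # Hypothetical per-test metadata marker, e.g.
    # <meta name="timeout" content="long">; the name is an example only.
    META_RE = re.compile(r'<meta\s+name="timeout"\s+content="(\w+)"', re.I)

    def update_manifest(test_paths, manifest_path="MANIFEST.json"):
        try:
            with open(manifest_path) as f:
                manifest = json.load(f)
        except (IOError, ValueError):
            manifest = {}
        for path in test_paths:
            mtime = os.path.getmtime(path)
            entry = manifest.get(path)
            if entry and entry["mtime"] == mtime:
                continue  # unchanged since the last scan; skip the re-read
            with open(path) as f:
                match = META_RE.search(f.read())
            manifest[path] = {"mtime": mtime,
                              "timeout": match.group(1) if match else "normal"}
        with open(manifest_path, "w") as f:
            json.dump(manifest, f, indent=2)
        return manifest

That keeps the per-run cost proportional to what changed rather than to the
size of the repository, which is really the argument for sharing one
manifest format rather than everyone re-deriving it.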


>>      > So at some level I am going to end up with a manifest format.
>>      > Since everyone else trying to run the tests in bulk will also
>>      > need this it makes sense to share the format and the files.
>>
>>     Possibly.
>>
>>      > Moreover, the filename-only solution doesn't work. You don't
>>      > load files, you load URLs. A particular file can have different
>>      > behavior depending on non-path parts of the URL. For example
>>      > the parser tests work differently depending on the query string
>>      > that you pass in.
>>
>>     Pointers to these tests? This seems a rather isolated case for which
>>     there might be a dedicated solution that would work as well.
>>
>>
>> I would also like to see this. Historically in Blink we have handled
>> such tests using iframes to resources or other script-based mechanisms;
>> we've never needed a way to specify query strings as part of a test input.
>>
>
> I already pointed out that the parser tests are examples of this.
> According to the value of the run_type query parameter they either write
> the input data using a data url, or using document.write, or using
> document.write one character at a time, or using innerHTML. Now we could of
> course have four extra files per test, each loading the test with a query
> string in an iframe (which itself runs the actual tests in further
> iframes). However, requiring this kind of per-instance manual work is going
> to strongly discourage people from writing this kind of more thorough test.
> I can attest from experience that the value, when implementing, of having
> these tests work in more than one way was immense; the document.write
> variants exercised codepaths untouched by the data url-based version, and
> were highly effective in finding bugs that we might otherwise have missed
> (or, at least, only seen after doing expensive reduction of broken sites).
> Therefore I am totally against anything that would discourage this kind of
> extra-mile testing in the future.


I haven't looked at your response in detail yet, but so far I don't
understand this objection (no need to reply to this yet; I will read
further and then respond). That said, I wouldn't want to encourage
something that requires excessive duplication of files, or, worse, manual
work.
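
To make sure we're talking about the same thing, though, here is the shape I
would expect a manifest entry to take for those parser tests (purely
illustrative Python, with the paths and variant values invented):

    def expand_urls(path, query_variants=None):
        """One test *file* can contribute several test *URLs*, e.g. via a
        run_type query parameter, without any wrapper files on disk."""
        if not query_variants:
            return [path]
        return ["%s?run_type=%s" % (path, variant)
                for variant in query_variants]

    # e.g. (invented values):
    # expand_urls("/html/syntax/parsing/some-test.html",
    #             ["uri", "write", "write_single", "innerHTML"])
    # -> four URLs for the runner to load, all backed by a single file.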


>> In general, I think it's much easier to understand the layout of the
>> repository if we follow naming conventions, rather than requiring
>> lookups through a manifest. To that end, Blink and WebKit have always
>> preferred the -expected or -ref convention for references over the
>> "rel=match" metadata approach, but I think that ship has sailed anyway.
>>
>
> That was always a bad idea. One nice thing about reftests is that you can
> reuse a single ref for many tests. This is not only less work for the
> tester, but means that you can optimise your test runner to only take one
> screenshot per refurl. Requiring strict naming conventions defeats this.


I don't think the tradeoffs are quite that clear. There are lots of
advantages to having the reference be obvious in the file system, and it's
not clear to me how many references are reused such that this is really a
big performance win. But, as I said, I'm not trying to re-open this
particular debate.
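
For what it's worth, the optimization James describes is simple enough to
sketch (illustrative Python, not any runner's real code; screenshot() stands
in for whatever capture mechanism the harness actually uses):

    _ref_cache = {}

    def screenshot(url):
        # Browser-specific capture would go here.
        raise NotImplementedError

    def compare_reftest(test_url, ref_url):
        if ref_url not in _ref_cache:
            _ref_cache[ref_url] = screenshot(ref_url)  # at most once per run
        return screenshot(test_url) == _ref_cache[ref_url]

My question is really how often the same ref URL is shared in practice,
since that is what determines whether the cache buys you much.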

>> So, I'd strongly encourage naming conventions even if we can't get 100%
>> accuracy that way and do end up needing a manifest for some things.
>>
>
> I think relying on naming conventions has a superficial appeal, but has
> many drawbacks when considered in more detail.
>

Apart from the ones described above, it's not obvious to me that a naming
convention like "all html files that aren't tests belong in a directory
named ./resources, and no tests should be in that subdirectory" has
particularly significant drawbacks? The only one that leaps out at me is if
you want to use one test as a reference for another test, but the rule I
just wrote doesn't actually preclude that. Similarly, being able to easily
identify from name or location which tests were jstests, or reftests, or
especially manual/pixel tests, would be tremendously useful.
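
Concretely, the sort of classification I have in mind is no more involved
than this (the directory and suffix names are examples, not a proposal for
the exact spelling):

    import os

    def classify(path):
        """Classify a file purely from its name and location."""
        parts = path.split(os.sep)
        name, ext = os.path.splitext(os.path.basename(path))
        if "resources" in parts:
            return "support"      # helper files, never run directly
        if name.endswith("-ref") or name.endswith("-expected"):
            return "reference"
        if name.endswith("-manual"):
            return "manual"
        ref_sibling = os.path.join(os.path.dirname(path), name + "-ref" + ext)
        if os.path.exists(ref_sibling):
            return "reftest"      # has a same-named -ref sibling
        if ext in (".html", ".htm", ".xhtml", ".svg"):
            return "testharness"
        return "support"

This is obviously not a complete proposal, but it's the level of machinery
I'd hope a naming convention could keep us at.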

-- Dirk

Received on Friday, 16 August 2013 19:34:11 UTC