RE: Knowing which tests are in the repository

Folks –

Though we're novices in this group, CableLabs has been developing automated testing frameworks for a while now, and I'd like to share a couple of thoughts based on our experience.  Regarding Dirk's restatement of the three fundamental questions, I'd offer the following:

1) Do we want to require (or at least strongly recommend) file naming conventions for tests and references?
     
A planned set of directory structures and names is really useful for helping developers know where to put common-use or special-purpose files, and it helps users of the tests find the information they're looking for.  A complex set of rules for naming individual files (such as trying to describe what a test does in its filename) is difficult to enforce or verify and generally leads to a maintenance headache.  That kind of information is best kept within the file as metadata.  Also, if you end up renaming files over time, you generally have a harder time following their history in the repo.
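
For example, even something as simple as the layout below goes a long way (the names here are purely hypothetical, just to illustrate the idea):

    dom/
      events/
        some-feature-001.html    <- a test
        resources/
          helper.js              <- shared support files used by the tests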

2) Do we want to require the use of a manifest as a source-level object in the repo (rather than something that could be generated via a build process)?

Speaking of metadata, our experience is that keeping metadata inside the test files (e.g., the HTML files) themselves is the best way to keep a test and its metadata in sync.  If a separate manifest file is needed to support the test runtime, it should be generated from the test files with an automated tool.  Candidate metadata we've seen mentioned in this thread, or would like to propose, are: test ID (file name); test timeout; test type (harness/ref/manual/etc.); non-test files (e.g., helper files); spec references.
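
To make that concrete, here is a rough sketch of the kind of generation tool I mean (Python, untested and purely illustrative; the metadata names "timeout" and "test-type", the "tests" directory, and the JSON output format are assumptions on my part, not anything this group has agreed on):

    # generate_manifest.py -- illustrative sketch only
    import json
    import os
    from html.parser import HTMLParser

    class MetadataParser(HTMLParser):
        """Collects <meta name=...> values and <link rel=help> spec references."""
        def __init__(self):
            super().__init__()
            self.meta = {}
            self.spec_refs = []

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "meta" and "name" in attrs:
                self.meta[attrs["name"]] = attrs.get("content", "")
            elif tag == "link" and attrs.get("rel") == "help":
                self.spec_refs.append(attrs.get("href", ""))

    def build_manifest(test_root):
        manifest = {}
        for dirpath, _, filenames in os.walk(test_root):
            for name in filenames:
                if not name.endswith((".html", ".xhtml", ".svg")):
                    continue
                path = os.path.join(dirpath, name)
                parser = MetadataParser()
                with open(path, encoding="utf-8", errors="replace") as f:
                    parser.feed(f.read())
                manifest[os.path.relpath(path, test_root)] = {
                    "timeout": parser.meta.get("timeout", "normal"),
                    "type": parser.meta.get("test-type", "testharness"),
                    "spec_refs": parser.spec_refs,
                }
        return manifest

    if __name__ == "__main__":
        print(json.dumps(build_manifest("tests"), indent=2))

Because the manifest is derived, it can be regenerated (and diffed) by a commit hook or build step rather than edited by hand.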

3) Do we want to allow tests that are specified only in a manifest (e.g., tests with query parameters) rather than being initiated from a non-manifest file?

I may be misunderstanding the intent of this question; apologies if so.  I took it to mean providing an environment where a test developer can write and run a test without manifest info.  I think that option should be supported.  Manifest constructs are often better suited to running lots of tests in an automated fashion and tend to get in the way during individual test development.

Regards,

Kevin Kershaw
CableLabs

---------------------

From: dpranke@google.com [mailto:dpranke@google.com] On Behalf Of Dirk Pranke
Sent: Thursday, August 22, 2013 11:22 AM
To: James Graham
Cc: public-test-infra@w3.org
Subject: Re: Knowing which tests are in the repository



On Thu, Aug 22, 2013 at 9:55 AM, James Graham <james@hoppipolla.co.uk> wrote:
On 22/08/13 17:45, Dirk Pranke wrote:
I mostly like this ... comments inline.

On Thu, Aug 22, 2013 at 9:31 AM, James Graham <james@hoppipolla.co.uk> wrote:

    A modified proposal:

    By default apply the following rules, in the order given:

    * Any file with a name starting with a . or equal to
    "override.manifest" is a helper file


Are there helper files other than manifests that we should be worrying
about? I'm thinking of things like .htaccess, .gitignore, etc. I would
probably say "is not a test"  (or possibly "can be ignored") rather than
"is a helper file".

Sure, I only reused "helper file" for this case because I couldn't think of a better term.

    * Any file with -manual in the name before the extension is a manual
    test.

    * Any html, xhtml or svg file that links to testharness.js is a
    testharness test

    * Any html, xhtml or svg file that has a file with the same name but
    the suffix -ref before the extension is a reftest file and the
    corresponding -ref file is a helper file.

    * Any html, xhtml or svg file that contains a link rel=match or link
    rel=mismatch is a reftest file.


Strictly speaking, one could say that -manual is unneeded, but since I'd
prefer to stomp out as many manual tests as possible, I'm fine with making
their names uglier (and I do also like the clarity the naming provides).

I don't see how else you would distinguish manual tests and helper files.


As per above, I'm not quite sure what all counts as a "helper file" to you. If you're talking about subresources in a page, I'd prefer that they be in dedicated directories called "resources" (or some such name) by themselves rather than mixed in with the tests. Are there other sorts of files (that might also have the same file extensions as tests)?
 
Is it too much to ask that we have similar names for either testharness
tests or reftests, so that you can tell which kind a test is without
having to open the file? /me holds out a faint hope ...

I think it's too much effort to require that all testharness.js tests have something specific in the filename. Reftests have to be parsed to work out the reference anyway.


Well, yeah, but that way you could at least avoid having to parse the testharness tests looking for references. Given that we have 10x as many testharness tests as reftests in the web-platform-tests repo, this isn't a small thing.

I'm not sure why this is much effort beyond a simple script and a bulk rename (and some retraining of authors or a commit hook ...), but at any rate this is hardly a deal-killer to me.
 

    * Any other file is a helper file.

    These rules can be overridden by providing an override.manifest
    file. Such a file can contain a list of filenames to exclude from
    the normal processing above and a list of urls for tests, similar to
    my previous proposal. So for example one might have

    [exclude]
    foo.html

    [testharness]
    foo.html?subset=1
    foo.html?subset=2

    I am still not sure how to deal with timeouts. One option would be
    to put the overall timeout in a meta value rather than in the
    javascript, since this will be easier to parse out. For tests where
    this doesn't work due to strong constraints on the html, one could
    use the override.manifest as above (and also specify the timeout in
    the js). I can't say I am thrilled with this idea though.
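
    For illustration, the default rules above could be implemented roughly
    like this (Python, an untested sketch; the rel=match check is a naive
    text search rather than real markup parsing, and override.manifest
    handling is omitted):

    import os
    import re

    def classify(path):
        # Apply the default rules in order; returns one of:
        # "helper", "manual", "testharness", "reftest".
        name = os.path.basename(path)
        stem, ext = os.path.splitext(name)

        if name.startswith(".") or name == "override.manifest":
            return "helper"
        if stem.endswith("-manual"):
            return "manual"
        if ext not in (".html", ".xhtml", ".svg"):
            return "helper"

        with open(path, encoding="utf-8", errors="replace") as f:
            source = f.read()

        if "testharness.js" in source:
            return "testharness"
        if os.path.exists(os.path.join(os.path.dirname(path), stem + "-ref" + ext)):
            return "reftest"
        if re.search(r'rel=["\']?(match|mismatch)', source):
            return "reftest"
        return "helper"  # includes the -ref reference files themselves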


Ignoring the issues around query-param based tests and timeouts, is
there a reason we'd want to allow exceptions at all apart from the fact
that we have a lot of them now? I.e., I'd suggest that we don't allow
exceptions for new tests and figure out if we can rename/restructure
existing tests to get rid of the exceptions.

The point of the exceptions is only the issues around query params and other exceptional circumstances. The point is not to allow deviations in cases that could conform to the scheme, but to allow flexibility where it is really required.

Okay, thanks for clarifying.
 
Since we already have cases where it is really required, and the people who require it are typically advanced test authors, this seems quite acceptable.

We do? I haven't noticed any such cases yet, but it's quite likely I've missed them and I'd appreciate pointers.
 

As far as timeouts go, I'm still not sold on specifying them at all, or
at least on specifying them regularly as part of the test input. I'd rather
have a rule along the lines of "no input file should take more than X
seconds to run" (obviously, the details would qualify the class of hardware
and browser used as a baseline for that). I'd suggest X be on the order
of 1-2 seconds for a contemporary desktop production browser on
contemporary hardware. I would be fine with this being a recommendation
rather than a requirement, though.

Well, there are a lot of issues here. Obviously very-long-running tests can be problematic. On the other hand, splitting up tests where they could be combined creates a lot of overhead during execution. More importantly, some tests simply require long running times. It isn't uncommon to have tests that delay resource loads to ensure a particular order of events, or similar. Tests like these intrinsically take more than a few seconds to run and so need a longer timeout.

I don't think we can simply dodge this issue.

I'm not trying to dodge the issue. I don't think Blink has any tests that intrinsically require seconds to run in order to schedule and load resources, though we do have some tests that do take seconds to run (usually because they're doing too much in one test, and sometimes because they're doing something computationally very expensive). I would be curious to see examples of tests in the CSS repos that are intrinsically slow (and considered well written). It's always good to have concrete examples to talk about.

I'm not sure about your assertion that splitting up tests creates "a lot of overhead". Do you mean in test execution time, or configuration / test management overhead? 

Certainly, creating and executing each standalone test page has a certain amount of overhead (in Blink, on the order of a few to ten milliseconds on a desktop machine; not large, but it adds up over thousands of tests). On the other hand, bundling a large number of individual assertions into a single testable unit has its own problems, so in practice we almost always end up making a tradeoff anyway.

 

Received on Thursday, 22 August 2013 21:28:17 UTC