Re: Mismatch between CSS and web-platform-tests semantics for reftests

On Sep 3, 2014, at 6:41 AM, James Graham <james@hoppipolla.co.uk> wrote:

> On 20/08/14 01:22, Peter Linss wrote:
> 
>>> Are these features something that any actual implementation is 
>>> running? As far as I can tell from the documentation, Mozilla 
>>> reftests don't support this feature, and I guess from Dirk's 
>>> response that Blink/WebKit reftests don't either. That doesn't 
>>> cover all possible implementations of course.
>> 
>> I'm actually in the middle of a big cleanup of our test harness and
>> it will support this feature when I'm done (so far we haven't been
>> able to represent the situation in our manifest files properly, I'm
>> fixing that too).

This is now online. In our manifest files we now list "reference groups" separated by semicolons; within each group, references are separated by commas. A test must match at least one of the reference groups, and must match (or mismatch, as appropriate) every reference within that group. So, for example, the entry for background-color-049 looks like:
background-color-049	reference/background-color-049-020202-ref;reference/background-color-049-030303-ref

> 
> So I was looking at adding this to web-platform-tests and the current
> design adds some non-trivial complexity. As background,
> web-platform-tests uses a script to auto-generate a manifest file with
> the test files themselves being the only required input. This is
> rather slow, since it involves actually parsing the *ML files and
> inspecting their DOM. Therefore it is important to be able to perform
> incremental updates.

FWIW, we have a build step that scans the entire repository looking for tests, references, and support files, parses all the *ML files, generates manifests and human-readable indices, then generates built test suites by re-serializing all *ML input files into HTML, XHTML, and XHTML-Print output files (where applicable). It also adjusts relative paths in reference links so that they remain correct in the built suites. The process currently takes about 6 minutes, consuming over 21,000 input files and generating 24 test suites, and it has not been optimized for speed in any way at this point. Given that it runs daily on a build server, the burden is completely manageable.

> 
> Currently it is always possible to examine a single file and determine
> what type of thing it represents (script test, reftest, manual test,
> helper file, etc.). For example reftests are identified as files with
> a <link rel=[mis]match> element. Since (unlike in CSS) tests in
> general are not required to contain any extra metadata, allowing
> references to link to other references introduces a problem because
> determining whether a file is a reference or a test now requires
> examining the entire chain, not just one file.

I don't understand why you have to parse the entire chain to determine whether a single file is a test or a reference: if a file has a single reference link, then it's a reftest, regardless of how many other references there may be. You do, of course, have to parse the entire chain to get the complete list of references for the manifest, but that really doesn't add many files to be parsed; many tests reuse references, and we use a cache so each file is only parsed once.
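The chain-walking-with-a-cache idea can be sketched roughly like this. This is not our actual build code; `direct_refs(path)` is a hypothetical stand-in for parsing a file's [mis]match links, and the cache ensures each file is visited at most once even when many tests share the same references.

```python
def collect_chain(path, direct_refs, cache=None):
    """Return the full set of references reachable from `path`.
    `direct_refs(path)` yields the references linked directly from one
    file (i.e. one parse). Results are memoized in `cache`, so shared
    references are parsed only once across an entire manifest build."""
    if cache is None:
        cache = {}
    if path in cache:
        return cache[path]
    cache[path] = set()  # placeholder to guard against reference cycles
    refs = set()
    for ref in direct_refs(path):
        refs.add(ref)
        refs |= collect_chain(ref, direct_refs, cache)
    cache[path] = refs
    return refs

# Example: test.html -> ref1.html -> ref2.html
links = {"test.html": ["ref1.html"], "ref1.html": ["ref2.html"],
         "ref2.html": []}
chain = collect_chain("test.html", lambda p: links.get(p, []))
# chain == {"ref1.html", "ref2.html"}
```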

For that matter, at least in CSS land, we don't differentiate between tests and references based on the presence of [mis]match links; those only indicate that a test is a reftest. The distinction between tests and references is made solely by file and directory naming convention: references either live in a "reference" directory or have a filename that matches one of "*-ref*", "^ref-*", "*-notref*", or "^notref-*". Furthermore, it's perfectly valid to use a _test_ (or a support file, like a PNG or SVG file) as a reference for another test; we have several instances of this in the CSS repo.
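For illustration, the naming convention above might be approximated like this. This is a sketch, not the build tools' actual matching logic, and the exact patterns they apply may differ:

```python
import posixpath

def is_reference(path):
    """Rough approximation of the CSS repo's naming convention for
    references: the file is in a "reference" directory, or its name
    (sans extension) starts with "ref-"/"notref-" or contains
    "-ref"/"-notref"."""
    directory, filename = posixpath.split(path)
    if "reference" in directory.split("/"):
        return True
    name = posixpath.splitext(filename)[0]
    return (name.startswith(("ref-", "notref-"))
            or "-ref" in name
            or "-notref" in name)

# e.g. is_reference("reference/background-color-049-020202-ref.xht")
# and is_reference("background-color-049-020202-ref.xht") are both True,
# while is_reference("background-color-049.xht") is False.
```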

> 
> Obviously this isn't impossible to implement. It's just more
> complicated than anything else in the manifest generation, all in
> order to support a rarely-used feature. Are the benefits of the
> approach where the data is distributed across many files really great
> enough, compared to an alternate design where we put all the data
> about the references in the test itself, to justify the extra
> implementation burden? As far as I can tell the main benefit is that
> if two tests share the same reference they get the same full chain of
> references automatically rather than having to copy between files.

Which is valuable in itself: anything that removes a metadata burden from test authors is a win. It also allows describing complex reference dependencies, as well as alternate references. Yes, those can be handled by other approaches, but those approaches shift the complexity (and the opportunity to make mistakes) onto the author instead of the build tools.

Also, let me point out again that the bulk of the code we use for our build process lives in a factored-out Python library[1]. It could use some cleanup, but it contains code that parses all the acceptable input files and extracts (and in some cases allows manipulation of) their metadata. Shepherd also uses this library for its metadata extraction and validation checks. If we can converge on using this library (or something that grows out of it), then we won't have to rewrite the metadata-management code... I'm happy to put in some effort cleaning up and refactoring this library to make it more useful to you.

Peter

[1] http://hg.csswg.org/dev/w3ctestlib/

Received on Wednesday, 3 September 2014 18:11:22 UTC