W3C home > Mailing lists > Public > public-test-infra@w3.org > July to September 2014

Re: Mismatch between CSS and web-platform-tests semantics for reftests

From: Dirk Pranke <dpranke@chromium.org>
Date: Wed, 3 Sep 2014 13:11:25 -0700
Message-ID: <CAEoffTDUyv7eZOKH3YySeGtebvz9QQTMXazREs=QUkJQJzPf3g@mail.gmail.com>
To: Peter Linss <peter.linss@hp.com>
Cc: James Graham <james@hoppipolla.co.uk>, public-test-infra <public-test-infra@w3.org>
On Wed, Sep 3, 2014 at 11:10 AM, Peter Linss <peter.linss@hp.com> wrote:

> On Sep 3, 2014, at 6:41 AM, James Graham <james@hoppipolla.co.uk> wrote:
> > On 20/08/14 01:22, Peter Linss wrote:
> >
> >>> Are these features something that any actual implementation is
> >>> running? As far as I can tell from the documentation, Mozilla
> >>> reftests don't support this feature, and I guess from Dirk's
> >>> response that Blink/WebKit reftests don't either. That doesn't
> >>> cover all possible implementations of course.
> >>
> >> I'm actually in the middle of a big cleanup of our test harness and
> >> it will support this feature when I'm done (so far we haven't been
> >> able to represent the situation in our manifest files properly, I'm
> >> fixing that too).
> This is now online. In our manifest files, we now list "reference groups"
> separated by semicolons; within each group, references are separated by
> commas. A test must match at least one of the reference groups, and to
> match a group it must match (or mismatch, as appropriate) every reference
> within that group. So, for example, the entry for background-color-049
> looks like:
> background-color-049
> reference/background-color-049-020202-ref;reference/background-color-049-030303-ref

Peter, I never did see any answer from you to my questions earlier in the
thread, and this reply uses the same potentially confusing wording.

To confirm: when you say "must match all references in a group", you're
really saying that the references themselves might also be tests, right?
i.e., you can do pairwise testing and get coverage transitively, right?

I can't think of a reason why, in order to see whether -049 rendered
correctly, you would need to check -020202 *and* -030303 against -049, as
long as you compared -020202 against -030303?

Does that make sense?
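The redundancy argument here can be made concrete: assuming an exact-match comparison, which is transitive, comparing only adjacent links of a reference chain establishes that every page in the chain agrees with every other. A rough sketch, where `renders_same` is a hypothetical screenshot-comparison predicate:

```python
def chain_matches(pages, renders_same):
    """If every adjacent pair in a chain renders identically, then by
    transitivity every page matches every other page, so N-1
    comparisons cover all N*(N-1)/2 pairs."""
    return all(renders_same(a, b) for a, b in zip(pages, pages[1:]))

# Comparing 049 -> 020202 and 020202 -> 030303 needs only two
# comparisons, instead of also checking 049 against 030303.
```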

> >
> > So I was looking at adding this to web-platform-tests and the current
> > design adds some non-trivial complexity. As background,
> > web-platform-tests uses a script to auto-generate a manifest file with
> > the test files themselves being the only required input. This is
> > rather slow, since it involves actually parsing the *ML files and
> > inspecting their DOM. Therefore it is important to be able to perform
> > incremental updates.
> FWIW, we have a build step that scans the entire repository looking for
> tests, references, and support files, parses all the *ML files, generates
> manifests, human readable indices, then generates built test suites by
> re-serializing all *ML input files into HTML, XHTML, and XHTML-Print output
> files (where applicable). It also adjusts relative paths for reference
> links so that they remain correct in the built suites. The process
> currently takes about 6 minutes consuming over 21000 input files and
> generating 24 test suites. It has not been optimized for speed in any way
> at this point. Given that it runs daily on a build server, the burden is
> completely manageable.

Somewhat off-topic, but what is the system you describe in the "we have a
build step" paragraph? It sounds like this isn't Shepherd, but something
else. Do you have something running tests against (some set of) browsers?

-- Dirk

> >
> > Currently it is always possible to examine a single file and determine
> > what type of thing it represents (script test, reftest, manual test,
> > helper file, etc.). For example reftests are identified as files with
> > a <link rel=[mis]match> element. Since (unlike in CSS) tests in
> > general are not required to contain any extra metadata, allowing
> > references to link to other references introduces a problem because
> > determining whether a file is a reference or a test now requires
> > examining the entire chain, not just one file.
> I don't understand why you have to parse the entire chain to determine
> whether a single file is a test or a reference; if a file has a single
> reference link, then it's a reftest, regardless of how many other
> references there may be. You do, of course, have to parse the entire
> chain to get the list of all references for the manifest, but really,
> that's not adding a lot of files to be parsed: many tests reuse
> references, and we use a cache so each file is only parsed once.
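The cached chain-walking described here could look something like the sketch below. The `REFERENCE_LINKS` table stands in for real *ML parsing (hypothetical data, not from the actual CSS repo), and the parse counter just demonstrates that sharing references does not cause re-parsing:

```python
# Stand-in for real *ML parsing: map each file to the references it
# links to (hypothetical example data).
REFERENCE_LINKS = {
    "test-a": ("ref-1",),
    "test-b": ("ref-1",),   # tests may share a reference
    "ref-1": ("ref-2",),    # references may chain further
    "ref-2": (),
}

parse_count = {}

def parse_reference_links(path):
    """Parse one file for reference links, counting parses so we can
    confirm each file is parsed at most once."""
    parse_count[path] = parse_count.get(path, 0) + 1
    return REFERENCE_LINKS.get(path, ())

cache = {}

def collect_chain(path):
    """Gather the transitive set of references for a test, memoizing
    per-file results so shared references cost nothing extra."""
    if path in cache:
        return cache[path]
    refs = set()
    for ref in parse_reference_links(path):
        refs.add(ref)
        refs |= collect_chain(ref)
    cache[path] = refs
    return refs
```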
> For that matter, at least in CSS land, we don't differentiate between
> tests and references based on the presence of {mis}match links; those only
> indicate that a test is a reftest. The distinction between tests and
> references is made solely by file and directory naming convention.
> References are either in a "reference" directory or have a filename that
> matches any of: "*-ref*", "^ref-*", "*-notref*", "^notref-*". Furthermore,
> it's perfectly valid to use a _test_ (or a support file, like a PNG or SVG
> file) as a reference for another test; we have several instances of this in
> the CSS repo.
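The naming convention above could be checked along these lines. This is a sketch only; the actual w3ctestlib logic may differ in detail:

```python
import os

def is_reference(path: str) -> bool:
    """True if the file lives in a 'reference' directory, or its name
    follows the CSS reference naming convention ("*-ref*", "^ref-*",
    "*-notref*", "^notref-*")."""
    parts = path.replace("\\", "/").split("/")
    if "reference" in parts[:-1]:
        return True
    stem = os.path.splitext(parts[-1])[0]
    return (stem.startswith(("ref-", "notref-"))
            or "-ref" in stem or "-notref" in stem)
```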
> >
> > Obviously this isn't impossible to implement. It's just more
> > complicated than anything else in the manifest generation, all in
> > order to support a rarely-used feature. Are the benefits of the
> > approach where the data is distributed across many files really great
> > enough, compared to an alternate design where we put all the data
> > about the references in the test itself, to justify the extra
> > implementation burden? As far as I can tell the main benefit is that
> > if two tests share the same reference they get the same full chain of
> > references automatically rather than having to copy between files.
> Which is valuable in itself; anything that removes a metadata burden from
> test authors is a win. It also allows describing complex reference
> dependencies, as well as allowing alternate references. Yes, those can be
> done by alternate approaches, but those add complexity (and an opportunity
> to make mistakes) for the author, as opposed to the build tools.
> Also, let me point out again that the bulk of the code we use for our
> build process is in a factored-out Python library[1]. It could use some
> cleanup, but it contains code that parses all the acceptable input files
> and extracts (and in some cases allows manipulation of) the metadata.
> Shepherd also uses this library to do its metadata extraction and
> validation checks. If we can converge on using this library (or something
> that grows out of it), then we don't have to rewrite code for managing
> metadata... I'm happy to put in some effort cleaning up and refactoring
> this library to make it more useful to you.
> Peter
> [1] http://hg.csswg.org/dev/w3ctestlib/
Received on Wednesday, 3 September 2014 20:12:12 UTC