Re: wptrunner and how to handle ref tests

On Tue, Jul 1, 2014 at 4:33 AM, James Graham <james@hoppipolla.co.uk> wrote:

> On 01/07/14 01:22, Dirk Pranke wrote:
> >
> > On Mon, Jun 30, 2014 at 5:06 PM, Anton Modella Quintana (Plain
> > Concepts Corporation) <v-antonm@microsoft.com> wrote:
> >
> >     Hello public-test-infra,
> >
> >     As Erika said previously [1], Microsoft is working on adding
> >     IE support to wptrunner and contributing back as much as we
> >     can. While creating our first internal prototype, one of the
> >     problems we found was the ref tests. Some of them were failing
> >     just because the antialiasing on a curve was different depending
> >     on the browser. I don't think those tests should fail.
> >     To reduce the number of false negatives we tested different
> >     approaches, and in the end we decided to use ImageMagick, its
> >     compare tool and a fuzz factor [2]. Basically we measure how
> >     different the two images are, and if we get a factor equal to or
> >     less than 0.015 then we pass the test. This value is experimental;
> >     it is the best we got after trying different algorithms and
> >     factors. I've attached a few images so you can better see how,
> >     even if the images are not exactly equal, the test should pass
> >     (at least in this example).
> >
> >     Some concerns about this approach:
> >     * It has a dependency on ImageMagick (we could implement the
> >     algorithm to remove this dependency if needed)
> >     * There might be some tests where the factor should be tweaked or
> >     even disabled. This number could even change depending on the
> >     browser we are testing
> >
> >     So what does public-test-infra think of this?
> >
> > I believe that I have seen similar sorts of reftest failures in Blink
> > and WebKit over the years as well, though I'm not sure if we have them
> > currently (we probably do).
>
> I know we have similar problems with Mozilla reftests. I think our
> current solution is simply to quantify the maximum number of pixels that
> can be different. I was hoping we could avoid solving this for
> web-platform-tests, but maybe that's over-optimistic. Do you have a list
> of the tests that are giving incorrect results without the use of
> ImageMagick?
>
> > I would be a bit sad to pull in a dependency on ImageMagick given that
> > it is in Perl, but presumably different platforms can do different
> > things as need be.
> >
> That requires us to understand the algorithm, to the level we can
> reimplement it. I'm not sure we currently have that level of
> understanding of what ImageMagick does.
>

I certainly don't, that's true. But if I were being a standards purist, it
seems like defining fuzzy matching criteria would be a good idea, rather
than leaving it implementation-defined.

That said, I'm not being a standards purist and I'd rather focus on
whatever gets people running more tests more often :).
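
For concreteness, here is a rough sketch of what explicitly defined fuzzy
matching criteria could look like, along the lines of the "maximum number of
pixels that can be different" approach James mentions. It is not what
ImageMagick, wptrunner or the Mozilla reftest harness actually do. The
function names, the two thresholds and the image representation (rows of RGB
tuples) are all assumptions made for illustration.

```python
def count_differences(image_a, image_b, max_channel_diff=0):
    """Count pixels whose largest per-channel difference exceeds the tolerance.

    Images are assumed to be lists of rows, each row a list of (r, g, b)
    tuples with values 0-255. This representation is hypothetical.
    """
    same_shape = len(image_a) == len(image_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(image_a, image_b)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")

    differing = 0
    for row_a, row_b in zip(image_a, image_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            # Largest absolute difference over the colour channels.
            channel_diff = max(abs(a - b) for a, b in zip(pixel_a, pixel_b))
            if channel_diff > max_channel_diff:
                differing += 1
    return differing


def reftest_matches(image_a, image_b, max_channel_diff=0, max_differing_pixels=0):
    """Pass if at most max_differing_pixels pixels exceed the channel tolerance."""
    differing = count_differences(image_a, image_b, max_channel_diff)
    return differing <= max_differing_pixels
```

With both thresholds left at 0 this is an exact pixel match. A test that
needs some tolerance for antialiasing would have to state both numbers, which
is the part a purist would want written down rather than left to each runner.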

-- Dirk

Received on Tuesday, 1 July 2014 16:20:15 UTC