Re: wptrunner and how to handle ref tests from Dirk Pranke on 2014-07-01 (public-test-infra@w3.org from July to September 2014)

From: Dirk Pranke <dpranke@chromium.org>
Date: Mon, 30 Jun 2014 17:22:49 -0700
To: "Anton Modella Quintana (Plain Concepts Corporation)" <v-antonm@microsoft.com>
Cc: "public-test-infra@w3.org" <public-test-infra@w3.org>
Message-ID: <CAEoffTCr21RNYGah6ZC5vkj3=Tp276ar+Q6eUB9oAcVvvkSysw@mail.gmail.com>

On Mon, Jun 30, 2014 at 5:06 PM, Anton Modella Quintana (Plain Concepts
Corporation) <v-antonm@microsoft.com> wrote:

> Hello public-test-infra,
>
> As Erika said previously [1], Microsoft is working on adding IE support
> to wptrunner and contributing back as much as we can. While building our
> first internal prototype, one of the problems we found was the reftests.
> Some of them were failing just because the antialiasing on a curve was
> different depending on the browser. I don't think those tests should fail.
> To reduce the number of false negatives, we tested different approaches
> and in the end decided to use ImageMagick, its compare tool, and a fuzz
> factor [2]. Basically, we measure how different the two images are, and if
> the resulting factor is less than or equal to 0.015, the test passes. This
> value was found experimentally; it is the best we got after trying
> different algorithms and factors. I've attached a few images so you can
> see how, even though the images are not exactly equal, the test should
> pass (at least in this example).
>
> Some concerns about this approach:
> * It adds a dependency on ImageMagick (we could implement the algorithm
> ourselves to remove this dependency if needed).
> * There might be some tests where the factor should be tweaked or even
> disabled. The threshold could also need to vary depending on the browser
> being tested.
>
> So what does public-test-infra think of this?
>
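For illustration, here is a minimal pure-Python sketch of the kind of check described above, which also suggests the ImageMagick dependency could be dropped. It assumes the screenshots are already decoded into rows of 8-bit RGB tuples and scores them with a normalized root-mean-square (RMS) difference; that metric is an assumption and may not match exactly what ImageMagick's compare tool and fuzz factor compute. The 0.015 threshold is the experimental value quoted above; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # 8-bit RGB (assumed decoded format)
Image = List[List[Pixel]]      # rows of pixels

# Experimental threshold quoted in the message; treated as a tunable default.
DEFAULT_THRESHOLD = 0.015


def normalized_rms_difference(a: Image, b: Image) -> float:
    """RMS difference over every channel of every pixel, scaled so that
    identical images score 0.0 and pure black vs. pure white scores 1.0."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for px_a, px_b in zip(row_a, row_b):
            for ca, cb in zip(px_a, px_b):
                d = (ca - cb) / 255.0   # per-channel difference in [0, 1]
                total += d * d
                count += 1
    if count == 0:
        return 0.0
    return math.sqrt(total / count)


def fuzzy_match(a: Image, b: Image, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Pass the reftest if the two renderings differ by at most `threshold`."""
    return normalized_rms_difference(a, b) <= threshold
```

With this metric, a single slightly-off pixel (240 instead of 255 in each channel) in a 20x20 white image scores about 0.003 and passes, while pure black against pure white scores 1.0. Passing `threshold` explicitly is one way to tweak or disable the check per test or per browser, as the concerns above suggest.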
>
I believe that I have seen similar sorts of reftest failures in Blink and
WebKit over the years as well, though I'm not sure if we have them
currently (we probably do).

Blink and WebKit have custom C++ executables to compute the image diffs,
and WebKit also has the ability to do fuzzy matching in a way similar to
what you describe.
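The message does not say how WebKit's fuzzy matching works. One common alternative to a single averaged score is a two-part tolerance: how far any one channel may differ, and how many pixels may differ at all. The sketch below illustrates that idea only; `tolerant_match`, `max_channel_diff` and `max_differing_pixels` are hypothetical names, not WebKit's actual API, and the RGB-tuple image format is an assumption.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # 8-bit RGB (assumed decoded format)
Image = List[List[Pixel]]      # rows of pixels


def tolerant_match(a: Image, b: Image,
                   max_channel_diff: int, max_differing_pixels: int) -> bool:
    """Hypothetical per-pixel tolerance check (not WebKit's real interface).

    A pixel counts as differing if any channel differs at all. The match
    passes only if no channel differs by more than max_channel_diff and at
    most max_differing_pixels pixels differ."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    differing = 0
    for row_a, row_b in zip(a, b):
        for px_a, px_b in zip(row_a, row_b):
            worst = max(abs(ca - cb) for ca, cb in zip(px_a, px_b))
            if worst > max_channel_diff:
                return False          # one pixel is too far off
            if worst > 0:
                differing += 1
                if differing > max_differing_pixels:
                    return False      # too many pixels are off
    return True
```

Unlike one averaged score, this catches a small but badly wrong region (for example, a missing glyph) that an average could dilute across a large image, while still tolerating a few antialiased edge pixels.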

So, I don't think the idea is too off-base.

It would be interesting to try and identify what sorts of things we're
doing that cause these diffs to occur. Perhaps there are ways we can write
reftests that are more reliable?

I would be a bit sad to pull in a dependency on ImageMagick, given that it
is a separate native toolkit rather than a Python library, but presumably
different platforms can do different things as needed.
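As a sketch of "different platforms doing different things", a harness could shell out to ImageMagick where it is installed and fall back to something else otherwise. This assumes the ImageMagick 6 style `compare` command on PATH (ImageMagick 7 typically spells it `magick compare`), that `compare -metric RMSE` prints `<absolute> (<normalized>)` on stderr, and that an exit status of 2 or more means an error. The function names and the caller-supplied fallback are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Callable

# `compare -metric RMSE` reports e.g. "1234.5 (0.0188)" on stderr;
# we keep the normalized (0..1) value inside the parentheses.
_RMSE_RE = re.compile(r"\(([0-9.eE+-]+)\)")

Scorer = Callable[[str, str], float]   # (expected_path, actual_path) -> score


def parse_rmse(stderr_text: str) -> float:
    """Extract the normalized RMSE from compare's stderr output."""
    match = _RMSE_RE.search(stderr_text)
    if match is None:
        raise ValueError(f"unexpected compare output: {stderr_text!r}")
    return float(match.group(1))


def imagemagick_score(expected: str, actual: str) -> float:
    """Normalized RMSE between two image files via ImageMagick's compare."""
    proc = subprocess.run(
        ["compare", "-metric", "RMSE", expected, actual, "null:"],
        capture_output=True, text=True,
    )
    # Exit status 0 or 1 means similar or dissimilar; 2 or more is an error.
    if proc.returncode >= 2:
        raise RuntimeError(proc.stderr.strip())
    return parse_rmse(proc.stderr)


def choose_scorer(fallback: Scorer) -> Scorer:
    """Use ImageMagick where `compare` is on PATH, otherwise `fallback`
    (for example, a pure-Python comparison supplied by the caller)."""
    return imagemagick_score if shutil.which("compare") else fallback
```

The harness would then compare the returned score against the per-test or per-browser threshold, so the choice of backend stays a platform detail.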

It looks like you can, with some amount of work, get similar functionality
out of python using scipy and/or other libraries, e.g.:

http://stackoverflow.com/questions/189943/how-can-i-quantify-difference-between-two-images
http://stackoverflow.com/questions/2603713/comparing-similar-images-with-python-pil

-- Dirk

Received on Tuesday, 1 July 2014 00:23:36 UTC