- From: Dirk Pranke <dpranke@chromium.org>
- Date: Mon, 30 Jun 2014 17:22:49 -0700
- To: "Anton Modella Quintana (Plain Concepts Corporation)" <v-antonm@microsoft.com>
- Cc: "public-test-infra@w3.org" <public-test-infra@w3.org>
- Message-ID: <CAEoffTCr21RNYGah6ZC5vkj3=Tp276ar+Q6eUB9oAcVvvkSysw@mail.gmail.com>
On Mon, Jun 30, 2014 at 5:06 PM, Anton Modella Quintana (Plain Concepts Corporation) <v-antonm@microsoft.com> wrote:

> Hello public-test-infra,
>
> As Erika said previously [1], Microsoft is working on adding support to IE
> to wptrunner and contributing back as much as we can. While we created our
> first internal prototype one of the problems we found were the ref tests.
> Some of them were failing just because the antialias on a curve was
> different depending on the browser. I don't think those tests should fail.
> To mitigate the number of false negatives we tested different approaches
> and at the end we decided to use ImageMagick, its compare tool and a fuzz
> factor [2]. Basically we compare how different the two images are and if we
> get a factor equal or less than 0.015 then we pass the test. These value
> is experimental and it is the best we got after trying different algorithms
> and factors. I've attached a few images for you to better see how even if
> the images are not exactly equal, the test should pass (at least in this
> example).
>
> Some concerns about this approach:
> * It has a dependency on ImageMagick (we could implement the algorithm to
> remove this dependency if needed)
> * There might be some tests where the factor should be tweaked or even
> disabled. This number could even change depending on the browser we are
> testing
>
> So what does public-test-infra think of this?
>

I believe that I have seen similar sorts of reftest failures in Blink and WebKit over the years as well, though I'm not sure if we have them currently (we probably do). Blink and WebKit have custom C++ executables to compute the image diffs, and WebKit also has the ability to do fuzzy matching in a way similar to what you describe. So, I don't think the idea is too off-base.

It would be interesting to try and identify what sorts of things we're doing that cause these diffs to occur. Perhaps there are ways we can write reftests that are more reliable?
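For illustration only, here is a rough sketch of the kind of thresholded comparison described in the quoted proposal. It is not the ImageMagick algorithm and not what Blink or WebKit ship: the metric (a root-mean-square error normalized to the 0..1 range) and the function names `normalized_rmse` and `reftest_matches` are assumptions, and only the 0.015 threshold comes from the quoted message. Images are represented as plain lists of RGB tuples so the sketch has no dependencies.

```python
import math


def normalized_rmse(pixels_a, pixels_b, max_value=255):
    """Root-mean-square error over all channels, scaled to the range 0..1.

    pixels_a and pixels_b are equal-length sequences of channel tuples,
    e.g. [(r, g, b), ...]. This metric is an assumption; the quoted
    message does not say which metric produced the 0.015 figure.
    """
    if len(pixels_a) != len(pixels_b):
        raise ValueError("images must have the same number of pixels")
    total = 0
    count = 0
    for pixel_a, pixel_b in zip(pixels_a, pixels_b):
        for channel_a, channel_b in zip(pixel_a, pixel_b):
            total += (channel_a - channel_b) ** 2
            count += 1
    if count == 0:
        return 0.0  # two empty images are trivially identical
    return math.sqrt(total / count) / max_value


def reftest_matches(pixels_a, pixels_b, threshold=0.015):
    """Pass when the images differ by no more than the threshold."""
    return normalized_rmse(pixels_a, pixels_b) <= threshold


# A white 10x10 reference, a copy with one slightly darker pixel
# (as antialiasing might produce), and an all-black image.
reference = [(255, 255, 255)] * 100
antialiased = list(reference)
antialiased[0] = (250, 250, 250)
black = [(0, 0, 0)] * 100

print(reftest_matches(reference, antialiased))
print(reftest_matches(reference, black))
```

A single global threshold like this cannot tell one badly wrong pixel from many slightly wrong ones, which is one reason the factor might need tweaking per test or per browser, as the quoted message notes.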
I would be a bit sad to pull in a dependency on ImageMagick, given that it is a sizable external package (it is written in C, though it is often driven from Perl via PerlMagick), but presumably different platforms can do different things as need be.

It looks like you can, with some amount of work, get similar functionality out of Python using scipy and/or other libraries, e.g.:

http://stackoverflow.com/questions/189943/how-can-i-quantify-difference-between-two-images
http://stackoverflow.com/questions/2603713/comparing-similar-images-with-python-pil

-- Dirk
Received on Tuesday, 1 July 2014 00:23:36 UTC