- From: James Graham <james@hoppipolla.co.uk>
- Date: Tue, 01 Jul 2014 12:33:03 +0100
- To: public-test-infra@w3.org
On 01/07/14 01:22, Dirk Pranke wrote:
> On Mon, Jun 30, 2014 at 5:06 PM, Anton Modella Quintana (Plain
> Concepts Corporation) <v-antonm@microsoft.com> wrote:
>
>     Hello public-test-infra,
>
>     As Erika said previously [1], Microsoft is working on adding IE
>     support to wptrunner and contributing back as much as we can.
>     While creating our first internal prototype, one of the problems
>     we found was with the ref tests. Some of them were failing just
>     because the antialiasing on a curve differed between browsers. I
>     don't think those tests should fail.
>
>     To mitigate the number of false negatives we tested different
>     approaches, and in the end we decided to use ImageMagick, its
>     compare tool, and a fuzz factor [2]. Basically we compare how
>     different the two images are, and if we get a factor equal to or
>     less than 0.015 then we pass the test. This value is experimental
>     and is the best we got after trying different algorithms and
>     factors. I've attached a few images so you can better see how,
>     even if the images are not exactly equal, the test should pass
>     (at least in this example).
>
>     Some concerns about this approach:
>     * It has a dependency on ImageMagick (we could implement the
>       algorithm ourselves to remove this dependency if needed)
>     * There might be some tests where the factor should be tweaked
>       or even disabled. This number could even change depending on
>       the browser we are testing
>
>     So what does public-test-infra think of this?
>
> I believe that I have seen similar sorts of reftest failures in Blink
> and WebKit over the years as well, though I'm not sure if we have
> them currently (we probably do).

I know we have similar problems with Mozilla reftests. I think our
current solution is simply to specify the maximum number of pixels
that are allowed to differ. I was hoping we could avoid solving this
for web-platform-tests, but maybe that's over-optimistic.

Do you have a list of the tests that give incorrect results without
the use of ImageMagick?

> I would be a bit sad to pull in a dependency on ImageMagick given
> that it is in Perl, but presumably different platforms can do
> different things as need be.

That requires us to understand the algorithm well enough to
reimplement it, and I'm not sure we currently have that level of
understanding of what ImageMagick does.

I also have principled worries about this approach. For example, it's
pretty clear that most tests showing a green square shouldn't show
*any* differences; a single red pixel might be a fail. Similarly, but
more worryingly, we could have a test where a small difference in one
area would be fine (e.g. due to antialiasing differences on a curve),
but a small difference in another area would be a problem.

I suppose the sophisticated solution is to require that, where we
allow any difference at all, we check in a mask image identifying the
pixels that are allowed to differ (and, potentially, by how much they
are allowed to differ). I'm not sure whether this is too much effort
for the amount of benefit, however.
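(For concreteness, a minimal sketch of the ImageMagick-based check described above, assuming the `compare` CLI is installed. The thread does not name which metric the prototype used, so the RMSE metric and the parsing of its normalized output below are illustrative assumptions.)

```python
# A sketch of the fuzzy comparison described in the thread. Assumes
# ImageMagick's `compare` CLI is on PATH; the RMSE metric and output
# parsing are assumptions, not details given in the message.
import subprocess

THRESHOLD = 0.015  # the experimentally chosen factor from the message


def images_match(expected_png, actual_png, threshold=THRESHOLD):
    """Pass if the normalized image difference is within the threshold."""
    # `compare -metric RMSE a.png b.png null:` writes something like
    # "1094.5 (0.0167)" to stderr; the parenthesized number is the
    # difference normalized to the range [0, 1].
    proc = subprocess.run(
        ["compare", "-metric", "RMSE", expected_png, actual_png, "null:"],
        capture_output=True,
        text=True,
    )
    if proc.returncode > 1:  # 0 = identical, 1 = dissimilar, >1 = error
        raise RuntimeError(proc.stderr)
    normalized = float(proc.stderr.split("(")[1].rstrip(")\n"))
    return normalized <= threshold
```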
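(And a sketch of the two alternatives raised in the reply: capping the number of pixels that are allowed to differ, as in Mozilla's reftests, and checking differences against a checked-in mask image. Pillow is used here; the mask convention, non-black pixels marking regions allowed to differ, is an assumption for illustration.)

```python
# A sketch combining a differing-pixel cap with an optional mask image.
# Uses Pillow; the mask convention (non-black = allowed to differ) is
# an illustrative assumption, not something specified in the thread.
from PIL import Image, ImageChops


def differing_pixels(expected_png, actual_png, mask_png=None):
    """Count differing pixels, ignoring any that fall inside the mask."""
    expected = Image.open(expected_png).convert("RGB")
    actual = Image.open(actual_png).convert("RGB")
    # Per-pixel absolute difference, collapsed to a single channel.
    diff = ImageChops.difference(expected, actual).convert("L")
    mask = Image.open(mask_png).convert("L") if mask_png else None
    count = 0
    for y in range(diff.height):
        for x in range(diff.width):
            if diff.getpixel((x, y)) == 0:
                continue  # pixel is identical in both screenshots
            if mask is not None and mask.getpixel((x, y)) > 0:
                continue  # difference is inside the allowed region
            count += 1
    return count


# e.g. pass only if at most 40 unmasked pixels differ:
# assert differing_pixels("ref.png", "test.png", "mask.png") <= 40
```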
Received on Tuesday, 1 July 2014 11:33:32 UTC