Re: Testing (was: [Agenda] W3C Audio WG Teleconference, 13th June 2012) from Marcus Geelnard on 2012-06-18 (public-audio@w3.org from April to June 2012)

From: Marcus Geelnard <mage@opera.com>
Date: Mon, 18 Jun 2012 08:49:32 +0200
To: "Chris Rogers" <crogers@google.com>
Cc: "Doug Schepers" <schepers@w3.org>, Philip Jägenstedt <philipj@opera.com>, "Audio Working Group" <public-audio@w3.org>, "olivier Thereaux" <olivier.thereaux@bbc.co.uk>
Message-ID: <op.wf26kuo9m77heq@mage-desktop>

Den 2012-06-15 22:09:12 skrev Chris Rogers <crogers@google.com>:

> On Fri, Jun 15, 2012 at 7:17 AM, Marcus Geelnard <mage@opera.com> wrote:
>
>> Chris, I'm a bit confused (perhaps I'm reading this wrong?).
>>
>> You say that 2 (Reference tests) is based on bit-exact WAV file
>> comparison, which I interpret as: a test will signal "FAILED" if a  
>> single per-sample difference between the generated WAV file and
>> the reference WAV file is found (effectively using the precision
>> of the WAV file, e.g. 16 or 24 bits per sample, or even 32-bit
>> float)?
>>
>
> Yes, this is correct.  By the way, I'm not sure if my use of the word
> "Reference" is correct here.  These tests are the audio analogy to what  
> we call "pixel" layout tests in WebKit (I'm not sure if that's a
> WebKit-specific name or not).  For example, in WebKit we have thousands
> of PNG image files which represent our image baselines for CSS layout,
> SVG, Canvas 2D, etc.  Different ports in WebKit (chromium,
> mac/Safari/iOS, GTK, EFL) *and* different OS versions (win/mac/linux)
> can have different PNG image files for the same test
> (platform-specific image baselines).  So, while the test itself is
> bit-exact (will fail if image does not exactly match), we can have
> different baseline images for different platforms/ports because there
> can be differences in font-rendering, color-profiles, image-resizing,
> anti-aliasing algorithms, etc.  We have an essentially identical
> testing system for audio which I describe as 2 (Reference tests).
>

Ok, I see. I guess that means that we can't really use that methodology
for the W3C testing effort, since we would essentially need unique
reference files for each browser and platform combination, which is kind
of the opposite of what we want to do (i.e. have common criteria that all
implementations can be tested against).

>> I'm also curious about how the reference WAV files were generated.  
>> Surely they must have been generated by some reference
>> implementation (perhaps in some math/signal processing software such
>> as Matlab)? Couldn't that reference code just as well have been
>> implemented in JavaScript, so that the test could use method 3
>> (Idealized tests) instead, just with a more permissive margin for
>> errors (according to spec)?
>
> No, for those tests which can be tested to a very high level of precision
> we use 3 (Idealized tests) as I describe above.  In these tests, we don't
> even bother writing out WAV files and comparing.  Instead, we render the
> result internally in the test and compare with expected versions  
> generated independently in JavaScript.  Luckily, these represent the
> majority of our tests.

What I meant was: We should be able to base all semantic tests on the same
principle as the Idealized tests, if we just add the concept of tolerance
(e.g. "accurate to x%" instead of "accurate to 32-bit IEEE 754 floating
point precision"). After all, that is how the spec has to be written for
most functions (if we want to allow variations between implementations).
In other words, the test would just reflect the specification.
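
To make the tolerance idea concrete, here is a minimal sketch in Python
(not taken from any existing test suite). The function names, the 440 Hz
test tone, the simulated 16-bit rounding and the 1e-4 tolerance are all
assumptions chosen for illustration; a real test would take its tolerance
from the spec.

```python
import math


def max_relative_error(actual, expected):
    """Largest per-sample difference, relative to the reference peak level."""
    if len(actual) != len(expected) or not expected:
        raise ValueError("buffers must be non-empty and of equal length")
    peak = max(abs(e) for e in expected)
    if peak == 0.0:
        peak = 1.0  # all-silent reference: fall back to absolute error
    return max(abs(a - e) for a, e in zip(actual, expected)) / peak


def passes(actual, expected, tolerance):
    """Tolerance-based check: pass if the error stays within `tolerance`."""
    return max_relative_error(actual, expected) <= tolerance


# Reference signal computed independently of the implementation under test
# (hypothetical example: 1024 samples of a 440 Hz sine at 44.1 kHz).
sample_rate = 44100
expected = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1024)]

# Stand-in for an implementation's output: the same signal rounded to
# 16-bit steps, i.e. slightly different from the reference.
actual = [round(x * 32767) / 32767 for x in expected]

bit_exact = actual == expected                    # bit-exact comparison
within_tolerance = passes(actual, expected, 1e-4)  # tolerance comparison

print("bit-exact match:", bit_exact)
print("within tolerance:", within_tolerance)
```

The bit-exact comparison rejects the rounded signal, while the tolerance
check accepts it. The same reference and the same criterion could then be
shared by all implementations.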

/Marcus

Received on Monday, 18 June 2012 06:50:12 UTC