Re: Testing (was: [Agenda] W3C Audio WG Teleconference, 13th June 2012) from Chris Rogers on 2012-06-15 (public-audio@w3.org from April to June 2012)

From: Chris Rogers <crogers@google.com>
Date: Fri, 15 Jun 2012 13:09:12 -0700
To: Marcus Geelnard <mage@opera.com>
Cc: Doug Schepers <schepers@w3.org>, Philip Jägenstedt <philipj@opera.com>, Audio Working Group <public-audio@w3.org>, olivier Thereaux <olivier.thereaux@bbc.co.uk>
Message-ID: <CA+EzO0mHQCm94qOLooapbEqtNxmUBjeERcBb+vCth8=tWBmvBA@mail.gmail.com>
On Fri, Jun 15, 2012 at 7:17 AM, Marcus Geelnard <mage@opera.com> wrote:

> On 2012-06-13 22:38:27, Chris Rogers <crogers@google.com> wrote:
>
>> On Wed, Jun 13, 2012 at 12:24 PM, Doug Schepers <schepers@w3.org> wrote:
>>
>>> Hi, Philip-
>>>
>>> +1 on your comments and methodology.
>>>
>>> Regards-
>>> -Doug
>>>
>>> On 6/13/12 5:21 AM, Philip Jägenstedt wrote:
>>>
>>>> On Mon, 11 Jun 2012 17:29:22 +0200, olivier Thereaux
>>>> <olivier.thereaux@bbc.co.uk> wrote:
>>>>
>>>>> The call will be held on June 13th at 3PM Boston time. That's noon in
>>>>> San Francisco, 3PM in New York, 8PM in London, 9PM in Paris/Oslo and
>>>>> 7AM the following day in Auckland.
>>>>>
>>>>>
>>>> Regrets from me. I have one comment on the agenda:
>>>>
>>>>> 1) Testing
>>>>
>>>>> Let's start the conversation about the testing effort for the Web
>>>>> Audio API and MIDI API. There are already several initiatives and
>>>>> tests produced, but no coordinated effort yet. Expected outcome: rough
>>>>> agreement on type of test framework, nominate test lead(s).
>>>>>
>>>>>
>>>> I think that we should use the W3C test harness, we use this at Opera
>>>> for all of our new tests and I have no complaints about it:
>>>>
>>>> http://w3c-test.org/resources/testharness.js
>>>>
>>>> As for methodology, tests fall roughly into two categories:
>>>>
>>>> 1. Interface tests. These are things like asserting that "new
>>>> AudioContext()" returns an object of the correct type, that it has the
>>>> methods it should have, that calling ctx.createMediaElementSource() with
>>>> no argument throws the appropriate exception, and so on. These tests are
>>>> easy to write and to pass.
>>>>
>>>> 2. Semantic tests, to verify that the audio graph actually does the
>>>> correct thing. In general, I think we should try to implement all native
>>>> nodes in JavaScript and verify that the output is the same within some
>>>> margin of error, using a graph like:
>>>>
>>>> +------------+
>>>> |            |
>>>> | Oscillator |--+
>>>> | (native)   |  |   +---------+   +------+
>>>> +------------+  |   |         |   |      |
>>>>                 +-->| Compare |-->| Sink |
>>>> +------------+  |   | (JS)    |   |      |
>>>> |            |  |   +---------+   +------+
>>>> | Oscillator |--+
>>>> | (JS)       |
>>>> +------------+
>>>>
>>>> The sink (AudioDestinationNode) is there just to drive the pipeline; the
>>>> compare node would only output silence.
>>>>
>>>> These tests are a lot more work to write, and should of course test
>>>> every imaginable corner case of each node type.
>>>>
>>>>
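A minimal sketch of the "Compare (JS)" step in the graph above, assuming the native oscillator's output has already been captured into a Float32Array (for example from an offline render) and that the JS reference is a plain sine starting at phase 0. The function names and any tolerance passed in are hypothetical illustrations, not values from the spec or from an existing test suite.

```javascript
// Hypothetical JS reference oscillator: a sine at `frequency` Hz, sampled
// at `sampleRate` Hz, starting at phase 0.
function jsSineOscillator(frequency, sampleRate, length) {
  const out = new Float32Array(length);
  for (let n = 0; n < length; n++) {
    out[n] = Math.sin((2 * Math.PI * frequency * n) / sampleRate);
  }
  return out;
}

// Hypothetical helper: largest absolute per-sample difference between two
// equal-length signals.
function maxAbsDiff(a, b) {
  if (a.length !== b.length) {
    throw new Error(`length mismatch: ${a.length} vs ${b.length}`);
  }
  let max = 0;
  for (let i = 0; i < a.length; i++) {
    max = Math.max(max, Math.abs(a[i] - b[i]));
  }
  return max;
}

// Hypothetical "Compare (JS)" check: passes when every native sample lies
// within `tolerance` of the JS reference sample.
function outputsMatch(nativeOutput, jsOutput, tolerance) {
  return maxAbsDiff(nativeOutput, jsOutput) <= tolerance;
}
```

For example, `outputsMatch(nativeSamples, jsSineOscillator(440, 44100, nativeSamples.length), 1e-4)` would accept a captured native output (`nativeSamples`) that stays within 1e-4 of the ideal sine at every sample; the margin itself would have to be chosen per node type.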
>> Hi Everyone, I'd like to also offer our current layout test suite in
>> WebKit:
>> http://svn.webkit.org/repository/webkit/trunk/LayoutTests/webaudio/
>>
>> We have over sixty tests which come in three varieties:
>>
>> 1. Interface tests as described in Philip's (1). Our coverage isn't
>> complete in WebKit, but an example is:
>> http://svn.webkit.org/repository/webkit/trunk/LayoutTests/webaudio/audionode.html
>>
>> 2. Reference tests, which combine AudioNodes in different configurations
>> and render for a limited time (usually a few seconds), generating a WAV
>> file as a result. The generated WAV file is compared (bit-exact test) with
>> a reference WAV file. These tests are similar to what we call "pixel"
>> tests in WebKit, which we use extensively for CSS, SVG, Canvas, WebGL,
>> etc., and which render a page and then compare it to a reference PNG
>> image file.
>>
>> An example test is:
>> http://svn.webkit.org/repository/webkit/trunk/LayoutTests/webaudio/oscillator-square.html
>> http://svn.webkit.org/repository/webkit/trunk/LayoutTests/webaudio/resources/oscillator-testing.js
>>
>> 3. Idealized tests, which, like (2), combine AudioNodes in different
>> configurations and render for a limited time, but internally generate an
>> AudioBuffer as a result. JavaScript test code then inspects this result
>> and compares it with a version computed in JavaScript. We do allow some
>> tiny deviation to account for floating-point round-off, but otherwise
>> these tests are pretty exact.
>>
>> An example test is:
>> http://svn.webkit.org/repository/webkit/trunk/LayoutTests/webaudio/convolution-mono-mono.html
>> http://svn.webkit.org/repository/webkit/trunk/LayoutTests/webaudio/resources/convolution-testing.js
>>
>> The idealized tests (3) represent the majority of our tests because most
>> of the Web Audio API is defined to mathematical precision. Parts of the
>> API (Oscillator, AudioPannerNode, etc.) aim for an ideal, but in practice
>> the algorithms used are necessarily approximations, so we use reference
>> tests (2) for these. As an analogy to testing methodology for graphics
>> APIs, Canvas 2D can draw lines, circles, etc., which will look slightly
>> different in different browsers due to different anti-aliasing
>> algorithms, etc. We use reference tests (pixel tests) in these cases.
>>
>>
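For the reference tests (2), the rendered audio has to be serialized before it can be compared byte for byte. A minimal sketch, assuming mono 16-bit PCM output, a plain 44-byte RIFF header and a scale factor of 32767; `encodeWav16` is a hypothetical helper, not the code used by the WebKit tests linked above.

```javascript
// Hypothetical helper: encode mono float samples in [-1, 1] as a 16-bit
// PCM WAV file (44-byte RIFF header, then little-endian sample data).
function encodeWav16(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // RIFF chunk size
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = integer PCM
  view.setUint16(22, 1, true); // one channel (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale and round to a signed 16-bit integer.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * bytesPerSample, Math.round(s * 32767), true);
  }
  return new Uint8Array(buffer);
}
```

Because the quantization step is fixed, two renders that agree to within the 16-bit resolution produce identical files; any larger per-sample difference shows up as differing bytes.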
> Chris, I'm a bit confused (perhaps I'm reading this wrong?).
>
> You say that 2 (Reference tests) is based on bit-exact WAV file
> comparison, which I interpret as: a test will signal "FAILED" if a single
> per-sample difference between the generated WAV file and the reference WAV
> file is found (effectively using the precision of the WAV file, e.g. 16 or
> 24 bits per sample, or even 32-bit float)?
>

Yes, this is correct. By the way, I'm not sure if my use of the word
"Reference" is correct here. These tests are the audio analogy to what we
call "pixel" layout tests in WebKit (I'm not sure if that's a
WebKit-specific name or not). For example, in WebKit we have thousands of
PNG image files which represent our image baselines for CSS layout, SVG,
Canvas 2D, etc. Different ports in WebKit (Chromium, Mac/Safari/iOS, GTK,
EFL) *and* different OS versions (Windows/Mac/Linux) can have different PNG
image files for the same test (platform-specific image baselines). So,
while the test itself is bit-exact (it fails if the image does not match
exactly), we can have different baseline images for different
platforms/ports, because there can be differences in font rendering, color
profiles, image resizing, anti-aliasing algorithms, etc. We have an
essentially identical testing system for audio, which I describe as 2
(Reference tests).
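A minimal sketch of the bit-exact check described above, operating on the raw bytes of the rendered file and of the baseline file. `firstMismatch` is a hypothetical name; returning the offset of the first difference rather than a plain boolean is just a choice that makes failures easier to diagnose.

```javascript
// Hypothetical bit-exact comparison of a rendered WAV file against its
// baseline. Returns null when the files are identical, otherwise the byte
// offset of the first difference.
function firstMismatch(actual, expected) {
  const n = Math.min(actual.length, expected.length);
  for (let i = 0; i < n; i++) {
    if (actual[i] !== expected[i]) {
      return i;
    }
  }
  // Identical prefixes but different lengths still fail the test.
  return actual.length === expected.length ? null : n;
}
```

For a mono 16-bit file with a plain 44-byte header, a mismatch at byte offset `k >= 44` corresponds to sample index `Math.floor((k - 44) / 2)`.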


>
> That doesn't seem consistent with the statement "the algorithms used are
> necessarily approximations, so we use reference tests (2) for these" (as
> I understand it: to allow for some error compared to the ideal result), so
> I guess I didn't fully understand the comparison operation?
>

As I describe above, the test itself is bit-exact, but (potentially) we
maintain different "baseline" files for different platforms/ports so the
comparisons are made to different files.  I say "potentially" because in
some cases the baselines may not differ and we can share them across
platforms/ports.
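Per-platform baselines with a shared fallback can be sketched as a lookup over a search path. The directory layout (`platform/<port>/...`), the `-expected.wav` suffix and the port names are assumptions for illustration only; this is not WebKit's actual tooling, and `resolveBaseline` is a hypothetical helper.

```javascript
// Hypothetical lookup: pick the most specific baseline that exists.
// `searchPath` lists port directories from most to least specific; the
// shared (cross-platform) baseline is the final fallback. `exists` is
// injected so the lookup can be tested without touching the filesystem.
function resolveBaseline(testName, searchPath, exists) {
  const candidates = searchPath.map(
    (port) => `platform/${port}/${testName}-expected.wav`
  );
  candidates.push(`${testName}-expected.wav`);
  for (const candidate of candidates) {
    if (exists(candidate)) {
      return candidate;
    }
  }
  return null; // no baseline has been generated for this test yet
}
```

In Node, passing `fs.existsSync` as `exists` would check real files, with paths resolved against the current working directory. Ports whose output matches the shared baseline simply have no platform-specific file.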


>
> I'm also curious about how the reference WAV files were generated. Surely
> they must have been generated by some reference implementation (perhaps in
> some math/signal processing software such as Matlab)? Couldn't that
> reference code just as well have been implemented in JavaScript, so that
> the test could use method 3 (Idealized tests) instead, just with a more
> permissive margin for errors (according to spec)?


No, for those tests which can be tested to a very high level of precision
we use 3 (Idealized tests) as I describe above.  In these tests, we don't
even bother writing out WAV files and comparing.  Instead, we render the
result internally in the test and compare with expected versions generated
independently in JavaScript.  Luckily, these represent the majority of our
tests.
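A minimal sketch of an idealized test (3) in the spirit of convolution-mono-mono.html: compute the expected output with a direct-form convolution in JavaScript and compare it with the rendered buffer within a small tolerance. This assumes the native convolver applies no normalization and that both signals start at sample 0; the function names are hypothetical, and the real WebKit test may be structured differently. The tolerance is meant to absorb round-off from the native node's (typically FFT-based) implementation.

```javascript
// Hypothetical reference: direct-form linear convolution,
// y[n] = sum over k of x[k] * h[n - k], with x.length + h.length - 1 outputs.
function convolve(x, h) {
  if (x.length === 0 || h.length === 0) {
    return new Float32Array(0);
  }
  const y = new Float32Array(x.length + h.length - 1);
  for (let i = 0; i < x.length; i++) {
    for (let j = 0; j < h.length; j++) {
      y[i + j] += x[i] * h[j];
    }
  }
  return y;
}

// Hypothetical check: compare the rendered buffer with the JS expectation
// over their common length, allowing a small absolute error, and report
// the largest error and the sample where it occurred.
function compareWithTolerance(rendered, expected, tolerance) {
  let maxError = 0;
  let worstIndex = -1;
  const n = Math.min(rendered.length, expected.length);
  for (let i = 0; i < n; i++) {
    const err = Math.abs(rendered[i] - expected[i]);
    if (err > maxError) {
      maxError = err;
      worstIndex = i;
    }
  }
  return { pass: maxError <= tolerance, maxError, worstIndex };
}
```

Comparing only the common length assumes the render length was chosen by the test; a stricter version would also check that the rendered buffer is long enough to hold the full convolution tail.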

Chris


>
>
> /Marcus
>
Received on Friday, 15 June 2012 20:09:42 UTC