Re: Testing (was: [Agenda] W3C Audio WG Teleconference, 13th June 2012)

On 2012-06-13 22:38:27, Chris Rogers <crogers@google.com> wrote:

> On Wed, Jun 13, 2012 at 12:24 PM, Doug Schepers <schepers@w3.org> wrote:
>
>> Hi, Philip-
>>
>> +1 on your comments and methodology.
>>
>> Regards-
>> -Doug
>>
>> On 6/13/12 5:21 AM, Philip Jägenstedt wrote:
>>
>>> On Mon, 11 Jun 2012 17:29:22 +0200, olivier Thereaux
>>> <olivier.thereaux@bbc.co.uk> wrote:
>>>
>>>> The call will be held on June 13th at 3PM Boston time. That's noon in
>>>> San Francisco, 3PM in New York, 8PM in London, 9PM in Paris/Oslo and
>>>> 7am+1D in Auckland.
>>>>
>>>
>>> Regrets from me. I have one comment on the agenda:
>>>
>>>> 1) Testing
>>>> Let's start the conversation about the testing effort for the Web
>>>> Audio API and MIDI API. There are already several initiatives and
>>>> tests produced, but no coordinated effort yet. Expected outcome: rough
>>>> agreement on type of test framework, nominate test lead(s).
>>>>
>>>
>>> I think that we should use the W3C test harness; we use this at Opera
>>> for all of our new tests and I have no complaints about it:
>>>
>>> http://w3c-test.org/resources/testharness.js
>>>
>>> As for methodology, tests fall roughly into two categories:
>>>
>>> 1. Interface tests. These are things like asserting that "new
>>> AudioContext()" returns an object of the correct type, that it has the
>>> methods it should have, that calling ctx.createMediaElementSource() with
>>> no argument throws the appropriate exception, and so on. These tests are
>>> easy to write and to pass.
>>>
>>> 2. Semantic tests, to verify that the audio graph actually does the
>>> correct thing. In general, I think we should try to implement all native
>>> nodes in JavaScript and verify that the output is the same within some
>>> margin of error, using a graph like:
>>>
>>> +------------+
>>> |            |
>>> | Oscillator |--+
>>> | (native)   |  |   +---------+   +------+
>>> +------------+  |   |         |   |      |
>>>                 +-->| Compare |-->| Sink |
>>> +------------+  |   | (JS)    |   |      |
>>> |            |  |   +---------+   +------+
>>> | Oscillator |--+
>>> | (JS)       |
>>> +------------+
>>>
>>> The sink (AudioDestinationNode) is there just to drive the pipeline;
>>> the compare node would just output silence.
>>>
>>> These tests are a lot more work to write, and should of course test
>>> every imaginable corner case of each node type.
>>>
>>
> Hi Everyone, I'd like to also offer our current layout test suite in  
> WebKit:
> http://svn.webkit.org/repository/webkit/trunk/LayoutTests/webaudio/
>
> We have over sixty tests which come in three varieties:
>
> 1. Interface tests as described in Philip's (1).  Our coverage isn't
> complete in WebKit, but an example is:
> http://svn.webkit.org/repository/webkit/trunk/LayoutTests/webaudio/audionode.html
>
> 2. Reference tests, which combine AudioNodes in different configurations
> and render for a limited time (a few seconds usually), generating a WAV
> file as a result.  The generated WAV file is compared (bit-exact test)
> with a reference WAV file.  These tests are similar to what we call
> "pixel" tests in WebKit, which we use extensively for CSS, SVG, Canvas,
> WebGL, etc., and which render a page and then compare it to a reference
> PNG image file.
>
> An example test is:
> http://svn.webkit.org/repository/webkit/trunk/LayoutTests/webaudio/oscillator-square.html
> http://svn.webkit.org/repository/webkit/trunk/LayoutTests/webaudio/resources/oscillator-testing.js
>
> 3. Idealized tests, which, similar to (2), combine AudioNodes in different
> configurations and render for a limited time, internally generating an
> AudioBuffer as a result.  JavaScript test code then inspects this result
> and compares it with a version generated internally in JavaScript.  We do
> allow some tiny deviation to account for floating-point round-off, but
> otherwise these tests are pretty exact.
>
> An example test is:
> http://svn.webkit.org/repository/webkit/trunk/LayoutTests/webaudio/convolution-mono-mono.html
> http://svn.webkit.org/repository/webkit/trunk/LayoutTests/webaudio/resources/convolution-testing.js
>
> The idealized tests (3) represent the majority of our tests because most
> of the Web Audio API is defined to mathematical precision.  Parts of the
> API (Oscillator, AudioPannerNode, etc.) approach an ideal, but in
> practice the algorithms used are necessarily approximations, so we use
> reference tests (2) for these.  As an analogy to testing methodology for
> graphics APIs, Canvas 2D can draw lines, circles, etc. which will have a
> slightly different appearance in different browsers due to different
> anti-aliasing algorithms, etc.  We use reference tests (pixel tests) in
> these cases.
>

Chris, I'm a bit confused (perhaps I'm reading this wrong?).

You say that 2 (Reference tests) is based on bit-exact WAV file  
comparison, which I interpret as: a test will signal "FAILED" if a single  
per-sample difference between the generated WAV file and the reference WAV  
file is found (effectively using the precision of the WAV file, e.g. 16 or  
24 bits per sample, or even 32-bit float)?

That doesn't seem consistent with the statement "the algorithms used are
necessarily approximations, so we use reference tests (2) for these" (as I
understand it: to allow for some error compared to the ideal result), so I
guess I didn't fully understand the comparison operation?

I'm also curious about how the reference WAV files were generated. Surely
they must have been generated by some reference implementation (perhaps in
some math/signal-processing software such as Matlab)? Couldn't that
reference code just as well have been implemented in JavaScript, so that
the test could use method 3 (Idealized tests) instead, just with a more
permissive margin of error (as allowed by the spec)?
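
For concreteness, this is roughly the kind of comparison I have in mind,
written on top of the testharness.js that Philip suggested. It is only a
sketch: the helper names, the 440 Hz sine reference and the 1e-4 tolerance
are made up for illustration, and I have left out how the rendered buffer
is actually obtained from the graph under test.

  // Hypothetical helper: compare a rendered channel (Float32Array) against
  // a JS-computed reference, allowing a per-sample error margin.
  function assert_buffers_close(actual, expected, epsilon, description) {
    assert_equals(actual.length, expected.length, description + ": length");
    var maxError = 0;
    for (var i = 0; i < expected.length; i++) {
      var error = Math.abs(actual[i] - expected[i]);
      if (error > maxError)
        maxError = error;
    }
    assert_true(maxError <= epsilon,
                description + ": max per-sample error " + maxError +
                " is within " + epsilon);
  }

  // Example reference implementation in plain JS: an ideal sine oscillator.
  function generateSineReference(length, sampleRate, frequency) {
    var reference = new Float32Array(length);
    for (var i = 0; i < length; i++)
      reference[i] = Math.sin(2 * Math.PI * frequency * i / sampleRate);
    return reference;
  }

  // Given a channel rendered by the graph under test, run the comparison
  // as a testharness.js test.
  function checkOscillatorOutput(renderedChannel, sampleRate) {
    test(function() {
      var reference = generateSineReference(renderedChannel.length,
                                            sampleRate, 440);
      assert_buffers_close(renderedChannel, reference, 1e-4,
                           "440 Hz sine oscillator output");
    }, "Oscillator output matches JS reference within tolerance");
  }

The margin (1e-4 here) would of course have to come from whatever error
bound the spec ends up allowing for each node type.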

/Marcus

Received on Friday, 15 June 2012 14:18:35 UTC