Re: Aiding early implementations of the web audio API from Chris Wilson on 2012-05-22 (public-audio@w3.org from April to June 2012)

From: Chris Wilson <cwilso@google.com>
Date: Tue, 22 May 2012 10:55:54 -0700
To: Jussi Kalliokoski <jussi.kalliokoski@gmail.com>
Cc: Marcus Geelnard <mage@opera.com>, public-audio@w3.org, Alistair MacDonald <al@signedon.com>
Message-ID: <CAJK2wqX6m3FHSzuve9vYGEga3yLei_-6Hr09RWFZOezV0iMT5g@mail.gmail.com>
On Tue, May 22, 2012 at 7:18 AM, Jussi Kalliokoski <
jussi.kalliokoski@gmail.com> wrote:

> On Tue, May 22, 2012 at 12:13 PM, Marcus Geelnard <mage@opera.com> wrote:
>
>> Yes, that's true. The filter and convolver nodes are certainly useful for
>> game sound effects (underwater, cave, etc). Other nodes can also be used
>> for creating interesting effects, but that's not the point.
>>
>> My view on the issue is that the more complex the spec is, the more it
>> will cost (spec work, implementation time, test suites, bug fixing,
>> source/binary sizes, etc), and the more likely it is that we will have
>> different behavior in different implementations, ranging from noticeable
>> performance differences to noticeable differences in sound and behavior and
>> possibly implementation-dependent corner case bugs etc.
>>
>> Since all nodes can be implemented in JavaScript (most of them are even
>> trivial), the only reason for using native nodes instead of JavaScript
>> nodes is to improve performance.
>>
>
> I have similar thoughts on this. I'm starting to think we're going to have
> serious spec bloat if we go to lengths defining all the audio building
> blocks that are required to achieve every use case, often resulting only in
> awkward and hacky (no offense, these aren't easy things) solutions just to
> avoid having to use a script node that would mitigate a lot of the benefits
> in having a native graph. I feel that it would be a good idea to strip down
> the spec a bit, making the script nodes more first-class citizens of the
> graph, while reducing the need for effects that are simple to implement
> with custom scripts. I think this would make the standardization effort a
> lot simpler as well.
>

I have to disagree with the definition of "trivial," then.  The only node
types I think could really be considered trivial are Gain, Delay and
WaveShaper - every other type is significantly non-trivial to me. And even
then, when you layer on the complexity involved with handling AudioParams
(for the gain on Gain and the delayTime on Delay), and the interpolation
between curve points on WaveShaper, I'm not convinced they're actually
trivial.
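
To make that concrete, here is a rough sketch of the two "easy" cases in plain JavaScript. It is only an illustration under stated assumptions: the function names (applyGainRamp, shapeSamples) are hypothetical and not part of any proposal, the gain is assumed to ramp linearly across one block, and the curve is assumed to span the input range [-1, 1] with linear interpolation between points.

```javascript
// Hypothetical helper: applies a gain that ramps linearly from startGain
// toward endGain across one block, so the next block can continue from endGain.
function applyGainRamp(input, startGain, endGain) {
  const output = new Float32Array(input.length);
  const step = input.length > 0 ? (endGain - startGain) / input.length : 0;
  for (let i = 0; i < input.length; i++) {
    output[i] = input[i] * (startGain + step * i);
  }
  return output;
}

// Hypothetical helper: maps each sample through a shaping curve, assuming the
// curve covers inputs from -1 to 1 and interpolating linearly between points.
function shapeSamples(input, curve) {
  const output = new Float32Array(input.length);
  const last = curve.length - 1;
  for (let i = 0; i < input.length; i++) {
    // Clamp to [-1, 1], then convert to a fractional index into the curve.
    const x = Math.max(-1, Math.min(1, input[i]));
    const pos = ((x + 1) / 2) * last;
    const lo = Math.floor(pos);
    const hi = Math.min(lo + 1, last);
    const frac = pos - lo;
    output[i] = curve[lo] * (1 - frac) + curve[hi] * frac;
  }
  return output;
}
```

Even this leaves out scheduling of parameter changes, other ramp shapes, and audio-rate modulation of the parameter, which is where most of the real complexity sits.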

The easiest interface would just be to have an output device stream.
 However, I think having a basic audio toolbox in the form of node types
will cause an explosion of audio applications - building the vocoder
example was illustrative to me, because I ended up using about half of the
node types, and found them to be fantastically easy to build on.  Frankly,
if they hadn't been there, I wouldn't have built the vocoder, because it
would have been too complex for me to take on.  After working through a
number of other scenarios in my mind, I'm left with the same feeling -
having this set of node types fulfills most of the needs that I can
envision, and the few I've thought of that aren't covered, I'm happy to use
JS nodes for.  The only place where I'm personally not entirely convinced
is that I think I would personally trade the DynamicsCompressorNode for an
envelope follower node.  Maybe that's just because I'd rather hack noise
gates, auto-wah effects, etc., without dropping into a JS node.
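
For reference, the kind of thing an envelope follower does can be sketched in a few lines of JavaScript. This is a minimal illustration, assuming a one-pole peak follower with separate attack and release times; the names (createEnvelopeFollower, noiseGate) and the hard-threshold gate are hypothetical choices, not anything in the spec.

```javascript
// Hypothetical sketch: one-pole peak envelope follower. The returned function
// keeps its state between calls so it can be fed consecutive blocks.
function createEnvelopeFollower(sampleRate, attackSeconds, releaseSeconds) {
  const attackCoeff = Math.exp(-1 / (sampleRate * attackSeconds));
  const releaseCoeff = Math.exp(-1 / (sampleRate * releaseSeconds));
  let envelope = 0;
  return function follow(input) {
    const output = new Float32Array(input.length);
    for (let i = 0; i < input.length; i++) {
      const level = Math.abs(input[i]);
      // Rise quickly when the signal gets louder, fall slowly when it fades.
      const coeff = level > envelope ? attackCoeff : releaseCoeff;
      envelope = coeff * envelope + (1 - coeff) * level;
      output[i] = envelope;
    }
    return output;
  };
}

// Hypothetical sketch: a crude noise gate that passes a sample only while the
// followed envelope is at or above the threshold.
function noiseGate(input, envelope, threshold) {
  const output = new Float32Array(input.length);
  for (let i = 0; i < input.length; i++) {
    output[i] = envelope[i] >= threshold ? input[i] : 0;
  }
  return output;
}
```

The same envelope signal could just as well drive a filter frequency for an auto-wah, which is why the follower is the more reusable building block.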

> I've already proposed a few things that would make the script nodes more
> first-class in my opinion. [1] [2]
>

I'm in favor of anything necessary to make JavaScript nodes a first-class
citizen.


>> Have any performance comparisons been made between the native nodes and
>> their corresponding JavaScript implementations? I'm quite sure that native
>> implementations will be faster (perhaps significantly in several cases),
>> and I can also make some guesses as to which nodes would be actual
>> performance bottlenecks, but to what extent?
>>
>
I don't think we've implemented everything twice, once in JavaScript and
once in native code, and optimized their performance, no.  The best
comparison would, I suppose, be any work that Robert did for effects in the
MSP proposal.


> Those things said, I believe having a fast native convolution
> implementation is critical to games and other applications; even with
> native implementations, convolution is a very expensive operation. But, that
> said, and bear with me as I've said this before, it would be a good idea to
> expose a function or a class (to keep state for performance optimizations)
> to do convolution, rather than a node. This would be far more generally
> useful, and I believe the browser environment could benefit from this in
> other applications as well, such as image processing. Otherwise this will
> end up redefined elsewhere. For convolution, this is even fairly simple to
> do, because you could make it real-time just by taking advantage of the
> overlap-add, as the current native implementation does, so that at simplest,
> we could have a function that would take the output array, input array and
> an array containing the kernels in frequency domain. Of course, for more
> general use, we'll need to define how dimensions/channels are handled, etc.
>

Hmm.  I understand what you're suggesting, but I'm a little concerned that
only handing tools to developers that say "perform a convolution on an
arbitrary n-dimensional array of data" and hoping they figure out how to
apply it to make reverb, as well as image blurring effects, is not the
right approach.  I don't think everything should be roll-it-yourself from
the bottom level.
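
To illustrate the gap, the "class that keeps state" idea quoted above might look roughly like the sketch below. It is a hypothetical example, not a proposed API: the class name and process(output, input) signature are made up, it handles a single channel, and it convolves in the time domain for brevity, where a real implementation would multiply against precomputed frequency-domain kernels. Only the overlap-add bookkeeping is the point here.

```javascript
// Hypothetical class: block-wise convolution using overlap-add.
class OverlapAddConvolver {
  constructor(kernel) {
    if (kernel.length === 0) throw new Error("kernel must not be empty");
    this.kernel = Float32Array.from(kernel);
    // Part of earlier results that spills past the end of the blocks seen so far.
    this.tail = new Float32Array(this.kernel.length - 1);
  }

  // Convolves one input block and writes input.length samples into output.
  process(output, input) {
    const n = input.length;
    const k = this.kernel.length;
    const full = new Float64Array(n + k - 1);
    for (let i = 0; i < n; i++) {
      for (let j = 0; j < k; j++) {
        full[i + j] += input[i] * this.kernel[j];
      }
    }
    // Add the overlap carried over from previous blocks.
    for (let i = 0; i < this.tail.length; i++) full[i] += this.tail[i];
    // Emit this block's samples, then keep the remainder for the next call.
    for (let i = 0; i < n; i++) output[i] = full[i];
    for (let i = 0; i < this.tail.length; i++) this.tail[i] = full[n + i];
  }
}
```

Getting from this to a usable reverb still means choosing impulse responses, managing channels, and mixing wet and dry signals, all of which the developer would have to work out on their own.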

-C
Received on Tuesday, 22 May 2012 17:56:45 UTC