
Re: Aiding early implementations of the web audio API

From: Jussi Kalliokoski <jussi.kalliokoski@gmail.com>
Date: Tue, 22 May 2012 17:18:04 +0300
Message-ID: <CAJhzemUPPMveEtjz5qkn9LWLxFwCfCHHtbtg0kw2ky4hf9OLtA@mail.gmail.com>
To: Marcus Geelnard <mage@opera.com>
Cc: Chris Wilson <cwilso@google.com>, public-audio@w3.org, Alistair MacDonald <al@signedon.com>
On Tue, May 22, 2012 at 12:13 PM, Marcus Geelnard <mage@opera.com> wrote:

> Yes, that's true. The filter and convolver nodes are certainly useful for
> game sound effects (underwater, cave, etc). Other nodes can also be used
> for creating interesting effects, but that's not the point.
>
> My view on the issue is that the more complex the spec is, the more it
> will cost (spec work, implementation time, test suites, bug fixing,
> source/binary sizes, etc), and the more likely it is that we will have
> different behavior in different implementations, ranging from noticeable
> performance differences to noticeable differences in sound and behavior and
> possibly implementation-dependent corner case bugs etc.
>
> Since all nodes can be implemented in JavaScript (most of them are even
> trivial), the only reason for using native nodes instead of JavaScript
> nodes is to improve performance.
>

I have similar thoughts on this. I'm starting to think we're going to have
serious spec bloat if we go to great lengths defining all the audio building
blocks required to achieve every use case, often ending up with awkward and
hacky solutions (no offense, these aren't easy things) just to avoid using a
script node, which would mitigate a lot of the benefits of having a native
graph. I feel it would be a good idea to strip down the spec a bit, making
script nodes more first-class citizens of the graph while reducing the need
for effects that are simple to implement with custom scripts. I think this
would make the standardization effort a lot simpler as well.

I've already proposed a few things that would make the script nodes more
first-class in my opinion. [1] [2]


> Have any performance comparisons been made between the native nodes and
> their corresponding JavaScript implementations? I'm quite sure that native
> implementations will be faster (perhaps significantly in several cases),
> and I can also make some guesses as to which nodes would be actual
> performance bottle necks, but to what extent?
>
> I guess what I'm getting at is: What is the minimal subset of the API that
> we *need* to be able to support the majority of use cases (in terms of the
> actual number of end users)? And on top of this, what nodes do we *have* to
> implement natively in order to get acceptable performance in 3D games, for
> example.
>
> I also think it's worth considering exposing the most critical processing
> primitives (e.g. FFT) as JavaScript functions rather than AudioNodes. That
> would bring the performance gap down even more, and open up for even more
> interesting possibilities.
>

I proposed having a built-in FFT module in ES [3], but my guess is it won't
come around for a while. That doesn't stop us from defining such a module,
however. FFT is a relatively simple concept and can be optimized to a great
extent, even in JS. In fact, we (Official.fm Labs) released fft.js [4] (also
bundled in and used by audiolib.js). It currently offers arbitrary-sized FFT
(butterflies for kernels 2, 3, 4 and 5, with more coming soon, plus DFT for
arbitrary prime sizes), real->complex transforms, etc. The speed is
relatively good; we're not talking FFTW fast, that's very close to
impossible, but it's still a *lot* faster than many native FFT
implementations out there, even ones in widespread use (we're going to
release a benchmark suite some time in the future).
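To make the performance point concrete, here is the naive O(n^2) DFT that
libraries like fft.js improve on with O(n log n) butterflies. This is an
illustrative sketch only; the function name and array layout are my own, not
fft.js's API.

```javascript
// Naive real-input DFT: the O(n^2) baseline. Libraries like fft.js
// replace the inner loops with mixed-radix butterflies for O(n log n).
// Hypothetical name and layout, not fft.js's actual API.
function naiveDFT(signal) {
  const n = signal.length;
  const re = new Float64Array(n);
  const im = new Float64Array(n);
  for (let k = 0; k < n; k++) {
    for (let t = 0; t < n; t++) {
      const angle = (-2 * Math.PI * k * t) / n;
      re[k] += signal[t] * Math.cos(angle);
      im[k] += signal[t] * Math.sin(angle);
    }
  }
  return { re, im };
}

// A pure sine at bin 1 puts all of its energy in bins 1 and n-1.
const n = 8;
const sine = Float64Array.from({ length: n }, (_, t) =>
  Math.sin((2 * Math.PI * t) / n)
);
const { re, im } = naiveDFT(sine);
```

Even this naive version is usable for small sizes; the point is that the
algorithmic work is plain arithmetic on typed arrays, which JS engines
already optimize well.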

In my opinion, that goes to demonstrate that a lot of the things we're
defining could be done in JS, possibly without losing that much performance
compared to native implementations. That said, the exception is convolution.
If I've understood correctly, Chris' current implementation takes advantage
of how little state convolution requires, using the overlap-add method for
large kernels and processing batches of samples in different threads, for a
very significant speed-up that JS, as it is now, can't possibly compete
with. At Official.fm Labs, we're actively trying to make JS a better
environment for DSP; for example, Jens Nockert is working on a Typed Array
extension that would give JS access to the power of SIMD instructions. But
for concurrency, there's only so much we can do. I've experimented with
concurrency using Web Workers by implementing naive fragment shaders in JS
[5], but the latency of data transfer and the cost of not having shared
state are so big that the fewer threads you have, the better.

Those things said, I believe a fast native convolution implementation is
critical for games and other applications; even with native implementations,
convolution is a very expensive operation. But, bear with me as I've said
this before, it would be a good idea to expose a function or a class (to
keep state for performance optimizations) that does convolution, rather than
a node. This would be far more generally useful, and I believe the browser
environment could benefit from it in other applications as well, such as
image processing. Otherwise it will end up redefined elsewhere. For
convolution, this is even fairly simple to do, because you could make it
real-time just by taking advantage of overlap-add, as the current native
implementation does, so that at its simplest we could have a function that
takes an output array, an input array and an array containing the kernel in
the frequency domain. Of course, for more general use, we'd need to define
how dimensions/channels are handled, etc.
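The shape of such a function can be sketched as below. To keep the sketch
short and self-contained, each block is convolved directly in the time
domain; a real implementation would transform each block and multiply with
the kernel's spectrum instead, which is where the overlap-add speed-up comes
from. The name and signature are hypothetical, not anything from the spec.

```javascript
// Overlap-add convolution sketch: convolve `input` with `kernel` into
// `output`, processing the input in fixed-size blocks whose tails overlap.
// Each block is convolved directly here for brevity; a real implementation
// would do the per-block work in the frequency domain.
function overlapAddConvolve(output, input, kernel, blockSize) {
  output.fill(0);
  for (let start = 0; start < input.length; start += blockSize) {
    const end = Math.min(start + blockSize, input.length);
    // Convolve this block with the kernel; the tail of each block's
    // result overlaps and adds with the next block's head.
    for (let i = start; i < end; i++) {
      for (let j = 0; j < kernel.length; j++) {
        output[i + j] += input[i] * kernel[j];
      }
    }
  }
}

const input = Float64Array.from([1, 2, 3, 4, 5, 6]);
const kernel = Float64Array.from([1, 0.5]);
const out = new Float64Array(input.length + kernel.length - 1);
overlapAddConvolve(out, input, kernel, 4);
```

Because the block results simply add, the output is identical to a direct
full-length convolution; the blocking is what lets the per-block work be
batched, moved to the frequency domain, or handed to other threads.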

Now, all of a sudden, we wouldn't need a single native node; instead we'd
have helper functions that take advantage of the processing power modern
processors offer, helper functions whose usefulness would span applications
beyond audio as well. Not to mention that this would simplify the
specification significantly and make it easier to implement and to integrate
with other parts of the web stack. It wouldn't force the framework that
comes with the Web Audio API as it stands upon developers, but would instead
give them more space and power to choose their processing frameworks.

Cheers,
Jussi

[1] http://www.w3.org/2011/audio/track/issues/4
[2] http://www.w3.org/2011/audio/track/issues/6
[3] https://mail.mozilla.org/pipermail/es-discuss/2012-May/022780.html
[4] https://github.com/JensNockert/fft.js
[5] https://gist.github.com/2689799
Received on Tuesday, 22 May 2012 14:19:00 GMT
