
Feedback from Official.fm Labs

From: Jussi Kalliokoski <jussi.kalliokoski@gmail.com>
Date: Tue, 28 Feb 2012 17:18:00 +0200
Message-ID: <CAJhzemUQC-OCyVw4QHmN-z3=EmQWrMo7NL9DtMepsyn0Fc7R7Q@mail.gmail.com>
To: public-audio@w3.org

Hey guys,

So we brainstormed about this a bit with my team yesterday, and I'm sending
a summary of the feedback as promised.

A little foreword, however. As negative as some points may seem, this is
exactly the kind of feedback I wanted to get, because we have already heard
a lot of good things about the APIs, so this is purely about getting it all
out there. So don't be fooled by the tone; we're really excited about both
of the current proposals.

 * One point that stood out is that while graph-based APIs are easily
approachable for people who are not familiar with DSP, if you're doing any
complex DSP and/or need to control the flow of your program, you'll end up
working around the limitations of the API and eventually implementing the
effects yourself. A few cases to demonstrate this point:
   - The one presented in the mail earlier today, where you have a game
that has timed events scheduled, and then you go to the menu, and the menu
has its own sounds. This means you'll have to either create multiple
graphs (which currently seems to be restricted to a limited number in the
Chrome implementation of the Web Audio API) or handle the flow yourself
(in a buffer-based processing API you could handle this kind of use case
quite simply).
   - Let's say we have a delay effect with a filter in its feedback loop:
     Input -----> Delay -----> Output
              ^ <-- Filter <--^
     Again, this is simple to achieve in a buffer-based API, but not in a
graph-based one.
   - You need to run the data through a filter, then get the FFT of the
result. You'll have to go through a serious amount of boilerplate to get
where you want, whereas in a buffer-based API it might have looked as
simple as fft(filter(data, parameters)), and you would get the result
synchronously, while with the Web Audio API, for example, you have to do
it asynchronously.
   - Time stretching is completely impossible to achieve in a graph-based
API without a memory overflow in the blink of an eye, because you're not
in control of the flow.
   Anyway, the common opinion seemed to be that a graph-based API should
be a higher-level abstraction, not the basis of all functionality.
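To make the buffer-based point concrete, here's a rough sketch of the
filtered-feedback delay above as plain buffer processing. The helper names
are made up for illustration; this isn't from either proposal:

```javascript
// Sketch: a delay with a one-pole lowpass filter in its feedback loop,
// processing one Float32Array buffer at a time. All names are illustrative.

// One-pole lowpass: y[n] = (1 - coeff) * x[n] + coeff * y[n-1]
function createOnePoleLowpass(coeff) {
  let prev = 0;
  return function (sample) {
    prev = (1 - coeff) * sample + coeff * prev;
    return prev;
  };
}

// A delay line with the filter sitting inside the feedback path.
function createFilteredDelay(delaySamples, feedback, filterCoeff) {
  const line = new Float32Array(delaySamples);
  const filter = createOnePoleLowpass(filterCoeff);
  let writeIndex = 0;
  return function processBuffer(buffer) {
    for (let i = 0; i < buffer.length; i++) {
      const delayed = line[writeIndex];             // read the delayed signal
      // The filter is in the feedback loop: only the fed-back copy is filtered.
      line[writeIndex] = buffer[i] + filter(delayed) * feedback;
      buffer[i] = buffer[i] + delayed;              // mix dry + delayed
      writeIndex = (writeIndex + 1) % delaySamples;
    }
    return buffer;
  };
}
```

Because you own the loop, the feedback routing is just an expression; in a
graph-based API the same cycle has to be expressed with node connections.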
 * Another thing that sort of relates to the previous point is that it
would be highly useful to have native functions for high-volume operations
that are expensive in JS. One example would be common functionality in
decoders, such as clz (count leading zeroes). Also, exposing native
decoders would be useful, but this is already done in both APIs to some
extent (reading data from <audio> and <video> is possible). Another
relation to the previous point is that instead of graph-based effects, you
could control the flow yourself if a sort of standard library for the most
common expensive DSP functionality were offered. This library could also
include native encoders.
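For illustration, here's roughly what clz looks like when you have to write
it by hand in JS: a binary search over the bits, which adds up fast in a
decoder's inner loop, whereas natively it's typically a single instruction
(a sketch, not from either proposal):

```javascript
// Count leading zeroes of a 32-bit unsigned integer, implemented in JS.
// Natively this is usually one CPU instruction.
function clz(x) {
  x >>>= 0;                  // coerce to unsigned 32-bit
  if (x === 0) return 32;
  let n = 0;
  if ((x & 0xFFFF0000) === 0) { n += 16; x <<= 16; }
  if ((x & 0xFF000000) === 0) { n += 8;  x <<= 8; }
  if ((x & 0xF0000000) === 0) { n += 4;  x <<= 4; }
  if ((x & 0xC0000000) === 0) { n += 2;  x <<= 2; }
  if ((x & 0x80000000) === 0) { n += 1; }
  return n;
}
```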

Web Audio API specific
 * The number of simultaneous AudioContexts seems to be limited.
 * It's odd that common processing paradigms are handled natively, yet
sample rate conversion, which is a relatively expensive operation, is not.
The spec says that the third argument to the audio context is the sample
rate, but the current implementation doesn't obey the sample rate you
choose there. However, given the setup cost of an AudioContext and the
limited number of them, it would be far more efficient if you could
specify the sample rate for individual JavaScriptProcessingNodes, since
we're often handling sources with varying sample rates and channel counts.
It should also be possible to change the sample rate on the fly.
 * In the current implementation, there's no way to kill an AudioContext.
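On the sample rate point: when the context's rate can't be chosen, the
conversion ends up in script anyway. A minimal (and deliberately low-quality)
linear-interpolation resampler, just to show the kind of per-sample work
involved; the function name is illustrative, not a proposed API:

```javascript
// Sketch: naive linear-interpolation sample rate conversion in JS.
// Proper SRC needs band-limited interpolation; this only demonstrates
// the per-sample work that lands in script when the engine won't
// resample for you.
function resampleLinear(input, fromRate, toRate) {
  const ratio = fromRate / toRate;
  const output = new Float32Array(Math.floor(input.length / ratio));
  for (let i = 0; i < output.length; i++) {
    const pos = i * ratio;          // source position for output sample i
    const idx = Math.floor(pos);
    const frac = pos - idx;
    const a = input[idx];
    const b = idx + 1 < input.length ? input[idx + 1] : a;
    output[i] = a + (b - a) * frac; // interpolate between neighbours
  }
  return output;
}
```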

MediaStreams Processing Specific
 * No main thread processing. This may be a good thing, though, since it's
good practice, but forcing good practices is usually a bad idea.

Not necessarily in the scope of the Audio WG, but I'll still list them here:
 * The ability to probe what sort of audio device we're outputting to, and
to detect when it changes (for example, whether it's internal speakers,
earbuds, stage monitors or a basic 5.1 home theatre setup, and when the
earbuds actually get plugged in).
 * The same for input devices. These would allow you to automatically
adjust mixing, equalization and compression settings for different setups.

There might have been some other points as well, but I can't remember right
now. Hope this was helpful!

Cheers,
Jussi Kalliokoski
Received on Tuesday, 28 February 2012 15:18:31 GMT
