Re: Comments on the Web Audio API proposal

Hi Ricard,

I'm sorry for the delay in responding.  We discussed some of the issues you
brought up in Monday's teleconference, but I thought it would be good to
respond on the list as well.

On Mon, Oct 4, 2010 at 7:24 AM, Ricard Marxer Piñón <ricardmp@gmail.com> wrote:

> Hi,
>
> As I said in the last teleconf, I am writing a few comments on the
> current state of the Web Audio API proposal, more specifically about
> the nodes that I find lacking or that could be merged.
>
> Thoughts about ConvolverNode and BiquadFilterNode
> ------------------------
> First of all, I like the fact that there is no longer a ReverbNode and
> instead we have a ConvolverNode.  If I understood correctly, the
> ConvolverNode is basically an FIR (finite impulse response) filter,
> which is probably implemented internally by frequency-domain
> multiplication when the impulse responses are long.
> I also think we should have the ability to create infinite impulse
> response filters.  I know that this is already allowed by the
> BiquadFilterNode.  However, that only gives us 3 b and 2 a
> coefficients.
>
> As I see it, any filter (whether an FIR or a biquad IIR) can be
> defined as an IIR filter, and therefore the API would be much simpler
> if we had only one node for all filters: a FilterNode that under the
> hood can have specialized implementations for the FIR case, the
> long-impulse-response FIR case, the biquad case and the general case.
> For convenience we could have special presets or functions in the API
> to generate the a and b coefficients for certain interesting filters
> (certain reverbs, lowpass, highpass, ...).
>

The reason I thought it would be a good idea to separate ConvolverNode and
BiquadFilterNode is that they each offer different levels of ability to
dynamically modify the filter characteristics.  In the ConvolverNode case,
it is not generally possible to dynamically modify the filter coefficients
in a smooth way.  I haven't completely described the BiquadFilterNode in my
specification document, but the idea is that it can be configured as several
different common filters such as low-pass, peaking, notch, and allpass, and
parametrically controlled with meaningful attributes such as "cutoff
frequency", "filter gain", "Q", and so on.  These parameters can then be
changed dynamically in time, even at a sample-by-sample level.  It would be
conceivable to attach an AudioCurve to these parameters to get
high-resolution filter sweeps.  Arbitrary higher-order IIR filters can then
easily be constructed by chaining dozens or possibly even hundreds of
BiquadFilterNodes together, with the ability to individually move the zeros
and poles around.  One example would be a phaser effect with dozens of
BiquadFilterNodes configured as allpass filters, with their center
frequencies shifting around.
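
To make that concrete, here's a rough sketch of the phaser idea in
JavaScript.  The exact names are illustrative rather than what the draft
currently specifies; in particular, the linearRampToValueAtTime automation
call is standing in for attaching an AudioCurve to the frequency parameter:

    const context = new AudioContext();
    const source = context.createBufferSource();  // source.buffer assigned elsewhere

    // Chain a dozen allpass biquads, each with its own center frequency.
    let node = source;
    const stages = [];
    for (let i = 0; i < 12; i++) {
      const allpass = context.createBiquadFilter();
      allpass.type = 'allpass';
      allpass.frequency.value = 200 * (i + 1);
      node.connect(allpass);
      node = allpass;
      stages.push(allpass);
    }
    node.connect(context.destination);

    // Sweep every stage's center frequency smoothly over five seconds.
    const now = context.currentTime;
    for (const stage of stages) {
      stage.frequency.setValueAtTime(stage.frequency.value, now);
      stage.frequency.linearRampToValueAtTime(stage.frequency.value * 4, now + 5);
    }
    source.start(now);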

Although the ConvolverNode doesn't have this dynamic ability, it can much
more efficiently process extremely long impulse responses which have been
measured from real rooms or synthesized.

So, because the differences between the two are significant, my feeling is
that it's best to keep them separate.
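
For comparison, the ConvolverNode's usage is static: you hand it an impulse
response once and the node does the heavy lifting from then on.  In this
sketch, impulseResponseBuffer is assumed to be an already-decoded
AudioBuffer (the loading and decoding are elided):

    const convolver = context.createConvolver();
    convolver.buffer = impulseResponseBuffer;  // measured from a real room, or synthesized

    source.connect(convolver);
    convolver.connect(context.destination);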



>
> Thoughts about the RealtimeAnalyzer
> ------------------------
> As I have expressed earlier I think this is quite a vague node that is
> very specific to visualization.  I think a much better node (with a
> more determined behavior) would be an FFTNode.  This node would simply
> perform an FFT (would be also important to allow it to perform the
> IFFT).  And give access to the magnitude and phase (or real and
> imaginary).  This node would be extremely useful not only for
> visualization, but for analysis, synthesis and frequency domain
> effects.
>

If we decide to implement an FFTNode and IFFTNode, then we would also have
to invent several interesting intermediate AudioNodes which process in the
frequency domain.  What would these nodes be?  It would also create more
potential for problems in the API, where incompatible nodes could be
connected together.  A more general alternative would be to use the
JavaScriptAudioNode and simply allow the JavaScript to perform the FFT,
IFFT, and intermediate processing.  For complex and highly specific
analysis algorithms this may be the best solution, since I'm not sure it
would be possible to invent enough types of frequency-domain AudioNodes to
handle all the cases you're thinking about.
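
Something along these lines, sketched loosely.  I'm assuming a
createJavaScriptAudioNode factory and an onaudioprocess event carrying
input and output buffers, and the fft()/ifft() helpers are hypothetical
stand-ins for a pure-JS FFT library:

    const processor = context.createJavaScriptAudioNode(2048);  // assumed factory name
    processor.onaudioprocess = function (event) {
      const input = event.inputBuffer.getChannelData(0);
      const output = event.outputBuffer.getChannelData(0);

      const spectrum = fft(input);   // hypothetical forward FFT in JS
      // ...arbitrary frequency-domain processing on spectrum.real and
      // spectrum.imag goes here...
      output.set(ifft(spectrum));    // hypothetical inverse FFT
    };
    source.connect(processor);
    processor.connect(context.destination);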

The Mozilla team has demonstrated a variety of basic FFT-based visualizers
where the FFT is done purely in JavaScript.  My API allows for both native
and JS FFT visualizers.  I'm a little concerned that the FFT-in-JS approach
can result in slightly less smooth graphics (lower frame rates), and I'm
looking to verify whether that is the case.
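
The native path looks roughly like this (using getByteFrequencyData as the
name for pulling the current spectrum out of the realtime analyzer; the
per-frame work in JS is just a copy, with the FFT itself done natively):

    const analyser = context.createAnalyser();  // the realtime analyzer node
    analyser.fftSize = 1024;
    source.connect(analyser);
    analyser.connect(context.destination);

    const bins = new Uint8Array(analyser.frequencyBinCount);
    function draw() {
      analyser.getByteFrequencyData(bins);  // native FFT, just a copy in JS
      // ...paint `bins` onto a canvas here...
      requestAnimationFrame(draw);
    }
    requestAnimationFrame(draw);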



>
>
> Thoughts about the general API
> ------------------------
> One last thing I am worried about: it is important to be able to use
> the FFT and filter nodes on things other than an audio stream (e.g. on
> a simple Float32Array that we may have in our hands).  The motivation
> is that in many cases one may not want to perform the FFT directly on
> audio signals.  There are many examples of this:
>  - in beat tracking we can use the spectrum analysis (using the
> FFTNode) of an onset detection function
>  - in pitch estimation we may perform the autocorrelation (using the
> FilterNode) of the spectrum
>
> This means that I should be able to simply create an FFTNode or a
> FilterNode and ask it to compute on a given Float32Array that I pass
> to it, and this should be easy (maybe without needing a context or an
> AudioDestinationNode).
>
> Any thoughts?  We can also discuss this in more detail in today's
> teleconf if you wish; sorry for being last-minute on this.


In the teleconference we discussed a bit the idea of doing "offline
rendering", where a simple or complex graph of AudioNodes is fed an
arbitrary stream of floating-point data and rendered into an AudioBuffer
(which is a set of Float32Arrays, one per channel).
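
Purely as illustration, that might look something like the sketch below.
Nothing in the current draft defines an offline context yet, so the
OfflineAudioContext name, its constructor arguments (channels, length in
frames, sample rate), and the detectionFunctionBuffer input are all
assumptions:

    const offline = new OfflineAudioContext(1, 44100 * 2, 44100);  // 2 s, mono

    const src = offline.createBufferSource();
    src.buffer = detectionFunctionBuffer;  // your Float32Array wrapped in an AudioBuffer
    const filter = offline.createBiquadFilter();
    src.connect(filter);
    filter.connect(offline.destination);
    src.start();

    offline.startRendering().then(function (rendered) {
      const samples = rendered.getChannelData(0);  // one Float32Array per channel
      // ...run the beat-tracking or pitch analysis on samples here...
    });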

Cheers,
Chris

Received on Thursday, 7 October 2010 19:37:10 UTC