Re: Comments on the Web Audio API proposal

Hi Chris,

Thanks for the response.  I now better understand the reasons for your
choices.  Below are some open questions and possible alternatives.

On Thu, Oct 7, 2010 at 9:36 PM, Chris Rogers <crogers@google.com> wrote:
> Hi Ricard,
> I'm sorry for the delay in responding.  Some of the issues you brought up we
> discussed in Monday's teleconference, but I thought it would be good to
> respond on the list as well.
>
> On Mon, Oct 4, 2010 at 7:24 AM, Ricard Marxer Piñón <ricardmp@gmail.com>
> wrote:
>>
>> Hi,
>>
>> As I said in the last teleconf, I am writing a few comments on the
>> current state of the Web Audio API proposal, more specifically about
>> the nodes that I find lacking or that could be merged.
>>
>> Thoughts about ConvolverNode and BiquadFilterNode
>> ------------------------
>> First of all, I like the fact that there is no longer a ReverbNode and
>> instead we have a ConvolverNode.  If I understood correctly, the
>> ConvolverNode is basically an FIR (finite impulse response) filter,
>> which is probably implemented internally by frequency-domain
>> multiplication when the impulse responses are long.
>> I also think we should have the ability to create infinite impulse
>> response filters.  I know that this is already possible with the
>> BiquadFilterNode.  However, that only allows us 3 b and 2 a
>> coefficients.
>>
>> As I see it, any filter (whether it is an FIR or a biquad IIR)
>> can be expressed as an IIR filter, and therefore the API would be much
>> simpler if we had only one node for all filters: a FilterNode
>> that under the hood can have specialized implementations for the FIR
>> case, the long-impulse-response FIR case, the biquad case and the
>> general case.  For convenience we could have special presets or functions
>> in the API to generate the a and b coefficients for certain
>> interesting filters (certain reverbs, lowpass, highpass, ...).
>
> The reason I thought it would be a good idea to separate the ideas of
> ConvolverNode and BiquadFilterNode is because they each offer different
> levels of ability to dynamically modify the filter characteristics.  In the
> ConvolverNode case, it is not generally possible to dynamically modify the
> filter coefficients in a smooth way.  I haven't completely described the
> BiquadFilterNode in my specification document, but the idea is that it can
> be configured as several different common filters such as low-pass, peaking,
> notch, allpass, and parametrically controlled with meaningful attributes
> such as "cutoff frequency", "filter gain", "Q", and so on.  Then these
> parameters can be dynamically changed in time, even on a sample by sample
> level.  It would be conceivable to attach an AudioCurve to these parameters
> to get high-resolution filter sweeps.  Arbitrary higher-order IIR filters
> can then easily be constructed by chaining dozens or possibly even hundreds
> of BiquadFilterNodes together with the ability to individually move the
> zeroes and poles around.  One example would be a phaser effect with dozens
> of BiquadFilters configured as allpass filters, with the frequencies
> shifting around.
> Although the ConvolverNode doesn't have this dynamic ability, it can much
> more efficiently process extremely long impulse responses which have been
> measured from real rooms or synthesized.
> So, because the differences in the two are significant enough, my feeling is
> that it's best to keep them separate.
>

OK, I understand the differences between the two.  I guess this will
have to be explained well in the documentation, because the names of
the two nodes are not explicit enough on their own, but that can come later.
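
Just to check that I follow the chaining idea, here is a rough sketch
of how I imagine building the higher-order allpass chain you describe
(e.g. for the phaser).  The factory and attribute names are only my
guesses, since the BiquadFilterNode is not fully described in the
draft yet:

// Rough sketch -- createBiquadFilter(), .type and .frequency are
// guessed names, not taken from the draft.
var context = new AudioContext();
var source = context.createBufferSource();

var allpassChain = [];
var previous = source;
for (var i = 0; i < 12; i++) {
  var stage = context.createBiquadFilter();  // guessed factory method
  stage.type = "allpass";                    // guessed attribute
  stage.frequency.value = 200 + i * 250;     // guessed parameter (Hz)
  previous.connect(stage);
  previous = stage;
  allpassChain.push(stage);
}
previous.connect(context.destination);

// A phaser would then slowly sweep allpassChain[i].frequency over
// time, possibly via the AudioCurve mechanism you mention.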

>>
>> Thoughts about the RealtimeAnalyzer
>> ------------------------
>> As I have expressed earlier, I think this is quite a vague node that is
>> very specific to visualization.  I think a much better node (with a
>> more precisely defined behavior) would be an FFTNode.  This node would
>> simply perform an FFT (it would also be important to allow it to
>> perform the IFFT) and give access to the magnitude and phase (or real
>> and imaginary parts).  This node would be extremely useful not only for
>> visualization, but also for analysis, synthesis and frequency-domain
>> effects.
>
> If we decide to implement an FFTNode and IFFTNode, then we would also have
> to invent several interesting intermediate AudioNodes which process in the
> frequency domain.  What would these nodes be?

I think this is not really necessary.  We could just have a
JavaScriptFFTProcessorNode (ok, not the best name) or something
similar that takes as input the real and imaginary parts of the
spectrum (or the magnitude and phase).  We would then just need to
connect it in the following way:

FFTNode -> JavaScriptFFTProcessorNode -> IFFTNode

Then someone can use this processor node to modify or visualize the
FFT using JavaScript.
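
To make that a bit more concrete, here is a sketch of what I have in
mind.  All of these names (createFFT, createFFTProcessor, createIFFT,
onspectrum) are invented for the example; none of them are in the
current draft:

// All invented names -- only to illustrate the idea of a JS node
// that processes frequency-domain frames between a native FFT and IFFT.
var context = new AudioContext();
var source = context.createBufferSource();

var fft = context.createFFT(2048);                  // hypothetical
var processor = context.createFFTProcessor(2048);   // hypothetical
var ifft = context.createIFFT(2048);                // hypothetical

source.connect(fft);
fft.connect(processor);
processor.connect(ifft);
ifft.connect(context.destination);

processor.onspectrum = function (event) {
  // Assume event.real and event.imag are Float32Arrays holding one
  // frame of the spectrum up to Nyquist (fftSize / 2 + 1 bins).
  // Example: a crude brick-wall lowpass by zeroing the upper bins.
  for (var bin = 256; bin < event.real.length; bin++) {
    event.real[bin] = 0;
    event.imag[bin] = 0;
  }
};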

>  Also, it could create more
> potential in the API for problems where incompatible nodes could be
> connected together.  A more general alternative would be to use the
> JavaScriptAudioNode and simply allow the JavaScript to perform the FFT,
> IFFT, and intermediate processing.  For complex and highly-specific analysis
> algorithms this may be the best solution since I'm not sure it would be
> possible to invent enough types of frequency-domain AudioNodes to handle all
> the cases you're thinking about.

There are a great many algorithms that work in the frequency domain,
and many of them are neither analysis-specific nor particularly
complex: the phase vocoder, voice transformations, time scaling and
pitch shifting, for example.  Some filters are also preferably
implemented in the frequency domain because of its flexibility, and
many compression algorithms for VoIP applications etc. work in the
frequency domain as well.

Having the FFT and IFFT in native code rather than implemented in
JavaScript would greatly simplify the implementation of all these use
cases and make them more performant.

> The Mozilla team has demonstrated a variety of basic FFT-based
> visualizers where the FFT is done purely in JavaScript.  My API allows for
> both native and JS FFT visualizers.  I'm a little concerned that the FFT in
> JS approach can result in slightly less smooth graphics (lower frame rates),
> and I'm looking to verify if that is the case.
>

I think the visualization use case is quite similar, in terms of
computation requirements, to the ones I described above.  So if the
graphics are not smooth with a JS FFT implementation, the same will be
true for other frequency-domain processes.

>>
>> Thoughts about the general API
>> ------------------------
>> One last thing I am worried about: it should be possible to use FFT
>> and filter nodes on things other than an audio stream (e.g. on a
>> simple Float32Array that we may have at hand).  The motivation is
>> that in many cases one may not want to perform the FFT directly on
>> audio signals.  There are many examples of this:
>>  - in beat tracking we can analyze the spectrum (using the
>> FFTNode) of an onset detection function
>>  - in pitch estimation we may compute the autocorrelation (using the
>> FilterNode) of the spectrum
>>
>> This means that I should be able to simply create an FFTNode or a
>> FilterNode and ask it to compute on a given Float32Array that I
>> pass to it, and this should be easy (maybe without the need for a
>> context or an AudioDestinationNode).
>>
>> Any thoughts?  We can also discuss this in more detail in today's
>> teleconf if you wish, sorry for being last minute on this.
>
> We discussed in the teleconference a little bit about the idea of doing
> "offline rendering" where a simple or complex graph of AudioNodes is fed an
> arbitrary stream of floating-point data and rendered into an AudioBuffer
> (which is a set of Float32Arrays, one per channel).

This sounds great.  I think it's all that's needed.
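
Just to make sure we mean the same thing, here is roughly how I would
expect to use it.  Everything here is invented (the createBuffer
signature, renderToBuffer, the idea of an offline context); it is only
a sketch of running a graph over a Float32Array we already have,
without a real-time AudioDestinationNode:

// Invented names throughout -- only a sketch of the "offline
// rendering" idea, not an API proposal.
var input = new Float32Array(44100);   // e.g. an onset detection function
// ... fill input ...

var context = new AudioContext();      // or some offline variant of it
var buffer = context.createBuffer(1, input.length, 44100);  // guessed signature
buffer.getChannelData(0).set(input);

var source = context.createBufferSource();
source.buffer = buffer;

var filter = context.createBiquadFilter();
source.connect(filter);

// Hypothetical call: pull the whole graph as fast as possible and
// return the result as an AudioBuffer for further analysis.
var rendered = context.renderToBuffer(filter, input.length);  // invented
var output = rendered.getChannelData(0);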

> Cheers,
> Chris
>

cheers!

-- 
ricard
http://twitter.com/ricardmp
http://www.ricardmarxer.com
http://www.caligraft.com

Received on Monday, 18 October 2010 11:03:17 UTC