Re: Web Audio API questions and comments from Chris Rogers on 2012-06-19 (public-audio@w3.org from April to June 2012)

From: Chris Rogers <crogers@google.com>
Date: Tue, 19 Jun 2012 12:35:05 -0700
To: Joe Turner <joe@oampo.co.uk>
Cc: public-audio@w3.org
Message-ID: <CA+EzO0k8eTO+ZdFhJkNuA-LsvNnUNVEbHRsb9n7M1R2OBcU-tQ@mail.gmail.com>
On Tue, Jun 19, 2012 at 7:39 AM, Joe Turner <joe@oampo.co.uk> wrote:

> Hi all,
> I've had a quick run through the specification again and had a couple
> of questions and observations which I thought might be helpful.  I'm
> afraid it might be covering quite a few areas in one email, so feel
> free to split this out into multiple topics if it's easier to follow.
> Also apologies if I'm going over things which have already been
> discussed; I've tried my best to keep up with the email traffic, but I
> think I've probably missed or forgotten a few things.
>
> Firstly, though a quick thanks to all involved - the spec is looking a
> lot more awesome than when I last had a good look through.
>

Joe, thanks for your comments.  It's good to hear from you again!


>
> - Block size
>
> I know this is a fairly broad architectural issue, but how strong is
> the case for processing blocks of samples rather than single samples?
> The spec states that it's for performance reasons - is the difference
> in performance quantified?  I know that the majority of current
> systems use blocks, but there are notable exceptions (ChucK [1] is the
> most prominent one that I know of) which work with one sample at a
> time.  The practical advantage of single samples is that it allows for
> very short feedback loops.  For an example of this in practice,
> Dattorro's figure-of-8 reverb algorithm [2] uses a feedback loop of 107
> samples, which would be impossible without resorting to the
> reimplementation of a number of existing nodes in JavaScript.
>

The performance hit of per-sample processing is very high (it can be on the
order of 10x slower, or worse).  Because the Web Audio API will be used on
quite a broad range of hardware, all the way down to mobile phones, I think
it's really important to pay attention to performance.  You're right that
there is a subset of algorithms which would not be possible.  But we've
discussed the possibility of being able to change the block size (all the
way down to 1) in a later version of the specification.  In the meantime,
these very specialized cases can be handled in a JavaScriptAudioNode.  By
the way, in your example [2], there are other alternatives which can still
achieve native performance, like pre-rendering the impulse response into an
AudioBuffer using JavaScript, then loading it into a ConvolverNode.  There's
a pretty rich class of effects you can generate this way.
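
As a rough illustration of the pre-rendering idea, here is a minimal sketch
that renders the impulse response of a single feedback comb filter with a
107-sample loop, one sample at a time, into a Float32Array.  It is not
Dattorro's full algorithm, and the function name and parameters are made up
for the example.

```javascript
// Sketch: pre-render the impulse response of a feedback comb filter,
//   y[n] = x[n] + feedback * y[n - delaySamples]
// The function name and parameters are illustrative, not part of any API.
function renderCombImpulseResponse(delaySamples, feedback, length) {
  const ir = new Float32Array(length);
  for (let n = 0; n < length; n++) {
    const input = n === 0 ? 1 : 0; // unit impulse at n = 0
    const fedBack =
      n >= delaySamples ? feedback * ir[n - delaySamples] : 0;
    ir[n] = input + fedBack;
  }
  return ir;
}

// A 107-sample loop with a feedback gain of 0.5.
const ir = renderCombImpulseResponse(107, 0.5, 1024);
```

In a page, the array could then be copied into an AudioBuffer (for example
with context.createBuffer(1, ir.length, context.sampleRate) and
buffer.getChannelData(0).set(ir)) and assigned to a ConvolverNode's buffer.
Note that convolution only captures a fixed, linear response, so anything
that modulates the loop over time would not be reproduced this way.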


>
> - AudioNode cycles with JavaScriptAudioNodes
>
> Given that JavaScriptAudioNodes have an inherent delay built in they
> should be fine to use in feedback loops as well as DelayNodes I think.
>  Is this correct?
>

I think it will be ok, although the latency of the JavaScriptAudioNode will
factor into the overall delay.  There would be some limits on very small
delay sizes.  But in many practical cases, this won't be an issue.
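
To put a number on "very small", here is a back-of-the-envelope sketch.  It
assumes, as a simplification, that the node adds exactly one buffer of
latency; a real implementation may well add more, so treat the result as a
lower bound.  The function name is made up for the example.

```javascript
// Lower bound on the delay a JavaScriptAudioNode adds to a feedback loop,
// assuming (as a simplification) exactly one buffer of latency.
function minLoopDelaySeconds(bufferSize, sampleRate) {
  return bufferSize / sampleRate;
}

// With a 256-frame buffer at 44.1 kHz this is a little under 6 ms.
const delayMs = 1000 * minLoopDelaySeconds(256, 44100);
```

So a 107-sample loop (about 2.4 ms at 44.1 kHz) could not be closed through
a JavaScriptAudioNode under this assumption, while delays of tens of
milliseconds could.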


>
> - Control rate AudioParams and interpolation
>
> Will there ever be a use for interpolation between values with a
> control rate AudioParam rather than skipping to the next value at the
> start of each block?  The Supercollider documentation [3] mentions
> that this technique is used in some UGens which seems plausible, but
> I'm not clear on when or why this is appropriate.  Does something like
> this need specifying?
>

I'm not sure what you mean exactly.  All the "automation" methods on
AudioParam generate the parameter values at a-rate (one value per sample),
which is high-resolution.


>
> - AudioParam setValueCurveAtTime with an Array parameter
>
> Should this be able to take a normal JavaScript array as a parameter?
> For simple fixed envelopes this method seems simpler than having a
> number of linearRampToValueAtTime calls, but the array will only
> contain a few values so creating a Float32Array seems a bit like
> overkill.
>

Simple traditional ADSR envelopes are very easy to get with just a couple
of calls to the automation methods: setValueAtTime(),
linearRampToValueAtTime(), etc.  The code is very concise.
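
For instance, a minimal ADSR sketch might look like the following.  The
helper name and its options object are made up for the example; only
setValueAtTime() and linearRampToValueAtTime() come from the API.

```javascript
// Sketch of an ADSR envelope on any AudioParam (for example
// gainNode.gain).  The helper name and options are illustrative.
function scheduleAdsr(param, start, opts) {
  const { attack, decay, sustain, hold, release, peak = 1 } = opts;
  const decayStart = start + attack;
  const sustainStart = decayStart + decay;
  const releaseStart = sustainStart + hold;
  const end = releaseStart + release;

  param.setValueAtTime(0, start);                       // start at zero
  param.linearRampToValueAtTime(peak, decayStart);      // attack
  param.linearRampToValueAtTime(sustain, sustainStart); // decay
  // Anchor the sustain level so the release ramp starts at releaseStart
  // rather than at the end of the decay.
  param.setValueAtTime(sustain, releaseStart);
  param.linearRampToValueAtTime(0, end);                // release
  return end; // time at which the envelope finishes
}
```

A call such as scheduleAdsr(gainNode.gain, context.currentTime, { attack:
0.01, decay: 0.1, sustain: 0.6, hold: 0.5, release: 0.3 }) would then
schedule the whole envelope in one go.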


>
> - AudioGainNode dezippering
>
> Can this at least be optional?  If I'm using an AudioGainNode to scale
> an audio node so it can control an AudioParam (for example to act as
> an lfo), then I don't want the output to be filtered in any way.
>

Yes, this is actually already the case.  I haven't yet explained
de-zippering very well in the specification.  But de-zippering only really
applies if .value changes to the AudioParam are being made directly,
instead of via audio-rate signals or via "automation" APIs.  In other
words, if somebody is changing a gain value:

gainNode.gain.value = x;

Then that value will be de-zippered.  But, if you're calling
linearRampToValueAtTime(), or connecting an audio-rate signal to the
parameter, then it will take the exact value from those signals.
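
For the LFO case you describe, a sketch along these lines should avoid any
smoothing.  The helper name is made up, `context` is assumed to be an
AudioContext, and the gain-node factory is looked up under both
createGain() and createGainNode() since the name may differ between
implementations.

```javascript
// Sketch: drive an AudioParam from an LFO scaled by a gain node.
// `context` is assumed to be an AudioContext; the helper is illustrative.
function connectLfo(context, targetParam, rateHz, depth) {
  const lfo = context.createOscillator();
  // The factory name is an assumption: try createGain(), then fall back
  // to createGainNode().
  const scaler = context.createGain
    ? context.createGain()
    : context.createGainNode();

  lfo.frequency.value = rateHz;
  // Set the depth through the automation API rather than assigning
  // scaler.gain.value, so the exact value is used.
  scaler.gain.setValueAtTime(depth, context.currentTime);

  lfo.connect(scaler);
  scaler.connect(targetParam); // audio-rate signal into the parameter
  return { lfo, scaler };
}
```

The caller still has to start the oscillator.  The target could be any
AudioParam, for example another gain node's gain for tremolo, or a
DelayNode's delayTime for a chorus-style effect.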



> Although a better solution may be:
>
> - Operator nodes, AudioParam nodes
>
> Can we have separate nodes for the basic mathematical operators (add,
> subtract, divide, multiply and modulo), and a way of having the output
> of an AudioParam as a signal?


We already have add, subtract, and multiply.  You can also get many other
transformations by using a WaveShaperNode.  There are probably some
operations which would not be possible, but I think they would be *much*
more specialized.  And in these cases a JavaScriptAudioNode could help out.
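
As an example of the WaveShaperNode route, here is a small sketch that
tabulates an arbitrary function over [-1, 1] into a Float32Array suitable
for use as a shaping curve.  The helper name is made up; the mapping of the
first entry to -1 and the last entry to +1 follows how the curve is
indexed.

```javascript
// Sketch: tabulate fn over [-1, 1] into a Float32Array usable as a
// WaveShaperNode curve.  The helper name is illustrative.
function makeShaperCurve(fn, size) {
  const curve = new Float32Array(size);
  for (let i = 0; i < size; i++) {
    const x = (2 * i) / (size - 1) - 1; // index 0 -> -1, last -> +1
    curve[i] = fn(x);
  }
  return curve;
}

// Absolute value (a full-wave rectifier) and squaring.
const absCurve = makeShaperCurve(Math.abs, 5);
const squareCurve = makeShaperCurve((x) => x * x, 1025);
```

Assigning one of these to a WaveShaperNode's curve applies the function to
every sample that passes through, as long as the signal stays within
[-1, 1].  In practice a larger table than 5 entries would be used.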



>  This would allow all the flexibility
> needed for scaling, offsetting and combining signals in order to
> control parameters.


I think we already have these with the built-in mixing and the
AudioGainNode, etc.


>  I know a bit of trickery can make stuff like this
> possible at the moment, and it's trivial to implement in JavaScript,
> but it seems like core functionality to me.
>

I'm open to suggestions, but I think many of the things you've mentioned
are already possible.


>
> - Tapping DelayNodes
>
> At the moment it's only possible to tap a delay line at the beginning,
> and the read position cannot be modulated.


That's not true.  The delay time (the delayTime AudioParam) can be modulated.


>  This makes it pretty much
> impossible to implement effects such as pitch shifting, chorus,
> phasing, flanging and modulated reverbs, all of which rely on either
> multiple taps or modulated taps.  It would be nice to have something
> similar to Supercollider's BufRd [4] and BufWr [5] with DelayNode
> built as a layer on top of this.  Also AudioBufferSourceNode could be
> a layer on top of a BufRd equivalent.
>
> - AudioBufferSourceNode playbackState change event
>
> Would it be useful to have an event fired when the
> AudioBufferSourceNode's state changes?  I can't think of anything off
> the top of my head, but it seems like it could be useful for some
> applications maybe?
>

We've talked about this before, and it still might make sense to add in.
But developers so far don't seem to be seriously limited by not having
this.


>
> - JavaScriptAudioNode buffer size
>
> Is there a technical limitation as to why the buffer size is limited
> to these values, or would just any power of 2 suffice, maybe with a
> note advising a minimum value?


It's tough to know what the best answer is here.  There are arguments that
developers should just be able to pick a size and deal with any problems
that come up.  Or an alternative is to let the browser/system determine
what the best/optimal buffer size is.  The first case is more flexible, but
could bring out differences depending on the OS, hardware, and the
particular browser which can all vary in the minimum reliable buffer size.


>  Will using Workers make it possible to
> use shorter buffers?


Yes, it should be possible, although garbage collection is still an issue.


>  Also, I know this has been discussed a few times
> - can I add a +1 on allowing JavaScriptAudioNodes access to
> AudioParams.
>

I'm not opposed to it, but we've discussed that we might not add it right
away.


>
> - AudioPannerNode
>
> Having never done any work in 3D sound I find this all a bit
> intimidating.  Is there any chance of something simpler built on top
> of this for those of us who want sound to come out of the left
> speaker, the right speaker, or some combination of the two?
>

You can always use AudioChannelSplitter and AudioChannelMerger to do your
own low-level matrix mixing.
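
For plain left/right panning, the only real work is choosing the two gains.
Here is a sketch using an equal-power pan law, which is a common choice but
not something the API mandates; the function name is made up.

```javascript
// Equal-power pan law (a common choice, assumed here for illustration).
// pan runs from -1 (hard left) through 0 (centre) to +1 (hard right).
function equalPowerGains(pan) {
  const angle = ((pan + 1) / 2) * (Math.PI / 2); // 0 .. pi/2
  return { left: Math.cos(angle), right: Math.sin(angle) };
}

const centre = equalPowerGains(0); // both gains are about 0.707
```

The mono source would feed two gain nodes set to these values, with one
gain node connected to input 0 of the merger and the other to input 1, for
example leftGain.connect(merger, 0, 0) and rightGain.connect(merger, 0, 1).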


>
> - RealTimeAnalyserNode
>
> This seems strange to me - the functionality could be really useful,
> but it seems focused very narrowly on creating visualisations.  I
> think a nicer solution would be to have separate FFT and IFFT nodes so
> frequency domain effects could be integrated into the processing
> chain, and then a separate node which allows access to the FFT or
> waveform data depending on where in the graph it is inserted.  So for
> visualisations you would have an AudioNode connected to an FFTNode,
> connected to a BufferYoinkerNode.
>

Jussi has also wanted this.  It's a lot more complicated to actually design
the API to work well with the other nodes than you make it sound.  For
example, many/most of the types of processing would involve some kind of
phase-vocoder engine which involves an overlapping series of analysis
windows, followed by processing on the frequency-domain frames, followed by
an overlapping series of re-synthesis windows.  You'd need to be able to
control the step sizes, the window types, and the frequency formats
(real/imag or mag/phase), decide what happens if you connect frequency
nodes to regular nodes (and vice versa), and then define a library of
native frequency-domain processing nodes.  That is not trivial, and it's
not even clear what those nodes should be.  I've worked a lot in this area
(SVP phase vocoder tool and AudioSculpt at IRCAM), and the architecture
would need to be quite complex to be of any value.
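
To give a feel for why the step size and window type are coupled, here is
a small sketch that sums overlapping periodic Hann windows.  It is only the
windowing arithmetic, not a phase vocoder, the function names are made up,
and it assumes the hop size divides the window length.

```javascript
// Periodic Hann window of length n.
function hann(n) {
  const w = new Float64Array(n);
  for (let i = 0; i < n; i++) {
    w[i] = 0.5 - 0.5 * Math.cos((2 * Math.PI * i) / n);
  }
  return w;
}

// Sum of all windows overlapping one hop in the steady state, for windows
// placed every `hop` samples.  Assumes hop divides window.length.
function overlapSum(window, hop) {
  const sum = new Float64Array(hop);
  for (let offset = 0; offset < window.length; offset += hop) {
    for (let i = 0; i < hop; i++) {
      sum[i] += window[offset + i];
    }
  }
  return sum;
}

const half = overlapSum(hann(1024), 512);    // constant envelope of 1
const quarter = overlapSum(hann(1024), 256); // constant envelope of 2
```

With this window the summed envelope is flat at both step sizes but its
level changes, so the re-synthesis stage has to know the analysis settings
to restore unity gain.  Other window/step combinations do not sum to a
constant at all.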


>
> A couple more small points on this.  Firstly the name is very
> non-specific - I think it should probably at least describe what
> analysis is being done.  It would also be good to allow windowing
> functions to be supplied, or at least specify which windowing function
> should be used.
>

Yes, I'll have to specify the window.


>
> - DynamicsCompressorNode sidechaining and lookahead
>
> I'm not sure if these are a bit specialised, a bit of a dark art, or
> both, but they are both common and fairly well defined features of
> compressors which may be useful.  I could see sidechaining being
> especially useful for ducking in broadcast applications.
>

Yes, I agree.  We can add a side-chain input here.  But, I'm holding off on
this for now, since it's a little more specialized and I'm trying to
concentrate on the core features first.


>
> - First order filters
>
> It seems a little bit strange that second order (Biquad) filters are
> provided as standard, whereas first order filters aren't.  Again, this
> is pretty trivial in JavaScript, I was just wondering what the reason
> for their omission is?  First order filters are fairly widely used in
> a number of common audio algorithms, from a basic envelope follower
> through to providing damping in reverbs.
>

Yes, I agree very strongly with you here.  I'd like to get these in here.
One possibility is to simply add a 1st order low and high pass mode to
BiquadFilterNode.  Even though they're 1st order filters, they can still be
implemented with a biquad (although some of the coefficients will be zero).
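
To make the zero coefficients concrete, here is a sketch of a one-pole
low-pass expressed as biquad coefficients.  The pole placement
exp(-2*pi*fc/fs) is one common design picked for illustration, not
necessarily what BiquadFilterNode would use, and the function names and
coefficient naming are assumptions.

```javascript
// Biquad convention assumed here:
//   y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
// One-pole low-pass: y[n] = (1 - p)*x[n] + p*y[n-1], with the pole p
// chosen as exp(-2*pi*fc/fs) (an assumption for this sketch).
function firstOrderLowpassAsBiquad(cutoffHz, sampleRate) {
  const p = Math.exp((-2 * Math.PI * cutoffHz) / sampleRate);
  return { b0: 1 - p, b1: 0, b2: 0, a1: -p, a2: 0 };
}

// Direct-form I biquad, used only to check the coefficients.
function runBiquad(c, input) {
  const out = new Float64Array(input.length);
  let x1 = 0, x2 = 0, y1 = 0, y2 = 0;
  for (let n = 0; n < input.length; n++) {
    const x0 = input[n];
    const y0 =
      c.b0 * x0 + c.b1 * x1 + c.b2 * x2 - c.a1 * y1 - c.a2 * y2;
    x2 = x1; x1 = x0;
    y2 = y1; y1 = y0;
    out[n] = y0;
  }
  return out;
}
```

Only b0 and a1 are non-zero, so the second-order terms simply drop out, and
the DC gain b0 / (1 + a1) comes out as exactly 1.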



>
> Hope some of this is helpful,
> Thanks again,
> Joe
>
>
>
> [1] http://chuck.cs.princeton.edu/
> [2] https://ccrma.stanford.edu/~dattorro/EffectDesignPart1.pdf, p. 662
> [3] http://doc.sccode.org/Tutorials/Getting-Started/11-Busses.html
> [4] http://doc.sccode.org/Classes/BufRd.html
> [5] http://doc.sccode.org/Classes/BufWr.html
>
>
Received on Tuesday, 19 June 2012 19:35:30 UTC