Re: Web Audio API questions and comments

From: Joe Turner <joe@oampo.co.uk>
Date: Wed, 20 Jun 2012 10:51:38 +0100
Message-ID: <CA+FkkmbzRhQWQ9nDR99Pg1hfj2r3bcfwJ3atddqK2Ak4Xxv_SA@mail.gmail.com>
To: Chris Rogers <crogers@google.com>
Cc: public-audio@w3.org
Hi Chris,
Thanks for the response - I think you've cleared up lots of this for me.

On Tue, Jun 19, 2012 at 8:35 PM, Chris Rogers <crogers@google.com> wrote:
>
>
> On Tue, Jun 19, 2012 at 7:39 AM, Joe Turner <joe@oampo.co.uk> wrote:
>>
>> Hi all,
>> I've had a quick run through the specification again and had a couple
>> of questions and observations which I thought might be helpful.  I'm
>> afraid it might be covering quite a few areas in one email, so feel
>> free to split this out into multiple topics if it's easier to follow.
>> Also apologies if I'm going over things which have already been
>> discussed; I've tried my best to keep up with the email traffic, but I
>> think I've probably missed or forgotten a few things.
>>
>> Firstly, though a quick thanks to all involved - the spec is looking a
>> lot more awesome than when I last had a good look through.
>
>
> Joe, thanks for your comments.  It's good to hear from you again!
>
>>
>>
>> - Block size
>>
>> I know this is a fairly broad architectural issue, but how strong is
>> the case for processing blocks of samples rather than single samples?
>> The spec states that it's for performance reasons - is the difference
>> in performance quantified?  I know that the majority of current
>> systems use blocks, but there are notable exceptions (ChucK [1] is the
>> most prominent one that I know of) which work with one sample at a
>> time.  The practical advantage of single samples is that it allows for
>> very short feedback loops.  For an example of this in practice,
>> Dattorro's figure-of-8 reverb algorithm [2] uses a feedback loop of 107
>> samples, which would be impossible without resorting to the
>> reimplementation of a number of existing nodes in JavaScript.
>
>
> The performance hit is really very high (can be on the order of 10x slower
> or worse).  Because the Web Audio API will be used on quite a broad range of
> hardware all the way down to mobile phones, I think it's really important to
> pay attention to performance.  You're right that there is a subset of
> algorithms which would not be possible.  But we've discussed the possibility
> of being able to change the block-size (all the way down to 1) in a later
> version of the specification.  In the meantime, these very specialized cases
> can be handled in a JavaScriptAudioNode.  By the way, in your example [2],
> there are other alternatives which can still achieve native performance like
> pre-rendering the impulse response into an AudioBuffer using JavaScript,
> then loading it into a ConvolverNode.  There's a pretty rich class of
> effects you can generate this way.
>

Fair enough.  This really only matters for a pretty specialised group
of applications, so I can see that performance is a higher priority
for now.
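To make sure I've understood the pre-rendering approach, here's a rough
sketch of what I think you mean (the decaying-noise response and the
44.1 kHz rate are illustrative only, not the Dattorro algorithm):

```javascript
// Compute an impulse response in plain JavaScript, then (in a browser)
// load it into a ConvolverNode.  The exponentially-decaying noise below
// is just an illustration; any offline-renderable response works.
const sampleRate = 44100;            // assumed context rate
const length = sampleRate * 2;       // a two-second response

const ir = new Float32Array(length);
for (let i = 0; i < length; i++) {
  // 10^-3 at the tail, i.e. roughly a -60 dB decay over the response
  const decay = Math.pow(10, -3 * (i / length));
  ir[i] = (Math.random() * 2 - 1) * decay;
}

// In a browser:
//   const buffer = context.createBuffer(1, length, sampleRate);
//   buffer.getChannelData(0).set(ir);
//   const convolver = context.createConvolver();
//   convolver.buffer = buffer;
```

As long as the response itself doesn't need to vary at run time this
stays entirely native-speed.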

>>
>>
>> - AudioNode cycles with JavaScriptAudioNodes
>>
>> Given that JavaScriptAudioNodes have an inherent delay built in they
>> should be fine to use in feedback loops as well as DelayNodes I think.
>>  Is this correct?
>
>
> I think it will be ok, although the latency of the JavaScriptAudioNode will
> factor into the overall delay.  There would be some limits on very small
> delay sizes.  But in many practical cases, this won't be an issue.
>

Yeah - can this be changed in the specification then so it won't throw
an exception?  I could see this being handy.

>>
>>
>> - Control rate AudioParams and interpolation
>>
>> Will there ever be a use for interpolation between values with a
>> control rate AudioParam rather than skipping to the next value at the
>> start of each block?  The Supercollider documentation [3] mentions
>> that this technique is used in some UGens which seems plausible, but
>> I'm not clear on when or why this is appropriate.  Does something like
>> this need specifying?
>
>
> I'm not sure what you mean exactly.  All the "automation" methods on
> AudioParam will generate the parameter values at a-rate which is
> high-resolution.

Oh, I think I've been an idiot here.  Apologies - ignore this!

>
>>
>>
>> - AudioParam setValueCurveAtTime with an Array parameter
>>
>> Should this be able to take a normal JavaScript array as a parameter?
>> For simple fixed envelopes this method seems simpler than having a
>> number of linearRampToValueAtTime calls, but the array will only
>> contain a few values so creating a Float32Array seems a bit like
>> overkill.
>
>
> Simple traditional ADSR envelopes are very easy to get with just a couple
> calls to the automation methods,
> setValueAtTime(), linearRampToValueAtTime(), etc.  The code is very concise.
>

Okay - this wasn't a big issue, just wondered whether it had been considered.
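For anyone following along, the concise version Chris means looks
something like this (the times and levels are made up; the linearRamp
helper just restates the interpolation formula as I read the spec, so
the shape can be checked outside a browser):

```javascript
// Sketch of a traditional ADSR using only the automation calls.
// In a browser (all times and levels invented):
//   const t = context.currentTime;
//   gain.gain.setValueAtTime(0, t);
//   gain.gain.linearRampToValueAtTime(1.0, t + 0.01);  // attack
//   gain.gain.linearRampToValueAtTime(0.6, t + 0.11);  // decay to sustain
//   gain.gain.setValueAtTime(0.6, t + 0.5);            // hold sustain
//   gain.gain.linearRampToValueAtTime(0.0, t + 0.8);   // release

// The linear ramp interpolates between scheduled points:
//   v(t) = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
function linearRamp(v0, t0, v1, t1, t) {
  return v0 + (v1 - v0) * ((t - t0) / (t1 - t0));
}

// Halfway through the 10 ms attack the gain should be 0.5.
const mid = linearRamp(0, 0, 1.0, 0.01, 0.005);
```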

>>
>>
>> - AudioGainNode dezippering
>>
>> Can this at least be optional?  If I'm using an AudioGainNode to scale
>> an audio node so it can control an AudioParam (for example to act as
>> an lfo), then I don't want the output to be filtered in any way.
>
>
> Yes, this is actually already the case.  I haven't yet explained
> de-zippering very well in the specification.  But de-zippering only really
> applies if .value changes to the AudioParam are being made directly, instead
> of via audio-rate signals or via "automation" APIs.  In other words, if
> somebody is changing a gain value:
>
> gainNode.gain.value = x;
>
> Then that value will be de-zippered.  But, if you're calling
> linearRampToValueAtTime(), or connecting an audio-rate signal to the
> parameter, then it will take the exact value from those signals.
>
>

Ah, okay - this makes sense.  Does setValueAtTime use de-zippering?
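Either way, the de-zippering I'm picturing is a one-pole lowpass on the
parameter value, something like this (the smoothing coefficient is
invented for illustration):

```javascript
// Sketch of de-zippering as I understand it: direct .value changes are
// run through a one-pole lowpass so the gain glides rather than steps.
function makeSmoother(k) {
  let current = 0;
  return function step(target) {
    current += k * (target - current);  // y[n] = y[n-1] + k * (x - y[n-1])
    return current;
  };
}

const smooth = makeSmoother(0.05);
let last = 0;
for (let i = 0; i < 200; i++) last = smooth(1.0);  // step .value from 0 to 1
// After a couple of hundred samples the gain has all but reached the target.
```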

>>
>> Although a better solution may be:
>>
>> - Operator nodes, AudioParam nodes
>>
>> Can we have separate nodes for the basic mathematical operators (add,
>> subtract, divide, multiply and modulo), and a way of having the output
>> of an AudioParam as a signal?
>
>
> We already have add, subtract, and multiply.  You can also get many other
> transformations by using a WaveShaperNode.  There are probably some
> operations which would not be possible, but I think they would be *much*
> more specialized.  And in these cases a JavaScriptAudioNode could help out.
>
>
>>
>>  This would allow all the flexibility
>> needed for scaling, offsetting and combining signals in order to
>> control parameters.
>
>
> I think we already have these with the built-in mixing and the
> AudioGainNode, etc.
>
>>
>>  I know a bit of trickery can make stuff like this
>> possible at the moment, and it's trivial to implement in JavaScript,
>> but it seems like core functionality to me.
>
>
> I'm open to suggestions, but think many of the things you've mentioned are
> already possible.

Here's where I tend to disagree a little.  It seems unintuitive to me
to be doing audio maths using the WaveShaperNode.  For example, say I
want to get the reciprocal of a signal.  In order to do this I have
two options.  One is to write a JavaScriptAudioNode - this is trivial,
but now my synth has four times the latency it had before.  My other
option is to create a Float32Array and fill it with a 1/x curve then
make a WaveShaperNode from this.  This, I would argue, is:
a) Non-trivial - I always get the maths with the indices wrong the
first time when creating lookup tables (although that might just be
me...)
b) Gives a 'worse' result - we are using a lookup table rather than
doing the maths directly
c) Non-obvious - the specification says that the WaveShaperNode is for
creating "non-linear distortion effects", which is not what I'm trying
to do

I can see that a 20 line JavaScript library would sort this out (and
would be the first thing I included in any Web Audio API project), but
making it non-trivial to do maths on the audio stream, create
constants etc. for the sake of reducing the number of nodes by one or
two seems like a strange decision.
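To make the index maths concrete, the table I'm describing looks
something like this (the table size and the guard near zero are
arbitrary; WaveShaperNode maps an input of -1 to curve[0] and +1 to the
last element):

```javascript
// Build a 1/x lookup table for a WaveShaperNode.
const size = 2048;
const curve = new Float32Array(size);
for (let i = 0; i < size; i++) {
  const x = (i / (size - 1)) * 2 - 1;   // map index -> [-1, 1]
  // guard against division by zero (never actually hit at this size,
  // since no index lands exactly on x = 0)
  const safe = Math.abs(x) < 1e-4 ? (x < 0 ? -1e-4 : 1e-4) : x;
  curve[i] = 1 / safe;                  // clamped reciprocal
}

// In a browser: shaper.curve = curve;
```

Which is exactly the sort of thing I'd rather a node did for me.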

>
>>
>>
>> - Tapping DelayNodes
>>
>> At the moment it's only possible to tap a delay line at the beginning,
>> and the read position cannot be modulated.
>
>
> That's not true.  The delay time can be modulated.
>

Oof, looks like I've been an idiot here again.  More apologies.
Off-topic, but does anyone know a good book on audio DSP algorithms
which doesn't need me to work through pages of maths so I can stop
sending ill-informed emails?
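For completeness, the modulated tap I was worrying about is just a
fractional read position plus interpolation, along these lines (the
buffer length and the choice of linear interpolation are illustrative):

```javascript
// Read a fractional-position tap from a circular delay buffer using
// linear interpolation - the building block chorus/flanging relies on.
function readTap(buffer, writeIndex, delaySamples) {
  const pos = writeIndex - delaySamples;
  const i0 = Math.floor(pos);
  const frac = pos - i0;
  const n = buffer.length;
  const a = buffer[((i0 % n) + n) % n];        // wrap into the buffer
  const b = buffer[(((i0 + 1) % n) + n) % n];
  return a + frac * (b - a);                   // linear interpolation
}

const buf = new Float32Array([0, 1, 2, 3, 4, 5, 6, 7]);
// Reading 2.5 samples behind write position 5 lands between samples 2 and 3.
const sample = readTap(buf, 5, 2.5);
```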

>>
>>  This makes it pretty much
>> impossible to implement effects such as pitch shifting, chorus,
>> phasing, flanging and modulated reverbs, all of which rely on either
>> multiple taps or modulated taps.  It would be nice to have something
>> similar to Supercollider's BufRd [4] and BufWr [5] with DelayNode
>> built as a layer on top of this.  Also AudioBufferSourceNode could be
>> a layer on top of a BufRd equivalent.
>>
>> - AudioBufferSourceNode playbackState change event
>>
>> Would it be useful to have an event fired when the
>> AudioBufferSourceNode's state changes?  I can't think of anything off
>> the top of my head, but it seems like it could be useful for some
>> applications maybe?
>
>
> We've talked about this before, and it still might make sense to add in.
>  But developers so far don't seem to be seriously limited by not having
> this.
>

Okay - as I say, I couldn't personally think of any application, it
just seemed like there might be one.

>>
>>
>> - JavaScriptAudioNode buffer size
>>
>> Is there a technical limitation as to why the buffer size is limited
>> to these values, or would just any power of 2 suffice, maybe with a
>> note advising a minimum value?
>
>
> It's tough to know what the best answer is here.  There are arguments that
> developers should just be able to pick a size and deal with any problems
> that come up.  Or an alternative is to let the browser/system determine what
> the best/optimal buffer size is.  The first case is more flexible, but could
> bring out differences depending on the OS, hardware, and the particular
> browser which can all vary in the minimum reliable buffer size.
>
>>
>>  Will using Workers make it possible to
>> use shorter buffers?
>
>
> Yes, it should be possible, although garbage collection is still an issue.
>
>>
>>  Also, I know this has been discussed a few times
>> - can I add a +1 on allowing JavaScriptAudioNodes access to
>> AudioParams.
>
>
> I'm not opposed to it, but we've discussed that we might not add it right
> away.

That all makes sense.  Having the AudioParams in JavaScript is pretty
high up my wish list, but I understand that it's not the biggest
priority.
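For reference, the latency cost of each buffer size is simple
arithmetic (assuming 44.1 kHz; the list of allowed sizes is my reading
of the spec):

```javascript
// Per-buffer latency of each allowed JavaScriptAudioNode buffer size,
// at an assumed 44.1 kHz sample rate.
const sampleRate = 44100;
const sizes = [256, 512, 1024, 2048, 4096, 8192, 16384];
const latenciesMs = sizes.map(n => (n / sampleRate) * 1000);
// Roughly 5.8 ms at the small end up to about 371.5 ms at the large end,
// before any extra buffering the implementation adds on top.
```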

>
>>
>>
>> - AudioPannerNode
>>
>> Having never done any work in 3D sound I find this all a bit
>> intimidating.  Is there any chance of something simpler built on top
>> of this for those of us who want sound to come out of the left
>> speaker, the right speaker, or some combination of the two?
>
>
> You can always use AudioChannelSplitter and AudioChannelMerger to do your
> own low-level matrix mixing.

Yeah, I didn't think of this.  That makes it simple enough for me.
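For anyone else after the same thing: two AudioGainNodes feeding an
AudioChannelMerger, with the gains taken from the usual equal-power pan
law - my own sketch, not anything from the spec:

```javascript
// Equal-power stereo pan gains for a pan position in [-1, 1].
function equalPowerGains(pan) {
  const theta = (pan + 1) * Math.PI / 4;       // map pan to [0, pi/2]
  return { left: Math.cos(theta), right: Math.sin(theta) };
}

// In a browser:
//   leftGain.gain.value = equalPowerGains(pan).left;
//   rightGain.gain.value = equalPowerGains(pan).right;
//   source -> leftGain -> merger input 0
//   source -> rightGain -> merger input 1

const centre = equalPowerGains(0);   // both channels at 1/sqrt(2)
```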

>
>>
>>
>> - RealTimeAnalyserNode
>>
>> This seems strange to me - the functionality could be really useful,
>> but it seems focused very narrowly on creating visualisations.  I
>> think a nicer solution would be to have separate FFT and IFFT nodes so
>> frequency domain effects could be integrated into the processing
>> chain, and then a separate node which allows access to the FFT or
>> waveform data depending on where in the graph it is inserted.  So for
>> visualisations you would have an AudioNode connected to an FFTNode,
>> connected to a BufferYoinkerNode.
>
>
> Jussi has also wanted this.  It's a lot more complicated to actually design
> the API to work well with the other nodes than you make it sound.  For
> example, many/most of the types of processing would involve some kind of
> phase-vocoder engine which involves overlapping series of analysis windows,
> followed by processing on the frequency-domain frames, followed by an
> overlapping series of re-synthesis windows.  You'd need to be able to
> control the step sizes, the window types, frequency formats (real/imag or
> mag/phase), what happens if you connect frequency nodes to regular nodes
> (and vice versa), and then define a library of native frequency-domain
> processing nodes, which is not trivial, or even clear what they should be.
>  I've worked a lot in this area (SVP phase vocoder tool and AudioSculpt at
> IRCAM), and the architecture would need to be quite complex to be of any
> value.
>

Yes, I understand that this is a pretty big technical challenge and
would require a lot of work just to get it designed.  I wasn't really
imagining a load of native frequency-domain effects, rather that it
would be left as an exercise for the reader in JavaScript.  If a
phase-vocoder engine (minus the processing effects) is off the cards
then Marcus' solution would work for me.  In the worst case doing
everything in JavaScript would probably also be fine for what I do.


>>
>>
>> A couple more small points on this.  Firstly the name is very
>> non-specific - I think it should probably at least describe what
>> analysis is being done.  It would also be good to allow windowing
>> functions to be supplied, or at least specify which windowing function
>> should be used.
>
>
> Yes, I'll have to specify the window.
>
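(For what it's worth, a Hann window is the one I'd expect - something
like this, with the size picked arbitrarily:)

```javascript
// Hann window: w[n] = 0.5 * (1 - cos(2*pi*n / (N - 1)))
function hannWindow(N) {
  const w = new Float32Array(N);
  for (let n = 0; n < N; n++) {
    w[n] = 0.5 * (1 - Math.cos(2 * Math.PI * n / (N - 1)));
  }
  return w;
}

const w = hannWindow(512);
// Endpoints sit at zero and the window peaks at 1 in the middle.
```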
>>
>>
>> - DynamicsCompressorNode sidechaining and lookahead
>>
>> I'm not sure if these are a bit specialised, a bit of a dark art, or
>> both, but they are both common and fairly well defined features of
>> compressors which may be useful.  I could see sidechaining being
>> especially useful for ducking in broadcast applications.
>
>
> Yes, I agree.  We can add a side-chain input here.  But, I'm holding off on
> this for now, since it's a little more specialized and I'm trying to
> concentrate on the core features first.
>

Sounds fine to me.

>>
>>
>> - First order filters
>>
>> It seems a little bit strange that second order (Biquad) filters are
>> provided as standard, whereas first order filters aren't.
>>
>>  Again, this
>> is pretty trivial in JavaScript, I was just wondering what the reason
>> for their omission is?  First order filters are fairly widely used in
>> a number of common audio algorithms, from a basic envelope follower
>> through to providing damping in reverbs.
>
>
> Yes, I agree very strongly with you here.  I'd like to get these in here.
>  One possibility is to simply add a 1st order low and high pass mode to
> BiquadFilterNode.  Even though they're 1st order filters, they can still be
> implemented with a biquad (although some of the coefficients will be zero).

Again, this would work for me.
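To spell out the degenerate-biquad idea for my own benefit: a one-pole
lowpass is the standard biquad difference equation with b1, b2 and a2
zeroed (the pole position here is arbitrary):

```javascript
// Standard biquad difference equation:
//   y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
function makeBiquad(b0, b1, b2, a1, a2) {
  let x1 = 0, x2 = 0, y1 = 0, y2 = 0;
  return function process(x) {
    const y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
    x2 = x1; x1 = x;
    y2 = y1; y1 = y;
    return y;
  };
}

// One-pole lowpass y[n] = (1 - p)*x[n] + p*y[n-1] as a degenerate biquad:
const p = 0.95;
const onePole = makeBiquad(1 - p, 0, 0, -p, 0);

// Fed DC, the output settles towards the input value.
let y = 0;
for (let i = 0; i < 500; i++) y = onePole(1.0);
```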

Cheers,
Joe

>> [1] http://chuck.cs.princeton.edu/
>> [2] https://ccrma.stanford.edu/~dattorro/EffectDesignPart1.pdf, p. 662
>> [3] http://doc.sccode.org/Tutorials/Getting-Started/11-Busses.html
>> [4] http://doc.sccode.org/Classes/BufRd.html
>> [5] http://doc.sccode.org/Classes/BufWr.html
>>
>
Received on Wednesday, 20 June 2012 09:52:31 GMT