Re: Feedback from Official.fm Labs

Hi Jussi, thanks for your comments.

On Tue, Feb 28, 2012 at 7:18 AM, Jussi Kalliokoski <jussi.kalliokoski@gmail.com> wrote:

> Hey guys,
>
> So we brainstormed a bit about this with my team yesterday, so I'm
> sending a summary of feedback as promised.
>
> A little foreword, however. As negative as some points may seem, this is
> exactly what I wanted to get, because we already have heard a lot of good
> things about the APIs, so this is purely getting it all out there. So don't
> be fooled by the tone, we're really excited about both of the current
> proposals.
>
>  * One point that stood out is that while graph-based APIs are easily
> approachable to people who are not familiar with DSP, if you're doing any
> complex DSP and/or need to control the flow of your program, you'll end up
> working around the limitations of the API and eventually implementing the
> effects yourself. A few cases to demonstrate this point:
>

I think we're certainly in agreement that some people want to write
specialized DSP code which isn't available as built-in nodes.  The
JavaScriptAudioNode allows this type of custom code.  I disagree slightly
with the wording and strength of your statement:

 "if you're doing any complex DSP and/or need to control the flow of your
program, you'll end up working around the limitations of the API and
eventually implementing the effects yourself"

I believe that the Web Audio API offers good potential for implementing
complex DSP and that many applications will not need to (or want to) go
down to hand-coded JavaScript DSP.  So I think your assertion is a bit
overstated.  But the most important point is the need for custom JavaScript
processing, which we both agree is a good tool.
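
For example, a minimal sketch of this kind of custom processing (using the
createJavaScriptNode() factory as it exists in the current WebKit
implementation; the name may change as the spec evolves) looks like this:

    // Sketch only: 'source' is assumed to be some existing AudioNode.
    var context = new webkitAudioContext();
    var processor = context.createJavaScriptNode(4096, 1, 1);

    processor.onaudioprocess = function (event) {
      var input = event.inputBuffer.getChannelData(0);
      var output = event.outputBuffer.getChannelData(0);
      for (var i = 0; i < input.length; ++i) {
        // Hand-written DSP goes here; a simple soft-clip as a placeholder.
        output[i] = Math.tanh(input[i]);
      }
    };

    source.connect(processor);
    processor.connect(context.destination);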



>    - The one presented in the mail earlier today, where you have a game
> that has timed events scheduled, and then you go to the menu, and the menu
> has its own sounds. This means you'll have to either create multiple
> graphs (which seems to be currently restricted in the Chrome implementation
> of Web Audio API to a limited number) or handle the flow yourself (in a
> buffer-based processing API you could control this kind of a use case quite
> simply).
>

I'm not quite sure what limitation you're highlighting here.  It's quite
possible to have many sub-graphs doing completely independent processing
all within a single AudioContext.
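
For instance, a game's scheduled sounds and its menu sounds can each hang off
their own gain node and be controlled independently.  A rough sketch (using
the current createGainNode() name):

    var context = new webkitAudioContext();

    // Sub-graph 1: in-game sounds, with their own master gain.
    var gameBus = context.createGainNode();
    gameBus.connect(context.destination);

    // Sub-graph 2: menu sounds, with completely separate routing.
    var menuBus = context.createGainNode();
    menuBus.connect(context.destination);

    // Game events connect their sources to gameBus, menu clicks to menuBus.
    // Pausing the game is just gameBus.gain.value = 0 (or gameBus.disconnect()),
    // while the menu sub-graph keeps running untouched.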


>    - Let's say we have a delay effect with a filter in its feedback loop:
>      Input -----> Delay -----> Output
>               ^ <-- Filter <--^
>      Again, simple to achieve in a buffer based API, but not in a
> graph-based one.
>

Really?  There are examples of delays with effects in the feedback loop,
like the "WaveTable synth demo".  It wasn't too hard to achieve.  I added
controls for the dry/wet mix, the feedback, and the BPM-synchronized delay
time in a very straightforward way.
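
The routing itself is only a handful of connect() calls.  A rough sketch
(using the current factory names such as createDelayNode(); 'input' and
'context' are assumed to already exist):

    // Input -> Delay -> Output, with a filter in the feedback path.
    var delay = context.createDelayNode();
    delay.delayTime.value = 0.375;               // e.g. a BPM-synced delay time

    var feedbackFilter = context.createBiquadFilter();  // lowpass by default
    var feedbackGain = context.createGainNode();
    feedbackGain.gain.value = 0.5;               // feedback amount

    input.connect(delay);
    input.connect(context.destination);          // dry path
    delay.connect(context.destination);          // wet path

    delay.connect(feedbackFilter);               // feedback loop:
    feedbackFilter.connect(feedbackGain);        //   delay -> filter -> gain ->
    feedbackGain.connect(delay);                 //   back into the delay

Cycles like this are fine because the loop contains a DelayNode.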



>    - You need to get the data through a filter, then get the FFT data for
> that. You'll have to go through a serious amount of boilerplate to get
> where you want, whereas in a buffer based API, it might have just looked
> like this: fft(filter(data, parameters)), and you would get it
> synchronously, whereas with the Web Audio API, for example, you have to do it
> asynchronously.
>

I'm not quite sure what you mean.  I'm sure that for some specific custom
effects it would be easier to do this directly in JavaScript.  But you can
certainly have delays with convolution reverb in the feedback loop (which
happens to use FFTs in its internal implementation).
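
For the specific filter -> FFT case you mention, the boilerplate with a
RealtimeAnalyserNode tapped after the filter is fairly small (a sketch, with
'source' and 'context' assumed to already exist), even though the analysis
data is indeed read asynchronously relative to your script:

    var filter = context.createBiquadFilter();
    var analyser = context.createAnalyser();
    analyser.fftSize = 2048;

    source.connect(filter);
    filter.connect(analyser);
    analyser.connect(context.destination);   // also passes the audio through

    var freqData = new Float32Array(analyser.frequencyBinCount);

    function update() {
      analyser.getFloatFrequencyData(freqData);  // magnitudes in dB
      // ... use freqData, e.g. to draw a spectrum ...
    }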



>    - Time stretching is completely impossible to achieve in a graph-based
> API without a memory overflow in the blink of an eye, because you're not in
> control of the flow.
>

I'm pretty sure that it's not impossible.  After all, there's a crude
time-stretching demo here:
http://chromium.googlecode.com/svn/trunk/samples/audio/granular.html

The audio fidelity is not very high, but the example code can be tweaked
for better quality, especially for voice time-stretching, which is one of
the biggest uses.
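
The basic idea in that demo, sketched very loosely below, is to schedule
short overlapping grains whose source positions advance more slowly than
output time (this uses noteGrainOn() from the current draft, and the grain
size, overlap and windowing are all things you'd want to tune):

    // Play 'buffer' stretched by the factor 'stretch' (> 1 means longer).
    function scheduleGrains(context, buffer, stretch) {
      var grainDuration = 0.08;    // 80 ms grains
      var overlap = 0.5;           // 50% overlap between grains
      var hop = grainDuration * (1 - overlap);
      var outputTime = context.currentTime + 0.1;
      var sourceOffset = 0;

      while (sourceOffset < buffer.duration - grainDuration) {
        var grain = context.createBufferSource();
        grain.buffer = buffer;
        grain.connect(context.destination);
        grain.noteGrainOn(outputTime, sourceOffset, grainDuration);

        outputTime += hop;              // output advances at the normal rate...
        sourceOffset += hop / stretch;  // ...the source position more slowly
      }
    }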


>    Anyway, the common opinion seemed to be that graph-based API should be
> a higher level abstraction, not the basis of all functionality.
>

It *is* a higher-level abstraction in the Web Audio API, with one of the
nodes (the JavaScriptAudioNode) available for direct JavaScript processing.


>  * Another thing that sort of relates to the previous point is that it
> would be highly useful to have native implementations of high-volume
> functions that are expensive in JS. One example would be common functionality
> in decoders, such as clz (count leading zeroes). Also exposing native
> decoders would be useful, but this is already done in both APIs to some
> extent (reading data from <audio> and <video> is possible). Another
> relation to the previous point is that instead of graph-based effects, you
> could control the flow yourself if we'd offer a sort of standard library
> for the most common expensive DSP functionality. This library could also
> include native encoders.
>
>
I think this notion of a library of common functions is exactly what the
built-in nodes of the Web Audio API represent.
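
In that sense the node factory methods already read a lot like such a library
(a partial list from the current draft; 'context' assumed, and names may
still change):

    context.createBiquadFilter();       // filters (lowpass, highpass, peaking, ...)
    context.createConvolver();          // convolution (e.g. reverb)
    context.createDynamicsCompressor(); // dynamics compression
    context.createPanner();             // 3D spatialization
    context.createChannelSplitter();    // channel routing
    context.createWaveShaper();         // non-linear waveshaping / distortion
    context.createAnalyser();           // FFT analysis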



> Web Audio API specific
>  * The number of simultaneous AudioContexts seems to be limited.
>

There will always be a hard limit, but it can be fairly high.  Nearly all
use cases will only require a single context, however.


>  * It's odd that common processing paradigms are handled natively, yet
> Sample Rate Conversion, which is a relatively expensive operation, is not.


I might not disagree with that point.



> The spec says that the third argument for the audio context is sample
> rate, but the current implementation doesn't obey the sample rate you
> choose there.


I'm not clear where you're seeing this in the specification document.


> However, given the setup cost for an AudioContext and the limited number
> of them, it would be far more efficient if you could specify the sample
> rate for individual JavaScriptProcessingNodes, since we're often handling
> sources with varying sample rates and channel counts. It should also be
> possible to change the sample rate on the fly.
>

It's complex, both conceptually for the developer and for the implementation,
to manage many nodes which are all running at different sample rates.


>  * In the current implementation, there's no way to kill an AudioContext.
>

It should be simple enough to add a "stopAllSound()" or "teardownGraph()"
method if developers find it useful.  So far, nobody has reported this as a
limitation.


>
> MediaStreams Processing Specific
>  * No main thread processing. May be a good thing however, because it's a
> good practice, but forcing good practices is usually a bad idea.
>
> Not necessarily in the scope of the Audio WG, but I'll still list them
> here:
>  * The ability to probe what sort of an audio device we're outputting to,
> and when it changes (for example, are these internal speakers, earbuds,
> stage monitors or a basic 5.1 home theatre setup, and when you actually
> plug in the earbuds).
>  * The same for input devices. These would allow you to automatically
> adjust mixing, equalization and compression settings for different setups.
>

Yes, I agree we'll need some way to query the capabilities and choose the
available devices.  This will be exciting!


>
> There might have been some other points as well, but I can't remember
> right now. Hope this was helpful!
>
> Cheers,
> Jussi Kalliokoski


Jussi, thanks for your time in writing this up...

Chris

Received on Tuesday, 28 February 2012 18:59:49 UTC