Re: Feedback from Official.fm Labs

On Tue, Feb 28, 2012 at 11:56 AM, Jussi Kalliokoski <
jussi.kalliokoski@gmail.com> wrote:

> Hey Chris,
>
> As I said in the beginning, don't be fooled by the tone; I just tried to
> capture the gist of the conversation we had last night with the team. Some
> of the points are overstated, but it was a brainstorm, and the slightly
> aggressive tone reflects our passion for these things. That said, I'm sorry
> that it sounds so negative and harsh; it's meant to be constructive.
>

Hey Jussi, no worries :) I didn't take your comments badly.  I know that
you're very passionate about audio just like I am, and so we should have
free discussions here.



>
> On Tue, Feb 28, 2012 at 8:59 PM, Chris Rogers <crogers@google.com> wrote:
>
>> Hi Jussi, thanks for your comments.
>>
>> On Tue, Feb 28, 2012 at 7:18 AM, Jussi Kalliokoski <
>> jussi.kalliokoski@gmail.com> wrote:
>>
>>> Hey guys,
>>>
>>> So we brainstormed a bit about this with my team yesterday, so I'm
>>> sending a summary of the feedback as promised.
>>>
>>> A little foreword, however. As negative as some points may seem, this is
>>> exactly what I wanted to get, because we've already heard a lot of good
>>> things about the APIs, so this is purely about getting it all out there.
>>> So don't be fooled by the tone; we're really excited about both of the
>>> current proposals.
>>>
>>>  * One point that stood out is that while graph-based APIs are easily
>>> approachable for people who are not familiar with DSP, if you're doing any
>>> complex DSP and/or need to control the flow of your program, you'll end up
>>> working around the limitations of the API and eventually implementing the
>>> effects yourself. A few cases to demonstrate this point:
>>>
>>
>> I think we're certainly in agreement that some people want to write
>> specialized DSP code which isn't available as built-in nodes.  The
>> JavaScriptAudioNode allows this type of custom code.  I disagree slightly
>> with the wording and strength of your statement:
>>
>>  "if you're doing any complex DSP and/or need to control the flow of your
>> program, you'll end up working around the limitations of the API and
>> eventually implementing the effects yourself"
>>
>> I believe that the Web Audio API offers good potential for implementing
>> complex DSP and that many applications will not need to (or want to) go
>> down to hand-coded JavaScript DSP.  So I think your assertion is a bit
>> overstated.  But, the most important point is the need for custom
>> JavaScript processing which we both agree is a good tool.
>>
>
> Agreed. Again, sorry for the wording. I think that if you had to find a
> constructive argument here, it would be the need for the audio processing
> toolkit, which was suggested later.
>
>
>>>    - The one presented in the mail earlier today, where you have a game
>>> that has timed events scheduled, and then you go to the menu, and the menu
>>> has its own sounds. This means you'll either have to create multiple
>>> graphs (which currently seems to be restricted to a limited number in the
>>> Chrome implementation of the Web Audio API) or handle the flow yourself
>>> (in a buffer-based processing API you could control this kind of use case
>>> quite simply).
>>>
>>
>> I'm not quite sure what limitation you're highlighting here.  It's quite
>> possible to have many sub-graphs doing completely independent processing
>> all within a single AudioContext.
>>
>
> Maybe this part could be interpreted as a need for a way to pause a
> (sub-)graph's processing and scheduling.
>

I think that because JS is completely in control of the scheduling (with
noteOn()/noteOff() and AudioParam scheduling), as well as of directly tweaking
parameter values such as volume, it's pretty straightforward to do this
already.
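
For example, a minimal sketch of the game/menu case (method names as in the
current WebKit implementation; the scheduling loop itself is left out).
"Pausing" here means muting the game's bus and not scheduling further notes;
already-scheduled sources keep running silently rather than being suspended:

  var ctx = new webkitAudioContext();

  // Route all game sounds through one gain node acting as a bus.
  var gameBus = ctx.createGainNode();
  gameBus.connect(ctx.destination);

  function playGameSound(buffer, time) {
    var src = ctx.createBufferSource();
    src.buffer = buffer;
    src.connect(gameBus);
    src.noteOn(time);
    return src;
  }

  // Menu sounds get their own bus, so the two never interfere.
  var menuBus = ctx.createGainNode();
  menuBus.connect(ctx.destination);

  function enterMenu() {
    gameBus.gain.value = 0;  // silence the game sub-graph
    // ...and stop the game's own scheduling loop here...
  }

  function leaveMenu() {
    gameBus.gain.value = 1;
    // ...resume scheduling...
  }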


>
>
>>
>>
>>>    - Let's say we have a delay effect with a filter in its feedback loop:
>>>      Input -----> Delay -----> Output
>>>               ^ <-- Filter <--^
>>>      Again, simple to achieve in a buffer based API, but not in a
>>> graph-based one.
>>>
>>
>> Really?  There are examples of delays with effects in the feedback loop,
>> like the "WaveTable synth demo".  It wasn't too hard to achieve.  I added
>> controls for the dry/wet mix, the feedback, and the BPM-synchronized delay
>> time in a very straightforward way.
>>
>
> If this is the case, I applaud you for job well done, and apologize for
> not doing enough inspection before putting this out there.
>
>
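
For reference, that feedback topology maps onto the graph fairly directly.
A minimal sketch (node factory names as in the current WebKit implementation;
ctx is an existing AudioContext and input whatever source node you already
have):

  var delay = ctx.createDelayNode();
  delay.delayTime.value = 0.3;            // 300 ms delay

  var filter = ctx.createBiquadFilter();  // defaults to a lowpass
  filter.frequency.value = 2000;

  var feedback = ctx.createGainNode();
  feedback.gain.value = 0.5;              // feedback amount, kept below 1

  input.connect(ctx.destination);         // dry path
  input.connect(delay);
  delay.connect(ctx.destination);         // wet path
  delay.connect(filter);                  // feedback loop:
  filter.connect(feedback);               //   Delay -> Filter -> Gain -> Delay
  feedback.connect(delay);
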
>>
>>
>>>    - You need to get the data through a filter, then get the FFT data
>>> for that. You'll have to go through a serious amount of boilerplate to get
>>> where you want, whereas in a buffer-based API it might have just looked
>>> like this: fft(filter(data, parameters)), and you would get it
>>> synchronously, whereas with the Web Audio API, for example, you have to do
>>> it asynchronously.
>>>
>>
>> I'm not quite sure what you mean.  I'm sure that for some specific custom
>> effects it would be easier to do this directly in JavaScript.  But you can
>> certainly have delays with convolution reverb in the feedback (which
>> happens to use FFTs in the internal implementation).
>>
>
> The point here is that things like FFT and filters are highly useful
> outside audio use cases as well, and in those use cases it's very
> unconventional to have to run the data through a processing graph and wait
> for the asynchronous processing to finish. So the problem is that we're
> implementing generally useful DSP functionality in the browser, but forcing
> a particular (graph-based) workflow onto it, which is highly uncomfortable
> outside live audio.
>
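
As a concrete illustration of the asynchronous, graph-based route being
discussed here, getting FFT data for filtered audio currently looks roughly
like this (a sketch; assumes an existing context ctx and source node source):

  var filter = ctx.createBiquadFilter();   // lowpass by default
  var analyser = ctx.createAnalyser();
  analyser.fftSize = 2048;

  source.connect(filter);
  filter.connect(analyser);
  analyser.connect(ctx.destination);

  // The spectrum is only available while audio is flowing through the
  // graph, and has to be polled for the most recent block:
  var spectrum = new Float32Array(analyser.frequencyBinCount);
  function poll() {
    analyser.getFloatFrequencyData(spectrum);  // magnitudes in dB
    // ...use spectrum...
    setTimeout(poll, 50);
  }
  poll();
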
>
>
>>>    - Time stretching is completely impossible to achieve in a graph-based
>>> API without a memory overflow in the blink of an eye, because you're not in
>>> control of the flow.
>>>
>>
>> I'm pretty sure that it's not impossible.  After all there's a crude
>> time-stretching demo here:
>> http://chromium.googlecode.com/svn/trunk/samples/audio/granular.html
>>
>>
>> The audio fidelity is not very high, but the example code can be tweaked
>> for improvements that are suitable for voice time-stretching, which
>> is one of its biggest uses.
>>
>
> I'm thinking about more complex and better-quality algorithms, such as
> Paulstretch ( http://hypermammut.sourceforge.net/paulstretch/ ), which
> uses FFT and friends quite nicely to achieve time stretching with
> really high quality and astronomical stretching factors. Impossible might
> not be the right word, but you'll end up working around the graph,
> because you can't control the flow. For example, you can't make the time
> stretcher a single node in the graph, because you can't limit the incoming
> data, so you'd keep buffering more and more data and end up with a memory
> overflow quite quickly, even with quite mild factors.
>

Hyper-stretching like Paulstretch is really cool.  I've played a lot with
this kind of stuff with SVP at IRCAM and spent a lot of time at Apple
developing such algorithms.  If you have a Mac handy you can see it listed
with 'auval':

% auval -a
aufc tmpt appl  -  Apple: AUTimePitch

For these types of phase-vocoder algorithms, you're best off working
directly in JavaScript.
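
In practice the graph then just hosts a single custom node. A minimal sketch
of the shape such code takes (the buffer size is arbitrary, and
pullStretchedBlock() stands for your own phase-vocoder/stretcher code, which
is a hypothetical name here):

  // One JavaScriptAudioNode is effectively the whole graph; the stretcher
  // runs in ordinary JS and decides for itself how much input to consume,
  // so there is no unbounded buffering inside the graph.
  var node = ctx.createJavaScriptNode(4096, 1, 1);

  node.onaudioprocess = function (event) {
    var out = event.outputBuffer.getChannelData(0);
    // Reads however much source material the stretch factor requires and
    // writes exactly out.length samples.
    pullStretchedBlock(out);
  };

  node.connect(ctx.destination);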



>
>
>>
>>
>>>    Anyway, the common opinion seemed to be that a graph-based API should
>>> be a higher-level abstraction, not the basis of all functionality.
>>>
>>
>> It *is* a higher-level abstraction in the Web Audio API with one of the
>> nodes being available for direct JavaScript processing.
>>
>
> No, that's really like saying you can have a sea in the fish. The
> lower-level access is achieved through the higher-level abstraction, whereas
> the point here is that the higher-level abstraction should be built on top
> of the lower-level API.
>

You can have it both ways with the Web Audio API:

1. Implement JS processing as one node mixed with several others in a Web
Audio graph: for example, use a JavaScriptAudioNode for custom synth code in
JS, mixed with other native processing such as high-quality reverberation
from a ConvolverNode, delays from a DelayNode, etc. (see the sketch below).
2. Don't use any of the graph features or nodes of the Web Audio API and
just create a single JavaScriptAudioNode, where you implement your own
graph API directly in JS.
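
A minimal sketch of the first approach (assumes ctx is an existing context,
impulseResponse an already-decoded AudioBuffer, and renderSynthBlock() your
own synth code; the latter two are hypothetical names):

  // Custom synth code runs inside a JavaScriptAudioNode...
  var synth = ctx.createJavaScriptNode(2048, 1, 1);
  synth.onaudioprocess = function (event) {
    renderSynthBlock(event.outputBuffer.getChannelData(0));
  };

  // ...and feeds native processing: convolution reverb plus a delay.
  var reverb = ctx.createConvolver();
  reverb.buffer = impulseResponse;

  var delay = ctx.createDelayNode();
  delay.delayTime.value = 0.25;

  synth.connect(reverb);
  synth.connect(delay);
  reverb.connect(ctx.destination);
  delay.connect(ctx.destination);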



>
>>
>>
>>>  * Another thing that sort of relates to the previous point is that it
>>> would be highly useful to have native implementations of high-volume
>>> functions that are expensive in JS. One example would be common
>>> functionality in decoders, such as clz (count leading zeroes). Also,
>>> exposing native decoders would be useful, but this is already done in both
>>> APIs to some extent (reading data from <audio> and <video> is possible).
>>> Another relation to the previous point is that instead of graph-based
>>> effects, you could control the flow yourself if we offered a sort of
>>> standard library for the most common expensive DSP functionality. This
>>> library could also include native encoders.
>>>
>>>
>> I think this notion of a library of common functions is exactly what the
>> built-in nodes of the Web Audio API represent.
>>
>
> But this enforces the graph workflow, which has its limitations, as mentioned.
>

I think the limitations are rather modest and that there's a huge set of
compelling applications that can be built this way.  For very specialized
processing, direct processing in JS can be used.


>
>
>> Web Audio API specific
>>>  * The number of simultaneous AudioContexts seems to be limited.
>>>
>>
>> There will always be a hard limit, but it can be fairly high.  Nearly all
>> use cases will only require a single context, however.
>>
>
> All right, but the current limit seems to be soft; on some setups it's as
> low as 2 contexts, and after that creating a context throws an exception.
> The limit also seems to be shared across pages, e.g. with a two-context
> limit you could only have an AudioContext on two pages, and the third would
> say "Sorry, your browser doesn't support the Web Audio API".
>
>
>>>  * It's odd that common processing paradigms are handled natively, yet
>>> Sample Rate Conversion, which is a relatively expensive operation, is not.
>>
>>
>> I might not disagree with that point.
>>
>>
>>
>>> The spec says that the third argument for the audio context is sample
>>> rate, but the current implementation doesn't obey the sample rate you
>>> choose there.
>>
>>
>> I'm not clear where you're seeing this in the specification document.
>>
>
> Oh, sorry, we must have picked it up from the source code. IIRC, I've also
> seen it on some tutorials, using a syntax like new
> webkitAudioContext(something, bufferSize, sampleRate).
>
>
>>
>>
>>> However, given the setup cost for an AudioContext and the limited number
>>> of them, it would be far more efficient if you could specify the sample
>>> rate for individual JavaScriptProcessingNodes, since we're often handling
>>> sources with varying sample rates and channel counts. It should also be
>>> possible to change the sample rate on the fly.
>>>
>>
>> It's complex, both conceptually for the developer and for the
>> implementation, to manage many nodes all of which are running at different
>> sample rates.
>>
>
> We know this. It would be easier if the API did it for us. This is a
> real-world use case. And it doesn't have to be a required parameter: if the
> developer doesn't touch the value, the implementation can just run at
> the default sample rate of the context, so it doesn't really bother
> anyone who doesn't need it.
>

One thing we could consider later on is basically a "vari-speed"
rate-changing node which could accomplish what you want.  On a Mac if you
type:
% auval -a
aufc vari appl  -  Apple: AUVarispeed

The AUVarispeed is the rate-changing AudioUnit.  So we could consider such
a thing.  But, if so, I would hope for it to be a version 2 feature, since it
can be confusing for basic users: connecting nodes from different
parts of the graph can be impossible (given the different data rates).  But
I guess there's the old expression about "giving someone enough rope to
hang themselves".
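
In the meantime, buffer sources already give a limited form of vari-speed via
playbackRate (a sketch; someAudioBuffer stands for an already-decoded
AudioBuffer):

  var src = ctx.createBufferSource();
  src.buffer = someAudioBuffer;
  // playbackRate is an AudioParam: 0.5 plays the buffer at half speed and
  // an octave down, 2.0 at double speed; it can also be automated over time.
  src.playbackRate.value = 0.5;
  src.connect(ctx.destination);
  src.noteOn(0);

This doesn't address per-node processing sample rates, but it covers the
simple "play this material at a different rate" cases.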




>
>
>>
>>
>>>  * In the current implementation, there's no way to kill an AudioContext.
>>>
>>
>> It should be simple enough to add a "stopAllSound()" or "teardownGraph()"
>> method if developers find it useful.  Nobody has found it to be a
>> limitation so far.
>>
>
> Well, it is a limitation given the cap on the number of AudioContexts, so
> I think a method like this would be highly useful. The AudioContext should
> probably adhere to the other specs about DOM garbage collection as well, so
> if there are no references to the AudioContext, it should be collected and
> destroyed, in which case I'm not sure a method is actually needed. As far as
> specs are concerned, this is probably an implementation bug violating the
> general garbage collection rules, as IIRC the Web Audio API doesn't
> explicitly specify any garbage collection rules for AudioContexts.
>
>
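
Until something along those lines exists, the closest workaround seems to be
manual teardown: stop the sources, disconnect the nodes, drop every reference,
and rely on garbage collection (a sketch; activeSources and masterGain are
hypothetical names for things the application would already be tracking, and
actual collection behavior is implementation-dependent):

  for (var i = 0; i < activeSources.length; i++) {
    activeSources[i].noteOff(0);     // stop immediately
    activeSources[i].disconnect();   // detach from the graph
  }
  activeSources = null;
  masterGain.disconnect();           // top-level mix node
  masterGain = null;
  ctx = null;                        // nothing should reference the context now
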
>>
>>
>>>
>>> MediaStreams Processing Specific
>>>  * No main-thread processing. This may be a good thing, however, because
>>> it's good practice, but forcing good practices is usually a bad idea.
>>>
>>> Not necessarily in the scope of the Audio WG, but I'll still list them
>>> here:
>>>  * The ability to probe what sort of audio device we're outputting
>>> to, and to be notified of changes (for example, are these internal
>>> speakers, earbuds, stage monitors or a basic 5.1 home theatre setup, and
>>> when you actually plug in the earbuds).
>>>  * The same for input devices. These would allow you to automatically
>>> choose mixing, equalization and compression options for different setups.
>>>
>>
>> Yes, I agree we'll need some way to query the capabilities and choose the
>> available devices.  This will be exciting!
>>
>
> Indeed!
>
>
>>
>>
>>>
>>> There might have been some other points as well, but I can't remember
>>> right now. Hope this was helpful!
>>>
>>> Cheers,
>>> Jussi Kalliokoski
>>
>>
>> Jussi, thanks for your time in writing this up...
>>
>
> No problem, I'm glad to! I hope you aren't taking the points I'm making
> here personally; we really appreciate all the hard work you've done. :)
>
> Jussi
>

Received on Wednesday, 29 February 2012 18:56:02 UTC