Re: Feedback from Official.fm Labs from Jussi Kalliokoski on 2012-02-28 (public-audio@w3.org from January to March 2012)

From: Jussi Kalliokoski <jussi.kalliokoski@gmail.com>
Date: Tue, 28 Feb 2012 21:56:14 +0200
To: Chris Rogers <crogers@google.com>
Cc: public-audio@w3.org
Message-ID: <CAJhzemVS=s=5qe14SCQs-J+jvZBMvRLoLz96cny-9_OtnpZPXw@mail.gmail.com>
Hey Chris,

As I said at the beginning, don't be fooled by the tone; I just tried to
capture the gist of the conversation the team and I had last night. Some
of the points are overstated, but it was a brainstorm, and the slightly
aggressive tone reflects our passion for these things. That said, I'm sorry
that it sounds so negative and harsh; it's meant to be constructive.

On Tue, Feb 28, 2012 at 8:59 PM, Chris Rogers <crogers@google.com> wrote:

> Hi Jussi, thanks for your comments.
>
> On Tue, Feb 28, 2012 at 7:18 AM, Jussi Kalliokoski <
> jussi.kalliokoski@gmail.com> wrote:
>
>> Hey guys,
>>
>> So my team and I brainstormed a bit on this yesterday, and I'm sending a
>> summary of our feedback as promised.
>>
>> A little foreword, however. As negative as some points may seem, this is
>> exactly what I wanted to get, because we have already heard a lot of good
>> things about the APIs, so this is purely getting it all out there. So don't
>> be fooled by the tone, we're really excited about both of the current
>> proposals.
>>
>>  * One point that stood out is that while graph-based APIs are easily
>> approachable for people who are not familiar with DSP, if you're doing any
>> complex DSP and/or need to control the flow of your program, you'll end up
>> working around the limitations of the API and eventually implementing the
>> effects yourself. A few cases to demonstrate this point:
>>
>
> I think we're certainly in agreement that some people want to write
> specialized DSP code which isn't available as built-in nodes.  The
> JavaScriptAudioNode allows this type of custom code.  I disagree slightly
> with the wording and strength of your statement:
>
>  "if you're doing any complex DSP and/or need to control the flow of your
> program, you'll end up working around the limitations of the API and
> eventually implementing the effects yourself"
>
> I believe that the Web Audio API offers good potential for implementing
> complex DSP and that many applications will not need to (or want to) go
> down to hand-coded JavaScript DSP.  So I think your assertion is a bit
> overstated.  But, the most important point is the need for custom
> JavaScript processing which we both agree is a good tool.
>

Agreed. Again, sorry for the wording. I think that if you had to find a
constructive argument here, it would be the need for an audio processing
toolkit, which is suggested later.


>>    - The one presented in the mail earlier today, where you have a game
>> that has timed events scheduled, and then you go to the menu, and the menu
>> has its own sounds. This means you'll have to either create multiple
>> graphs (which the Chrome implementation of the Web Audio API currently
>> seems to restrict to a limited number) or handle the flow yourself (in a
>> buffer-based processing API you could control this kind of use case quite
>> simply).
>>
>
> I'm not quite sure what limitation you're highlighting here.  It's quite
> possible to have many sub-graphs doing completely independent processing
> all within a single AudioContext.
>

Maybe this part could be interpreted as a need for a way to pause a
(sub-)graph's processing and scheduling.
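As a rough illustration of the buffer-based alternative, here is a minimal Python sketch (not Web Audio API code) in which each part of the application owns a scheduler whose clock only advances while it is running, so the game's pending events freeze while the menu keeps playing. `PausableScheduler` and its methods are hypothetical names; times are counted in samples, and rendering is assumed to happen in fixed-size blocks.

```python
import heapq
from typing import Callable, List, Tuple


class PausableScheduler:
    """Fires callbacks at scheduled times on its own clock, counted in samples.

    The clock only advances while `running` is True, so pausing freezes every
    pending event (e.g. game sounds) while other schedulers keep going.
    """

    def __init__(self) -> None:
        self.now = 0          # local clock, in samples
        self.running = True
        self._queue: List[Tuple[int, int, Callable[[], None]]] = []
        self._seq = 0         # tie-breaker so heapq never compares callbacks

    def schedule(self, at_sample: int, callback: Callable[[], None]) -> None:
        heapq.heappush(self._queue, (at_sample, self._seq, callback))
        self._seq += 1

    def process(self, block_size: int) -> None:
        """Advance by one block and fire every event whose time falls inside it."""
        if not self.running:
            return
        end = self.now + block_size
        while self._queue and self._queue[0][0] < end:
            _, _, callback = heapq.heappop(self._queue)
            callback()
        self.now = end


fired: List[str] = []
game, menu = PausableScheduler(), PausableScheduler()
game.schedule(1000, lambda: fired.append("explosion"))
menu.schedule(300, lambda: fired.append("menu click"))

game.running = False              # the player opens the menu
for _ in range(4):                # render four 256-sample blocks
    game.process(256)
    menu.process(256)
# fired == ["menu click"], game.now == 0

game.running = True               # back to the game
for _ in range(4):
    game.process(256)
# fired == ["menu click", "explosion"]
```

Because pausing is just a flag checked once per block, the menu's scheduling is unaffected and the game resumes exactly where it stopped.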


>
>
>>    - Let's say we have a delay effect with a filter in its feedback loop:
>>      Input -----> Delay -----> Output
>>               ^ <-- Filter <--^
>>      Again, simple to achieve in a buffer-based API, but not in a
>> graph-based one.
>>
>
> Really?  There are examples of delays with effects in the feedback loop,
> like the "WaveTable synth demo".  It wasn't too hard to achieve.  I added
> controls for the dry/wet mix, the feedback, and the BPM-synchronized delay
> time in a very straightforward way.
>

If this is the case, I applaud you for a job well done, and apologize for not
investigating enough before putting this out there.
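For comparison, the Input -> Delay -> Output topology with a Filter in the feedback path can be written directly in a buffer-based style. A minimal Python sketch, assuming a one-pole low-pass as the filter (the diagram doesn't specify one) and a delay length given in samples; `feedback_delay` is a hypothetical name:

```python
from typing import List, Sequence


def feedback_delay(x: Sequence[float], delay_samples: int, feedback: float,
                   alpha: float, mix: float = 0.5) -> List[float]:
    """Delay line whose feedback path runs through a one-pole low-pass filter.

    delay_samples -- delay length in samples (>= 1)
    feedback      -- gain applied to the filtered signal fed back into the delay
    alpha         -- low-pass coefficient in (0, 1]; 1.0 means no filtering
    mix           -- dry/wet balance of the output (1.0 = wet only)
    """
    line = [0.0] * delay_samples   # circular buffer holding the delay line
    pos = 0
    lp_state = 0.0                 # memory of the one-pole low-pass filter
    out = []
    for sample in x:
        delayed = line[pos]                        # Delay output
        lp_state += alpha * (delayed - lp_state)   # Filter in the feedback path
        line[pos] = sample + feedback * lp_state   # Input + filtered feedback
        pos = (pos + 1) % delay_samples
        out.append((1.0 - mix) * sample + mix * delayed)
    return out
```

With an impulse as input, each echo is the previous one scaled by `feedback` and additionally smoothed by the filter, which is exactly the loop in the diagram.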


>
>
>>    - You need to get the data through a filter, then get the FFT data for
>> that. You'll have to go through a serious amount of boilerplate to get
>> where you want. In a buffer-based API it might have just looked
>> like this: fft(filter(data, parameters)), and you would get the result
>> synchronously; with the Web Audio API, for example, you have to get it
>> asynchronously.
>>
>
> I'm not quite sure what you mean.  I'm sure that for some specific custom
> effects it would be easier to do this directly in JavaScript.  But you can
> certainly have delays with convolution reverb in the feedback (which
> happens to use FFTs in the internal implementation).
>

The point here is that things like FFTs and filters are highly useful
outside audio use cases as well, and in those use cases it's very
unconventional to have to run the data through a processing graph and wait
for the asynchronous processing to finish. So the problem is that we're
implementing generally useful DSP functionality in the browser but forcing
a certain (graph-based) workflow onto it, which is highly uncomfortable
outside live audio.
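A minimal sketch of the synchronous `fft(filter(data, parameters))` style, in Python for illustration. The filter is assumed to be a one-pole low-pass, and to keep the example self-contained the transform is a naive O(N^2) DFT, which gives the same values as an FFT, only slower; `lowpass` and `dft_magnitudes` are hypothetical helpers, not part of either API.

```python
import cmath
import math
from typing import List, Sequence


def lowpass(data: Sequence[float], alpha: float) -> List[float]:
    """One-pole low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    out, y = [], 0.0
    for x in data:
        y += alpha * (x - y)
        out.append(y)
    return out


def dft_magnitudes(data: Sequence[float]) -> List[float]:
    """Magnitudes of bins 0..N/2 via a naive DFT (an FFT computes the same values)."""
    n = len(data)
    return [
        abs(sum(data[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


# Two sines (bins 2 and 20 of a 64-point transform), filtered, then analysed.
signal = [math.sin(2 * math.pi * 2 * t / 64) + math.sin(2 * math.pi * 20 * t / 64)
          for t in range(64)]

# Synchronous composition: the spectrum is available on the very next line,
# with no graph, no event callback and no waiting for a rendering quantum.
spectrum = dft_magnitudes(lowpass(signal, alpha=0.3))
```

The same two calls work just as well on data that never goes near an audio device, which is the use case described above.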



>>    - Time stretching is completely impossible to achieve in a graph-based
>> API without a memory overflow in the blink of an eye, because you're not in
>> control of the flow.
>>
>
> I'm pretty sure that it's not impossible.  After all there's a crude
> time-stretching demo here:
> http://chromium.googlecode.com/svn/trunk/samples/audio/granular.html
>
>
> The audio fidelity is not very high, but the example code can be tweaked
> for improvements there which are suitable for voice time-stretching, which
> is one of its biggest uses.
>

I'm thinking about more complex and better quality algorithms, such as
Paulstretch (http://hypermammut.sourceforge.net/paulstretch/), which uses
FFT and friends quite nicely to achieve time stretching with really
high quality and astronomical stretching factors. Impossible might not be
the right word, but you'll end up working around the graph, because you
can't control the flow. For example, you can't make the time stretcher a
single node in the graph, because you can't limit the incoming data, so
you'd keep buffering more and more data and end up with a memory overflow
quite quickly, even with quite mild factors.
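The growth can be put in numbers with a back-of-the-envelope sketch, assuming the graph pushes input at the context's sample rate and the stretcher, as a node, must emit output at that same rate, so it consumes only 1/stretch of what it receives. `backlog_samples` is a hypothetical helper:

```python
def backlog_samples(seconds: float, sample_rate: float, stretch: float) -> float:
    """Samples that accumulate in front of a push-fed time-stretch node.

    The graph delivers `sample_rate` input samples per second, but a node that
    emits `sample_rate` output samples per second while stretching by `stretch`
    only consumes `sample_rate / stretch` of them; the rest must be buffered.
    """
    pushed = seconds * sample_rate
    consumed = seconds * sample_rate / stretch
    return pushed - consumed


BYTES_PER_SAMPLE = 4  # assuming 32-bit float samples, one channel

backlog = backlog_samples(60, 44100, 8.0)
print(backlog, backlog * BYTES_PER_SAMPLE)  # 2315250.0 9261000.0
```

At 44.1 kHz with a factor of 8, one minute leaves 2,315,250 unconsumed samples, about 9.3 MB per channel, and the figure keeps growing linearly for as long as the node runs. A pull-based design, where the stretcher requests input only as fast as it consumes it, keeps the buffer bounded instead.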


>
>
>>    Anyway, the common opinion seemed to be that graph-based API should be
>> a higher level abstraction, not the basis of all functionality.
>>
>
> It *is* a higher-level abstraction in the Web Audio API with one of the
> nodes being available for direct JavaScript processing.
>

No, that's really like saying you can have a sea in the fish. Lower-level
access is achieved through the higher-level abstraction, whereas the
point here is that the higher-level abstraction should be built on top of
a lower-level API.
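To make the layering concrete, a minimal Python sketch, assuming a toy pull-based graph: the primitive (`gain`) is an ordinary synchronous function, and the graph node is a thin wrapper around such functions rather than the only way to reach them. `gain` and `Node` are hypothetical names, not taken from either proposal.

```python
from typing import Callable, List, Sequence

Block = List[float]


def gain(block: Sequence[float], amount: float) -> Block:
    """Low-level primitive: a plain synchronous function, usable outside any graph."""
    return [amount * x for x in block]


class Node:
    """Toy pull-based graph node that merely wraps a block-processing function."""

    def __init__(self, fn: Callable[[Block], Block]) -> None:
        self.fn = fn
        self.inputs: List["Node"] = []

    def connect(self, target: "Node") -> "Node":
        """Route this node's output into `target`; returns `target` for chaining."""
        target.inputs.append(self)
        return target

    def render(self, source: Sequence[float]) -> Block:
        """Pull one block: a node without inputs reads `source`, others mix their inputs."""
        if not self.inputs:
            mixed = list(source)
        else:
            rendered = [node.render(source) for node in self.inputs]
            mixed = [sum(samples) for samples in zip(*rendered)]
        return self.fn(mixed)


# The same primitive, called directly and through the graph built on top of it.
direct = gain([1.0, 2.0], 0.5)
out = Node(lambda b: b).connect(Node(lambda b: gain(b, 0.5)))
via_graph = out.render([1.0, 2.0])
# direct == via_graph == [0.5, 1.0]
```

In this arrangement the graph is a convenience layer: anyone who needs a different flow can call the primitives directly.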


>
>
>>  * Another thing that sort of relates to the previous point is that it
>> would be highly useful to have native functions for high-volume operations
>> that are expensive in JS. One example would be common functionality in
>> decoders, such as clz (count leading zeroes). Exposing native decoders
>> would also be useful, but this is already done in both APIs to some
>> extent (reading data from <audio> and <video> is possible). Another
>> relation to the previous point is that instead of graph-based effects, you
>> could control the flow yourself if we offered a sort of standard library
>> for the most common expensive DSP functionality. This library could also
>> include native encoders.
>>
>>
> I think this notion of a library of common functions is exactly what the
> built-in nodes of the Web Audio API represent.
>

But this enforces the graph workflow, which has its limitations, as I said.
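For reference, clz counts the zero bits above the highest set bit of a fixed-width word. A minimal Python sketch contrasting the bit-by-bit loop a script-level decoder might fall back to with a single built-in call (`int.bit_length` here stands in for a native primitive); both function names are hypothetical:

```python
def clz32_loop(x: int) -> int:
    """Count leading zeros of a 32-bit unsigned value, one bit at a time."""
    x &= 0xFFFFFFFF
    if x == 0:
        return 32
    n = 0
    mask = 0x80000000          # start at the most significant bit
    while not (x & mask):
        n += 1
        mask >>= 1
    return n


def clz32_native(x: int) -> int:
    """Same result from one built-in call instead of a loop."""
    return 32 - (x & 0xFFFFFFFF).bit_length()


# clz32_native(1) == 31, clz32_native(0x0000FFFF) == 16
```

Decoders call this kind of operation for almost every code word, so the per-call cost of the loop version adds up quickly.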


>> Web Audio API specific
>>  * The number of simultaneous AudioContexts seems to be limited.
>>
>
> There will always be a hard limit, but it can be fairly high.  Nearly all
> use cases will only require a single context, however.
>

All right, but the current limit seems to be soft: on some setups it's as
low as two contexts, and after that creating a context throws an exception.
The limit also seems to apply across all pages: with a two-context limit,
for example, you could only have an AudioContext on two pages, and the third
would say "Sorry, your browser doesn't support the Web Audio API".


>>  * It's odd that common processing paradigms are handled natively, yet
>> Sample Rate Conversion, which is a relatively expensive operation, is not.
>
>
> I might not disagree with that point.
>
>
>
>> The spec says that the third argument to the AudioContext constructor is
>> the sample rate, but the current implementation doesn't obey the sample
>> rate you choose there.
>
>
> I'm not clear where you're seeing this in the specification document.
>

Oh, sorry, we must have picked it up from the source code. IIRC, I've also
seen it in some tutorials, using a syntax like new
webkitAudioContext(something, bufferSize, sampleRate).


>
>
>> However, given the setup cost of an AudioContext and the limited number
>> of them, it would be far more efficient if you could specify the sample
>> rate for individual JavaScriptAudioNodes, since what we're often handling
>> is sources with varying sample rates and channel counts. It should also be
>> possible to change the sample rate on the fly.
>>
>
> It's complex both conceptually for the developer and for the
> implementation to manage many nodes all which are running at different
> sample-rates.
>

We know this. It would be easier if the API did it for us; this is a
real-world use case. And it doesn't have to be a required parameter: if the
developer doesn't set the value, the implementation can just use the
context's default sample rate, so it doesn't bother anyone who doesn't
need it.
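A per-node conversion could start as something like the following linear-interpolation resampler: a minimal Python sketch, not a proposal for either API. `resample_linear` is a hypothetical name, and linear interpolation does no anti-aliasing, so production SRC would use a band-limited (e.g. windowed-sinc) filter instead.

```python
from typing import List, Sequence


def resample_linear(x: Sequence[float], src_rate: float, dst_rate: float) -> List[float]:
    """Convert x from src_rate to dst_rate by linear interpolation (no anti-aliasing)."""
    if not x:
        return []
    step = src_rate / dst_rate               # input samples advanced per output sample
    n_out = int(len(x) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * step
        j = int(pos)                         # index of the input sample at or before pos
        frac = pos - j                       # fractional distance to the next sample
        nxt = x[j + 1] if j + 1 < len(x) else x[j]   # hold the last sample at the edge
        out.append(x[j] + frac * (nxt - x[j]))
    return out


# Upsampling 22050 Hz -> 44100 Hz inserts midpoints between the original samples:
# resample_linear([0.0, 1.0, 2.0, 3.0], 22050, 44100)
#   == [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.0]
```

Keeping the conversion per node means each source can be brought to the context's rate independently, without creating one AudioContext per sample rate.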


>
>
>>  * In the current implementation, there's no way to kill an AudioContext.
>>
>
> It should be simple enough to add a "stopAllSound()" or "teardownGraph()"
> method if developers find it useful.  It hasn't been seen to be a
> limitation by anybody so far.
>

Well, it's a limitation given the limit on the number of AudioContexts, so I
think a method like this would be highly useful. The AudioContext should
probably adhere to other specs' rules about DOM garbage collection as well:
if there are no references to the AudioContext, it should be collected and
destroyed, in which case I'm not sure a method is actually needed. As far as
the specs are concerned, this is probably an implementation bug that violates
the general garbage collection rules, since IIRC the Web Audio API doesn't
specify any garbage collection rules of its own for AudioContexts.


>
>
>>
>> MediaStreams Processing Specific
>>  * No main thread processing. This may be a good thing, since it's good
>> practice, but forcing good practices is usually a bad idea.
>>
>> Not necessarily in the scope of the Audio WG, but I'll still list them
>> here:
>>  * The ability to probe what sort of audio device we're outputting to,
>> and changes to it (for example, whether these are internal speakers, earbuds,
>> stage monitors or a basic 5.1 home theatre setup, and when you actually
>> plug in the earbuds).
>>  * The same for input devices. These would allow you to automatically
>> adjust mixing, equalization and compression for different setups.
>>
>
> Yes, I agree we'll need some way to query the capabilities and choose the
> available devices.  This will be exciting!
>

Indeed!


>
>
>>
>> There might have been some other points as well, but I can't remember
>> right now. Hope this was helpful!
>>
>> Cheers,
>> Jussi Kalliokoski
>
>
> Jussi, thanks for your time in writing this up...
>

No problem, I'm glad to! I hope you aren't taking the points I'm making
here personally; we really appreciate all the hard work you've done. :)

Jussi
Received on Tuesday, 28 February 2012 19:56:43 UTC