- From: Mark Boas <markb@happyworm.com>
- Date: Wed, 29 Feb 2012 22:30:28 +0300
- To: Chris Rogers <crogers@google.com>
- Cc: Jussi Kalliokoski <jussi.kalliokoski@gmail.com>, public-audio@w3.org
- Message-ID: <CAMnc=uAzRfOHkxvF9_=D9+Hcd0VrxNDYCfjhMU8wyaenjFQ0=w@mail.gmail.com>
Chris,

I appreciate all the work you are putting in and I also very much appreciate Jussi's input. In fact I am really enjoying the discussion. It seems very civilised and well intentioned.

All I really wanted to say is that I think it's very important to give developers 'enough rope to hang themselves', because that is exactly the amount of rope you need to make wonderful things. I hear what you're saying about confusing basic users, but I feel there must be a way to cater to both basic and advanced users, and this is one of our missions here.

Regards,
Mark

PS I think most of us on this group are passionate about audio, otherwise we wouldn't be here, and it's great to see things moving. :)

On Wed, Feb 29, 2012 at 9:55 PM, Chris Rogers <crogers@google.com> wrote:

> On Tue, Feb 28, 2012 at 11:56 AM, Jussi Kalliokoski <jussi.kalliokoski@gmail.com> wrote:
>
>> Hey Chris,
>>
>> As I said in the beginning, don't be fooled by the tone, I just tried to capture the gist of the conversation we had last night with the team. Some of the points are overstated, but it was a brainstorm, and the slightly aggressive tone reflects our passion for these things. That said, I'm sorry that it sounds so negative and harsh, it's meant to be constructive.
>
> Hey Jussi, no worries :) I didn't take your comments badly. I know that you're very passionate about audio just like I am, and so we should have free discussions here.
>
>> On Tue, Feb 28, 2012 at 8:59 PM, Chris Rogers <crogers@google.com> wrote:
>>
>>> Hi Jussi, thanks for your comments.
>>>
>>> On Tue, Feb 28, 2012 at 7:18 AM, Jussi Kalliokoski <jussi.kalliokoski@gmail.com> wrote:
>>>
>>>> Hey guys,
>>>>
>>>> So we brainstormed a bit on this with my team yesterday, so I'm sending a summary of feedback as promised.
>>>>
>>>> A little foreword, however. As negative as some points may seem, this is exactly what I wanted to get, because we have already heard a lot of good things about the APIs, so this is purely getting it all out there. So don't be fooled by the tone, we're really excited about both of the current proposals.
>>>>
>>>> * One point that stood out is that while graph-based APIs are easily approachable to people who are not familiar with DSP, if you're doing any complex DSP and/or need to control the flow of your program, you'll end up working around the limitations of the API and eventually implementing the effects yourself. A few cases to demonstrate this point:
>>>
>>> I think we're certainly in agreement that some people want to write specialized DSP code which isn't available as built-in nodes. The JavaScriptAudioNode allows this type of custom code. I disagree slightly with the wording and strength of your statement:
>>>
>>> "if you're doing any complex DSP and/or need to control the flow of your program, you'll end up working around the limitations of the API and eventually implementing the effects yourself"
>>>
>>> I believe that the Web Audio API offers good potential for implementing complex DSP and that many applications will not need to (or want to) go down to hand-coded JavaScript DSP. So I think your assertion is a bit overstated. But the most important point is the need for custom JavaScript processing, which we both agree is a good tool.
>>
>> Agreed. Again, sorry for the wording. I think that if you had to find a constructive argument here, it would be the need for the audio processing toolkit, which was suggested later.
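A minimal sketch of the kind of custom JavaScript processing both sides are referring to, using the prefixed API as it stood in early 2012 (createJavaScriptNode and JavaScriptAudioNode were later renamed createScriptProcessor/ScriptProcessorNode); 'source' stands in for any existing AudioNode and the 0.5 gain is arbitrary:

    var context = new webkitAudioContext();
    var processor = context.createJavaScriptNode(2048, 1, 1); // bufferSize, inputs, outputs
    processor.onaudioprocess = function (event) {
      var input  = event.inputBuffer.getChannelData(0);
      var output = event.outputBuffer.getChannelData(0);
      for (var i = 0; i < input.length; i++) {
        output[i] = input[i] * 0.5; // custom DSP goes here
      }
    };
    source.connect(processor);              // 'source' is assumed to exist
    processor.connect(context.destination);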
>>>> - The one presented in the mail earlier today, where you have a game that has timed events scheduled, and then you go to the menu, and the menu has its own sounds. This means you'll have to either create multiple graphs (which seems to be currently restricted in the Chrome implementation of the Web Audio API to a limited number) or handle the flow yourself (in a buffer-based processing API you could control this kind of use case quite simply).
>>>
>>> I'm not quite sure what limitation you're highlighting here. It's quite possible to have many sub-graphs doing completely independent processing all within a single AudioContext.
>>
>> Maybe this part could be interpreted as a need for a way to pause a (sub-)graph's processing and scheduling.
>
> I think because the JS is completely in control of the scheduling (with noteOn()/noteOff() and AudioParam scheduling), as well as direct tweaking of parameter values controlling volume, it's pretty straight-forward to do this already.
>
>>>> - Let's say we have a delay effect with a filter in its feedback loop:
>>>>
>>>>     Input -----> Delay -----> Output
>>>>                    ^ <-- Filter <--^
>>>>
>>>> Again, simple to achieve in a buffer-based API, but not in a graph-based one.
>>>
>>> Really? There are examples of delays with effects in the feedback loop, like the "WaveTable synth demo". It wasn't too hard to achieve. I added controls for the dry/wet mix, the feedback, and the BPM-synchronized delay time in a very straight-forward way.
>>
>> If this is the case, I applaud you for a job well done, and apologize for not doing enough inspection before putting this out there.
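For reference, the delay-with-filtered-feedback topology in the diagram above can be wired from the built-in nodes roughly as follows (factory names as in the prefixed 2012 implementation, before createGainNode/createDelayNode were shortened; parameter values are arbitrary):

    var context  = new webkitAudioContext();
    var input    = context.createGainNode();   // connect the source into this node
    var delay    = context.createDelayNode();
    var filter   = context.createBiquadFilter();
    var feedback = context.createGainNode();

    delay.delayTime.value  = 0.25;  // seconds
    filter.frequency.value = 2000;  // darken the repeats (low-pass is the default type)
    feedback.gain.value    = 0.5;   // keep below 1.0 so the loop decays

    input.connect(context.destination);  // dry path
    input.connect(delay);
    delay.connect(context.destination);  // wet path
    delay.connect(filter);               // feedback loop: Delay -> Filter -> Gain -> Delay
    filter.connect(feedback);
    feedback.connect(delay);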
>>>> - You need to get the data through a filter, then get the FFT data for that. You'll have to go through a serious amount of boilerplate to get where you want, whereas in a buffer-based API it might have just looked like this: fft(filter(data, parameters)), and you would get it synchronously, whereas with the Web Audio API, for example, you have to do it asynchronously.
>>>
>>> I'm not quite sure what you mean. I'm sure that for some specific custom effects it would be easier to do this directly in JavaScript. But you can certainly have delays with convolution reverb in the feedback (which happens to use FFTs in the internal implementation).
>>
>> The point here is that things like FFT and filters are highly useful outside audio use cases as well, and in those use cases it's very unconventional to have to run everything through a processing graph and wait for the asynchronous processing to finish. So the problem is that we're implementing generally useful DSP functionality in the browser, but forcing a certain (graph-based) workflow onto it, which is highly uncomfortable outside live audio.
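For comparison, the graph-based route to "the FFT of a filtered signal" looks roughly like the sketch below with the API as implemented at the time (a BiquadFilterNode feeding a RealtimeAnalyserNode); the spectrum has to be polled rather than returned synchronously, which is the friction being described. 'source' is assumed to be an existing AudioNode:

    var context  = new webkitAudioContext();
    var filter   = context.createBiquadFilter();
    var analyser = context.createAnalyser();
    analyser.fftSize = 2048;

    source.connect(filter);                      // 'source' is assumed to exist
    filter.connect(analyser);
    analyser.connect(context.destination);

    var spectrum = new Float32Array(analyser.frequencyBinCount);
    function poll() {
      analyser.getFloatFrequencyData(spectrum);  // magnitudes in dB for the most recent block
      // ... use 'spectrum' here ...
      setTimeout(poll, 100);                     // poll periodically; there is no synchronous result
    }
    poll();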
>>>> - Time stretching is completely impossible to achieve in a graph-based API without a memory overflow in the blink of an eye, because you're not in control of the flow.
>>>
>>> I'm pretty sure that it's not impossible. After all, there's a crude time-stretching demo here:
>>> http://chromium.googlecode.com/svn/trunk/samples/audio/granular.html
>>>
>>> The audio fidelity is not very high, but the example code can be tweaked for improvements there which are suitable for voice time-stretching, which is one of its biggest uses.
>>
>> I'm thinking about more complex and better quality algorithms, such as Paulstretch ( http://hypermammut.sourceforge.net/paulstretch/ ), which uses FFT and friends quite nicely to achieve the time stretching with really high quality and astronomical stretching factors. Impossible might not be the right word, but you'll end up working around the graph, because you can't control the flow. For example, you can't make the time stretcher a single node in the graph, because you can't limit the incoming data, so you'd keep buffering more and more data and end up with a memory overflow quite quickly even with quite mild factors.
>
> Hyper-stretching like Paulstretch is really cool. I've played a lot with this kind of stuff with SVP at IRCAM and spent a lot of time at Apple developing such algorithms. If you have a Mac handy you can see it listed with 'auval':
>
> % auval -a
> aufc tmpt appl  -  Apple: AUTimePitch
>
> For these types of phase-vocoder algorithms, you're best off working directly in JavaScript.
>
>>>> Anyway, the common opinion seemed to be that a graph-based API should be a higher level abstraction, not the basis of all functionality.
>>>
>>> It *is* a higher-level abstraction in the Web Audio API, with one of the nodes being available for direct JavaScript processing.
>>
>> No, that's really like saying you can have a sea in the fish. The lower level access is achieved through the higher level abstraction, whereas the point here is that the higher level abstraction should be on top of the lower level API.
>
> You can have it both ways with the Web Audio API:
>
> 1. Implement JS processing as one node mixed with several others in a Web Audio graph: for example, by using a JavaScriptAudioNode as a single node mixed with other native processing, such as custom synth code in JS mixed with high-quality reverberation using a ConvolverNode, and delays with a DelayNode, etc.
> 2. Don't use any of the graph features or nodes of the Web Audio API and just create a single JavaScriptAudioNode, where you implement your own graph API directly in JS.
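A rough sketch of option 1, mixing a hand-written JS node with native nodes; the sine loop is a trivial stand-in for real synth code, and 'impulseResponseBuffer' is assumed to be an AudioBuffer decoded elsewhere:

    var context   = new webkitAudioContext();
    var synth     = context.createJavaScriptNode(1024, 1, 1); // input ignored; used as a generator
    var convolver = context.createConvolver();                // native convolution reverb
    var delay     = context.createDelayNode();                // native delay

    convolver.buffer = impulseResponseBuffer;  // assumed to be loaded and decoded already
    delay.delayTime.value = 0.3;

    var phase = 0;
    synth.onaudioprocess = function (event) {
      var out = event.outputBuffer.getChannelData(0);
      for (var i = 0; i < out.length; i++) {
        out[i] = Math.sin(phase) * 0.2;        // custom synth code goes here
        phase += 2 * Math.PI * 440 / context.sampleRate;
        if (phase > 2 * Math.PI) phase -= 2 * Math.PI;
      }
    };

    synth.connect(convolver);
    convolver.connect(delay);
    delay.connect(context.destination);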
>>>> * Another thing that sort of relates to the previous point is that it would be highly useful to have native implementations of high-volume functions that are expensive in JS. One example would be common functionality in decoders, such as clz (count leading zeroes). Also exposing native decoders would be useful, but this is already done in both APIs to some extent (reading data from <audio> and <video> is possible). Another relation to the previous point is that instead of graph-based effects, you could control the flow yourself if we offered a sort of standard library for the most common expensive DSP functionality. This library could also include native encoders.
>>>
>>> I think this notion of a library of common functions is exactly what the built-in nodes of the Web Audio API represent.
>>
>> But this enforces the graph workflow, which has its limitations, as said.
>
> I think the limitations are rather modest and that there's a huge set of compelling applications that can be built this way. For some very specialized processing, direct processing in JS can be used.
>
>>>> Web Audio API specific
>>>> * The number of simultaneous AudioContexts seems to be limited.
>>>
>>> There will always be a hard limit, but it can be fairly high. Nearly all use cases will only require a single context, however.
>>
>> All right, but the current limit seems to be soft: on some setups it's as low as 2 contexts, and after that creating a context throws an exception. This also seems to be page-universal, e.g. with a two-context limit you could only have an AudioContext on two pages, and the third would say "Sorry, your browser doesn't support the Web Audio API".
>>
>>>> * It's odd that common processing paradigms are handled natively, yet Sample Rate Conversion, which is a relatively expensive operation, is not.
>>>
>>> I might not disagree with that point.
>>>
>>>> The spec says that the third argument for the audio context is sample rate, but the current implementation doesn't obey the sample rate you choose there.
>>>
>>> I'm not clear where you're seeing this in the specification document.
>>
>> Oh, sorry, we must have picked it up from the source code. IIRC, I've also seen it in some tutorials, using a syntax like new webkitAudioContext(something, bufferSize, sampleRate).
>>
>>>> However, given the setup cost for an AudioContext and the limited number of them, it would be far more efficient if you could specify the sample rate for individual JavaScriptProcessingNodes, since we're often handling sources with varying sample rates and channel counts. It should also be possible to change the sample rate on the fly.
>>>
>>> It's complex, both conceptually for the developer and for the implementation, to manage many nodes which are all running at different sample rates.
>>
>> We know this. It would be easier if the API did it for us. This is a real world use case. And it doesn't have to be a required parameter: if the developer doesn't tamper with the value, the implementation can just go at the default sample rate of the context, hence it doesn't really bother anyone who doesn't need it.
>
> One thing we could consider later on is basically a "vari-speed" rate-changing node which could accomplish what you want. On a Mac if you type:
>
> % auval -a
> aufc vari appl  -  Apple: AUVarispeed
>
> The AUVarispeed is the rate-changing AudioUnit. So we could consider such a thing. But, if so, I would hope it to be a version 2 feature, since it can be confusing for basic users, because connecting nodes from different parts of the graph can be impossible (given the different data rates). But I guess there's the old expression about "giving someone enough rope to hang themselves".
>
>>>> * In the current implementation, there's no way to kill an AudioContext.
>>>
>>> It should be simple enough to add a "stopAllSound()" or "teardownGraph()" method if developers find it useful. It hasn't been seen to be a limitation by anybody so far.
>>
>> Well, it's a limitation given the limit on the number of AudioContexts, so I think a method like this would be highly useful. The AudioContext should probably adhere to other specs about DOM garbage collection as well, so if there are no references to the AudioContext, it should be collected and destroyed; in that case I'm not sure a method is actually needed. As far as specs are concerned, this is probably an implementation bug violating the general garbage collection rules, as IIRC the Web Audio API doesn't explicitly specify any garbage collection rules for AudioContexts.
>>
>>>> MediaStreams Processing Specific
>>>> * No main thread processing. May be a good thing however, because it's a good practice, but forcing good practices is usually a bad idea.
>>>>
>>>> Not necessarily in the scope of the Audio WG, but I'll still list them here:
>>>> * The ability to probe what sort of an audio device we're outputting to, and changes thereof (for example, are these internal speakers, earbuds, stage monitors or a basic 5.1 home theatre setup, and when you actually plug in the earbuds).
>>>> * The same for input devices. These would allow you to automatically adjust mixing, equalization and compression options for different setups.
>>>
>>> Yes, I agree we'll need some way to query the capabilities and choose the available devices. This will be exciting!
>>
>> Indeed!
>>
>>>> There might have been some other points as well, but I can't remember right now. Hope this was helpful!
>>>>
>>>> Cheers,
>>>> Jussi Kalliokoski
>>>
>>> Jussi, thanks for your time in writing this up...
>>
>> No problem, I'm glad to! I hope you aren't taking the points I'm making here personally, we really appreciate all the hard work you've done. :)
>>
>> Jussi
Received on Wednesday, 29 February 2012 19:30:57 UTC