- From: Chris Rogers <crogers@google.com>
- Date: Wed, 29 Feb 2012 10:55:32 -0800
- To: Jussi Kalliokoski <jussi.kalliokoski@gmail.com>
- Cc: public-audio@w3.org
- Message-ID: <CA+EzO0=a6BRgZMM704yjA1mT1x_fGZxm+CnHk8wqPRE-TP9TaQ@mail.gmail.com>
On Tue, Feb 28, 2012 at 11:56 AM, Jussi Kalliokoski <jussi.kalliokoski@gmail.com> wrote:

> Hey Chris,
>
> As I said in the beginning, don't be fooled by the tone, I just tried to capture the gist of the conversation we had last night with the team. Some of the points are overstated, but it was a brainstorm, and the slightly aggressive tone reflects our passion for these things. That said, I'm sorry that it sounds so negative and harsh; it's meant to be constructive.

Hey Jussi, no worries :) I didn't take your comments badly. I know that you're very passionate about audio just like I am, and so we should have free discussions here.

> On Tue, Feb 28, 2012 at 8:59 PM, Chris Rogers <crogers@google.com> wrote:
>
>> Hi Jussi, thanks for your comments.
>>
>> On Tue, Feb 28, 2012 at 7:18 AM, Jussi Kalliokoski <jussi.kalliokoski@gmail.com> wrote:
>>
>>> Hey guys,
>>>
>>> So we brainstormed a bit on this with my team yesterday, so I'm sending a summary of feedback as promised.
>>>
>>> A little foreword, however. As negative as some points may seem, this is exactly what I wanted to get, because we've already heard a lot of good things about the APIs, so this is purely getting it all out there. So don't be fooled by the tone; we're really excited about both of the current proposals.
>>>
>>> * One point that stood out is that while graph-based APIs are easily approachable to people who are not familiar with DSP, if you're doing any complex DSP and/or need to control the flow of your program, you'll end up working around the limitations of the API and eventually implementing the effects yourself. A few cases to demonstrate this point:
>>
>> I think we're certainly in agreement that some people want to write specialized DSP code which isn't available as built-in nodes. The JavaScriptAudioNode allows this type of custom code. I disagree slightly with the wording and strength of your statement:
>>
>> "if you're doing any complex DSP and/or need to control the flow of your program, you'll end up working around the limitations of the API and eventually implementing the effects yourself"
>>
>> I believe that the Web Audio API offers good potential for implementing complex DSP and that many applications will not need to (or want to) go down to hand-coded JavaScript DSP. So I think your assertion is a bit overstated. But the most important point is the need for custom JavaScript processing, which we both agree is a good tool.
>
> Agreed. Again, sorry for the wording. I think that if you had to find a constructive argument here, it would be the need for the audio processing toolkit, which was suggested later.
>
>>> - The one presented in the mail earlier today, where you have a game that has timed events scheduled, and then you go to the menu, and the menu has its own sounds. This means you'll have to either create multiple graphs (which seems to be currently restricted in the Chrome implementation of the Web Audio API to a limited number) or handle the flow yourself (in a buffer-based processing API you could control this kind of use case quite simply).
>>
>> I'm not quite sure what limitation you're highlighting here. It's quite possible to have many sub-graphs doing completely independent processing all within a single AudioContext.
>
> Maybe this part could be interpreted as a need for a way to pause a (sub-)graph's processing and scheduling.

I think that because the JS is completely in control of the scheduling (with noteOn()/noteOff() and AudioParam scheduling), as well as of direct tweaking of the parameter values controlling volume, it's pretty straightforward to do this already.
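To make that concrete, here is a minimal sketch of one way to handle the game/menu case within a single AudioContext, assuming the webkit-prefixed names of the time (webkitAudioContext, createGainNode(), noteOn()). The bus names and the buffer argument are hypothetical placeholders, and "pausing" here just means muting a sub-graph's master gain and not scheduling new events on it:

    // One AudioContext for the whole page; each "scene" routes through its
    // own GainNode, which acts as a master fader for that sub-graph.
    var context = new webkitAudioContext();

    var gameBus = context.createGainNode();   // all in-game sounds go here
    var menuBus = context.createGainNode();   // all menu sounds go here
    gameBus.connect(context.destination);
    menuBus.connect(context.destination);

    // `buffer` is assumed to be an already-decoded AudioBuffer.
    function playSound(buffer, bus, time) {
      var source = context.createBufferSource();
      source.buffer = buffer;
      source.connect(bus);
      source.noteOn(time);   // scheduled against context.currentTime
    }

    // Entering the menu: silence the game sub-graph and stop scheduling new
    // game events; the menu sub-graph keeps playing in the same context.
    function enterMenu() {
      gameBus.gain.value = 0;
      menuBus.gain.value = 1;
    }

    function leaveMenu() {
      menuBus.gain.value = 0;
      gameBus.gain.value = 1;
    }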
>>> - Let's say we have a delay effect with a filter in its feedback loop:
>>>
>>>     Input -----> Delay -----> Output
>>>                    ^ <-- Filter <--^
>>>
>>> Again, simple to achieve in a buffer-based API, but not in a graph-based one.
>>
>> Really? There are examples of delays with effects in the feedback loop, like the "WaveTable synth demo". It wasn't too hard to achieve. I added controls for the dry/wet mix, the feedback, and the BPM-synchronized delay time in a very straightforward way.
>
> If this is the case, I applaud you for a job well done, and apologize for not doing enough inspection before putting this out there.
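For reference, a filtered feedback loop like the one in Jussi's diagram can be wired up directly from the built-in nodes. The following is a minimal sketch using the webkit-prefixed names of the time, with arbitrary placeholder values for the delay time, cutoff, and feedback amount; it is not the code from the WaveTable synth demo:

    var context = new webkitAudioContext();

    // `input` could be any source; a buffer source is used here as a stand-in.
    var input = context.createBufferSource();

    var delay = context.createDelayNode();
    delay.delayTime.value = 0.3;          // 300 ms delay line (placeholder)

    var filter = context.createBiquadFilter();   // defaults to a lowpass
    filter.frequency.value = 2000;               // darken each repeat (placeholder)

    var feedback = context.createGainNode();
    feedback.gain.value = 0.5;            // keep the loop gain below 1.0 so it decays

    // Dry path: input straight to the output.
    input.connect(context.destination);

    // Wet path: input -> delay -> output ...
    input.connect(delay);
    delay.connect(context.destination);

    // ... with the feedback loop routed through the filter:
    // delay -> filter -> feedback gain -> back into the delay.
    delay.connect(filter);
    filter.connect(feedback);
    feedback.connect(delay);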
>>> - You need to get the data through a filter, then get the FFT data for that. You'll have to go through a serious amount of boilerplate to get where you want, whereas in a buffer-based API it might have just looked like this: fft(filter(data, parameters)), and you would get it synchronously, whereas with the Web Audio API, for example, you have to do it asynchronously.
>>
>> I'm not quite sure what you mean. I'm sure that for some specific custom effects it would be easier to do this directly in JavaScript. But you can certainly have delays with convolution reverb in the feedback (which happens to use FFTs in the internal implementation).
>
> The point here is that things like FFTs and filters are highly useful outside audio use cases as well, and in those use cases it's very unconventional to have to run the data through a processing graph and wait for the asynchronous processing to finish. So the problem is that we're implementing generally useful DSP functionality in the browser, but forcing a certain (graph-based) workflow onto it, which is highly uncomfortable outside live audio.
>
>>> - Time stretching is completely impossible to achieve in a graph-based API without a memory overflow in the blink of an eye, because you're not in control of the flow.
>>
>> I'm pretty sure that it's not impossible. After all, there's a crude time-stretching demo here:
>> http://chromium.googlecode.com/svn/trunk/samples/audio/granular.html
>>
>> The audio fidelity is not very high, but the example code can be tweaked for improvements there which are suitable for voice time-stretching, which is one of its biggest uses.
>
> I'm thinking about more complex and better-quality algorithms, such as Paulstretch ( http://hypermammut.sourceforge.net/paulstretch/ ), which uses FFT and friends quite nicely to achieve time stretching with really high quality and astronomical stretching factors. Impossible might not be the right word, but you'll end up working around the graph, because you can't control the flow. For example, you can't make the time stretcher a single node in the graph, because you can't limit the incoming data, so you'd keep buffering more and more data and end up with a memory overflow quite quickly, even with quite mild factors.

Hyper-stretching like Paulstretch is really cool. I've played a lot with this kind of stuff with SVP at IRCAM and spent a lot of time at Apple developing such algorithms. If you have a Mac handy you can see it listed with 'auval':

    % auval -a
        aufc tmpt appl  -  Apple: AUTimePitch

For these types of phase-vocoder algorithms, you're best off working directly in JavaScript.

>>> Anyway, the common opinion seemed to be that a graph-based API should be a higher-level abstraction, not the basis of all functionality.
>>
>> It *is* a higher-level abstraction in the Web Audio API, with one of the nodes being available for direct JavaScript processing.
>
> No, that's really like saying you can have the sea in the fish. The lower-level access is achieved through the higher-level abstraction, when the point here is that the higher-level abstraction should be on top of the lower-level API.

You can have it both ways with the Web Audio API:

1. Implement JS processing as one node mixed with several others in a Web Audio graph: for example, by using a JavaScriptAudioNode as a single node mixed with other native processing, such as custom synth code in JS combined with high-quality reverberation using a ConvolverNode, delays with a DelayNode, etc.

2. Don't use any of the graph features or nodes of the Web Audio API, and just create a single JavaScriptAudioNode, where you implement your own graph API directly in JS.
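As an illustration of option 1, here is a minimal sketch of a JavaScriptAudioNode feeding native nodes, assuming the webkit-prefixed names of the time (createJavaScriptNode(), createConvolver(), createDelayNode()). The sine-wave callback is a hypothetical stand-in for real custom synth code, and loading the convolver's impulse response is left out:

    var context = new webkitAudioContext();

    // Custom synth code runs in JS inside a JavaScriptAudioNode
    // (arguments: buffer size, input channels, output channels).
    var synth = context.createJavaScriptNode(2048, 1, 1);
    var phase = 0;

    synth.onaudioprocess = function (event) {
      var output = event.outputBuffer.getChannelData(0);
      for (var i = 0; i < output.length; i++) {
        // Hypothetical custom DSP: a plain sine oscillator as a stand-in.
        output[i] = Math.sin(phase);
        phase += 2 * Math.PI * 440 / context.sampleRate;
      }
    };

    // Native processing around the JS node: convolution reverb and a delay.
    var reverb = context.createConvolver();
    // reverb.buffer would be set to a decoded impulse-response AudioBuffer here.

    var delay = context.createDelayNode();
    delay.delayTime.value = 0.25;

    // JS node -> native reverb -> native delay -> output.
    synth.connect(reverb);
    reverb.connect(delay);
    delay.connect(context.destination);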
>>> * Another thing that sort of relates to the previous point is that it would be highly useful to have native implementations of high-volume functions that are expensive in JS. One example would be common functionality in decoders, such as clz (count leading zeroes). Also exposing native decoders would be useful, but this is already done in both APIs to some extent (reading data from <audio> and <video> is possible). Another relation to the previous point is that instead of graph-based effects, you could control the flow yourself if we offered a sort of standard library for the most common expensive DSP functionality. This library could also include native encoders.
>>
>> I think this notion of a library of common functions is exactly what the built-in nodes of the Web Audio API represent.
>
> But this enforces the graph workflow, which has its limitations, as said.

I think the limitations are rather modest and that there's a huge set of compelling applications that can be built this way. For some very specialized processing, direct processing in JS can be used.

>>> Web Audio API specific
>>> * The number of simultaneous AudioContexts seems to be limited.
>>
>> There will always be a hard limit, but it can be fairly high. Nearly all use cases will only require a single context, however.
>
> All right, but the current limit seems to be soft; on some setups it's as low as 2 contexts, and after that creating a context throws an exception. This also seems to be page-universal, e.g. with a two-context limit you could only have an AudioContext on two pages, and the third would say "Sorry, your browser doesn't support the Web Audio API".
>
>>> * It's odd that common processing paradigms are handled natively, yet sample rate conversion, which is a relatively expensive operation, is not.
>>
>> I might not disagree with that point.
>
>>> The spec says that the third argument for the audio context is the sample rate, but the current implementation doesn't obey the sample rate you choose there.
>>
>> I'm not clear where you're seeing this in the specification document.
>
> Oh, sorry, we must have picked it up from the source code. IIRC, I've also seen it in some tutorials, using a syntax like new webkitAudioContext(something, bufferSize, sampleRate).
>
>>> However, given the setup cost for an AudioContext and the limited number of them, it would be far more efficient if you could specify the sample rate for individual JavaScriptProcessingNodes, since we're often handling sources with varying sample rates and channel counts. It should also be possible to change the sample rate on the fly.
>>
>> It's complex, both conceptually for the developer and for the implementation, to manage many nodes which are all running at different sample rates.
>
> We know this. It would be easier if the API did it for us. This is a real-world use case. And it doesn't have to be a required parameter; if the developer doesn't touch the value, the implementation can just run at the default sample rate of the context, so it doesn't really bother anyone who doesn't need it.

One thing we could consider later on is basically a "vari-speed" rate-changing node which could accomplish what you want. On a Mac, if you type:

    % auval -a
        aufc vari appl  -  Apple: AUVarispeed

The AUVarispeed is the rate-changing AudioUnit. So we could consider such a thing. But, if so, I would hope for it to be a version 2 feature, since it can be confusing for basic users: connecting nodes from different parts of the graph can be impossible (given the different data rates). But I guess there's the old expression about "giving someone enough rope to hang themselves".

>>> * In the current implementation, there's no way to kill an AudioContext.
>>
>> It should be simple enough to add a "stopAllSound()" or "teardownGraph()" method if developers find it useful. It hasn't been seen to be a limitation by anybody so far.
>
> Well, it's a limitation given the limit on the number of AudioContexts, so I think a method like this would be highly useful. The AudioContext should probably adhere to other specs about DOM garbage collection as well, so that if there are no references to the AudioContext, it is collected and destroyed, so I'm not sure a method is actually needed. As far as specs are concerned, this is probably an implementation bug, violating the garbage collection rules, as IIRC the Web Audio API doesn't explicitly specify any garbage collection rules for AudioContexts.
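No such teardown method exists yet, but as a stopgap a page can disconnect its nodes and drop its references so that the graph becomes collectable while the context stays alive. A rough sketch under that assumption follows; disconnect() is the existing AudioNode method, everything else here is hypothetical:

    // Hypothetical teardown of a sub-graph in the current API: disconnect the
    // nodes and drop all references so they become eligible for garbage
    // collection. The AudioContext itself stays alive, but silent.
    function teardownGraph(nodes) {
      for (var i = 0; i < nodes.length; i++) {
        nodes[i].disconnect();   // detach from everything downstream
      }
      nodes.length = 0;          // drop the references held by the caller
    }

    // Usage: keep track of the nodes you create, then tear them down together.
    var context = new webkitAudioContext();
    var source = context.createBufferSource();   // stand-ins for a real graph
    var gain = context.createGainNode();
    var liveNodes = [source, gain];

    source.connect(gain);
    gain.connect(context.destination);

    // ... later, when the page is done with this graph:
    teardownGraph(liveNodes);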
>>> MediaStreams Processing Specific
>>> * No main thread processing. May be a good thing, however, because it's good practice, but forcing good practices is usually a bad idea.
>>>
>>> Not necessarily in the scope of the Audio WG, but I'll still list them here:
>>> * The ability to probe what sort of an audio device we're outputting to, and changes to it (for example, are these internal speakers, earbuds, stage monitors or a basic 5.1 home theatre setup, and when you actually plug in the earbuds).
>>> * The same for input devices. These would allow you to automatically choose mixing, equalization and compression options for different setups.
>>
>> Yes, I agree we'll need some way to query the capabilities and choose the available devices. This will be exciting!
>
> Indeed!
>
>>> There might have been some other points as well, but I can't remember right now. Hope this was helpful!
>>>
>>> Cheers,
>>> Jussi Kalliokoski
>>
>> Jussi, thanks for your time in writing this up...
>
> No problem, I'm glad to! I hope you aren't taking the points I'm making here personally; we really appreciate all the hard work you've done. :)
>
> Jussi
Received on Wednesday, 29 February 2012 18:56:02 UTC