- From: Mark Boas <markb@happyworm.com>
- Date: Sat, 3 Mar 2012 14:18:56 +0300
- To: Jussi Kalliokoski <jussi.kalliokoski@gmail.com>
- Cc: Chris Rogers <crogers@google.com>, public-audio@w3.org
- Message-ID: <CAMnc=uBpi+Zrf+RNPR3ht32YrgEY+jv0U2jSD7kqF3qVOAnnhA@mail.gmail.com>
I'm in total agreement with Jussi here. Surely the low-level stuff should be in place, if not before, then at least at the same time as the higher-level functionality. Please give us all the rope we need!

Another perspective, albeit from the Official.fm camp: http://blog.aventine.se/post/18627646284/simple-audio

Best

Mark

On Sat, Mar 3, 2012 at 1:45 PM, Jussi Kalliokoski <jussi.kalliokoski@gmail.com> wrote:

> On Wed, Feb 29, 2012 at 8:55 PM, Chris Rogers <crogers@google.com> wrote:

>> On Tue, Feb 28, 2012 at 11:56 AM, Jussi Kalliokoski <jussi.kalliokoski@gmail.com> wrote:

>>> Hey Chris,

>>> As I said in the beginning, don't be fooled by the tone, I just tried to capture the gist of the conversation we had last night with the team. Some of the points are overstated, but it was a brainstorm, and the slightly aggressive tone reflects our passion for these things. That said, I'm sorry that it sounds so negative and harsh, it's meant to be constructive.

>> Hey Jussi, no worries :) I didn't take your comments badly. I know that you're very passionate about audio just like I am, and so we should have free discussions here.

> Great :)

>>> On Tue, Feb 28, 2012 at 8:59 PM, Chris Rogers <crogers@google.com> wrote:

>>>> Hi Jussi, thanks for your comments.

>>>> On Tue, Feb 28, 2012 at 7:18 AM, Jussi Kalliokoski <jussi.kalliokoski@gmail.com> wrote:

>>>>> Hey guys,

>>>>> So we brainstormed a bit on this with my team yesterday, so I'm sending a summary of feedback as promised.

>>>>> A little foreword, however. As negative as some points may seem, this is exactly what I wanted to get, because we already have heard a lot of good things about the APIs, so this is purely getting it all out there. So don't be fooled by the tone, we're really excited about both of the current proposals.
>>>>> * One point that stood out is that while graph-based APIs are easily approachable to people who are not familiar with DSP, if you're doing any complex DSP and/or need to control the flow of your program, you'll end up working around the limitations of the API and eventually implementing the effects yourself. A few cases to demonstrate this point:

>>>> I think we're certainly in agreement that some people want to write specialized DSP code which isn't available as built-in nodes. The JavaScriptAudioNode allows this type of custom code. I disagree slightly with the wording and strength of your statement:

>>>> "if you're doing any complex DSP and/or need to control the flow of your program, you'll end up working around the limitations of the API and eventually implementing the effects yourself"

>>>> I believe that the Web Audio API offers good potential for implementing complex DSP and that many applications will not need to (or want to) go down to hand-coded JavaScript DSP. So I think your assertion is a bit overstated. But, the most important point is the need for custom JavaScript processing which we both agree is a good tool.

>>> Agreed. Again, sorry for the wording. I think that if you had to find a constructive argument here, it would be the need for the audio processing toolkit, which was suggested later.

>>>>> - The one presented in the mail earlier today, where you have a game that has timed events scheduled, and then you go to the menu, and the menu has its own sounds. This means you'll have to either create multiple graphs (which seems to be currently restricted in the Chrome implementation of Web Audio API to a limited number) or handle the flow yourself (in a buffer-based processing API you could control this kind of a use case quite simply).
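One way to read "handle the flow yourself" in the game-and-menu case is to keep timed events on a game clock that stops while the menu is open, and to hand each event to the audio layer only shortly before it is due. The sketch below assumes that approach; `PausableScheduler`, its `lookahead` parameter, and the `fire(when)` callback are hypothetical names for illustration, not part of either proposal.

```javascript
// Hypothetical helper, not part of any Web Audio API draft: events are kept
// in "game time", which only advances while the game is unpaused.
class PausableScheduler {
  constructor(lookahead = 0.1) {
    this.lookahead = lookahead; // seconds of game time to dispatch ahead
    this.gameTime = 0;          // accumulated unpaused time, in seconds
    this.lastWallTime = null;   // clock reading at the last tick; null while paused
    this.pending = [];          // [{ at, fire }], kept sorted by game time
  }

  // Register an event at game time `at`; `fire(when)` receives a clock time.
  schedule(at, fire) {
    this.pending.push({ at, fire });
    this.pending.sort((a, b) => a.at - b.at);
  }

  resume(wallTime) {
    this.lastWallTime = wallTime;
  }

  pause(wallTime) {
    this.tick(wallTime);       // account for time elapsed up to the pause
    this.lastWallTime = null;  // game time stops advancing from here
  }

  // Call regularly with the current reading of the audio clock.
  tick(wallTime) {
    if (this.lastWallTime === null) return; // paused: nothing is dispatched
    this.gameTime += wallTime - this.lastWallTime;
    this.lastWallTime = wallTime;
    while (this.pending.length > 0 &&
           this.pending[0].at <= this.gameTime + this.lookahead) {
      const event = this.pending.shift();
      // Translate the remaining game time into a time on the audio clock.
      const delay = Math.max(0, event.at - this.gameTime);
      event.fire(wallTime + delay);
    }
  }
}
```

In a page using the API as discussed here, `fire` might call `source.noteOn(when)` and `tick` might be driven from a timer with `context.currentTime`; both are assumptions about how the helper would be wired up. Events already dispatched inside the lookahead window would still sound after a pause, so the window should stay short.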
>>>> I'm not quite sure what limitation you're highlighting here. It's quite possible to have many sub-graphs doing completely independent processing all within a single AudioContext.

>>> Maybe this part could be interpreted as a need for a way to pause a (sub-)graph's processing and scheduling.

>> I think because the JS is completely in control of the scheduling (with noteOn()/noteOff() and AudioParam scheduling) as well as direct tweaking of parameter values controlling volume that it's pretty straight-forward to do this already.

> All right, so what happens if I use BufferSourceNode's noteOn() to play a gunfire sound 10 seconds from now, and then the user opens the menu? I don't want the gunfire to fire while he's in the menu and the game is paused. I also want it to play at the correct time when the user exits the menu. What do I do?

>>>>> - Let's say we have a delay effect with a filter in its feedback loop:
>>>>>     Input -----> Delay -----> Output
>>>>>              ^ <-- Filter <--^
>>>>>   Again, simple to achieve in a buffer based API, but not in a graph-based one.

>>>> Really? There are examples of delays with effects in the feedback loop like the "WaveTable synth demo". It wasn't too hard to achieve. I added controls for the dry/wet mix, the feedback, and the BPM-synchronized delay time in a very straight-forward way.

>>> If this is the case, I applaud you for a job well done, and apologize for not doing enough inspection before putting this out there.

>>>>> - You need to get the data through a filter, then get the FFT data for that.
>>>>> You'll have to go through a serious amount of boilerplate to get where you want, whereas in a buffer based API, it might have just looked like this: fft(filter(data, parameters)), and you would get it synchronously, whereas with Web Audio API for example, you have to do it asynchronously.

>>>> I'm not quite sure what you mean. I'm sure that for some specific custom effects it would be easier to do this directly in JavaScript. But you can certainly have delays with convolution reverb in the feedback (which happens to use FFTs in the internal implementation).

>>> The point here is that things like FFT and filters are highly useful outside Audio use cases as well, and in those use cases it's very unconventional having to have it run through a processing graph and wait for the asynchronous processing to finish. So, the problem is that we're implementing generally useful DSP functionality in the browser, but forcing a certain (graph-based) workflow on it, which is highly uncomfortable outside live audio.

>>>>> - Time stretching is completely impossible to achieve in a graph-based API, without a memory overflow in the blink of an eye, because you're not in control of the flow.

>>>> I'm pretty sure that it's not impossible. After all there's a crude time-stretching demo here:
>>>> http://chromium.googlecode.com/svn/trunk/samples/audio/granular.html

>>>> The audio fidelity is not very high, but the example code can be tweaked for improvements there which are suitable for voice time-stretching, which is one of its biggest uses.

>>> I'm thinking about more complex and better quality algorithms, such as the Paulstretch ( http://hypermammut.sourceforge.net/paulstretch/ ) which uses FFT and friends quite nicely to achieve the time stretching with really high quality and astronomical stretching factors.
>>> Impossible might not be the right word, but you'll end up working around the graph, because you can't control the flow. For example, you can't make the time stretcher a single node in the graph, because you can't limit the incoming data, so you'd keep buffering more and more data and end up with a memory overflow quite quickly even with quite mild factors.

>> Hyper-stretching like Paulstretch is really cool. I've played a lot with this kind of stuff with SVP at IRCAM and spent a lot of time at Apple developing such algorithms. If you have a Mac handy you can see it listed with 'auval':

>> % auval -a
>> aufc tmpt appl - Apple: AUTimePitch

>> For these types of phase-vocoder algorithms, you're best off working directly in JavaScript.

> Yeah, exactly the problem I was highlighting: you can't, for example, make a node that does this and interacts with the rest of the graph (has inputs and outputs), so you'll have to work around the graph.

> I'll try that the next time I get my hands on a mac :)

>>>>> Anyway, the common opinion seemed to be that graph-based API should be a higher level abstraction, not the basis of all functionality.

>>>> It *is* a higher-level abstraction in the Web Audio API with one of the nodes being available for direct JavaScript processing.

>>> No, that's really like saying you can have a sea in the fish. The lower level access is achieved through the higher level abstraction, when the point here is that the higher level abstraction should be on top of the lower level API.

>> You can have it both ways with the Web Audio API:

>> 1.
>>    Implement JS processing as one node mixed with several others in a Web Audio Graph: for example, by using a JavaScriptAudioNode as a single node mixed with other native processing, such as custom synth code in JS mixed with high-quality reverberation using a ConvolverNode, and delays with DelayNode, etc.
>> 2. Don't use any of the graph features or nodes of the Web Audio API and just create a single JavaScriptAudioNode, where you implement your own graph API directly in JS

>>>>> * Another thing that sort of relates to the previous point is that it would be highly useful to have native functions for high volume functions that are expensive in JS. One example would be common functionality in decoders, such as clz (count leading zeroes). Also exposing native decoders would be useful, but this is already done in both APIs to some extent (reading data from <audio> and <video> is possible). Another relation to the previous point is that instead of graph-based effects, you could control the flow yourself if we'd offer a sort of a standard library for most common expensive DSP functionality. This library could also include native encoders.

>>>> I think this notion of a library of common functions is exactly what the built-in nodes of the Web Audio API represent.

>>> But this enforces the graph workflow, which has its limitations, as said.

>> I think the limitations are rather modest and that there's a huge set of compelling applications that can be built this way. For some very specialized processing then direct processing in JS can be used.

> The point stands. We're creating ugly redundancy if the graph has a native FFT, but it can't be used for common mathematics purposes without changing the workflow dramatically.
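The synchronous `fft(filter(data, parameters))` style argued for in this thread could look roughly like the following in plain JavaScript. This is only a sketch: the one-pole low-pass and the naive O(n^2) DFT are stand-ins chosen for brevity (a native library would presumably offer a real FFT), and the names `filter`, `dftMagnitudes`, and `alpha` are illustrative assumptions.

```javascript
// One-pole low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1]).
// `alpha` lies in (0, 1]; smaller values smooth more heavily.
function filter(data, { alpha }) {
  const out = new Float32Array(data.length);
  let y = 0;
  for (let i = 0; i < data.length; i++) {
    y += alpha * (data[i] - y);
    out[i] = y;
  }
  return out;
}

// Naive O(n^2) discrete Fourier transform, standing in for a real FFT.
// Returns the magnitudes of bins 0..floor(n/2).
function dftMagnitudes(data) {
  const n = data.length;
  const mags = new Float32Array(Math.floor(n / 2) + 1);
  for (let k = 0; k < mags.length; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const phase = (-2 * Math.PI * k * t) / n;
      re += data[t] * Math.cos(phase);
      im += data[t] * Math.sin(phase);
    }
    mags[k] = Math.hypot(re, im);
  }
  return mags;
}

// A 64-sample sine with exactly 4 cycles, filtered and analysed in one
// synchronous expression, with no graph and no callbacks.
const data = Float32Array.from({ length: 64 }, (_, t) =>
  Math.sin((2 * Math.PI * 4 * t) / 64));
const spectrum = dftMagnitudes(filter(data, { alpha: 0.5 }));
```

The result is available on the next line of the calling code, which is the property the non-audio use cases in this discussion depend on. With this gentle filter setting the strongest bin is expected to remain bin 4.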
>>>>> Web Audio API specific
>>>>> * The number of simultaneous AudioContexts seems to be limited.

>>>> There will always be a hard limit, but it can be fairly high. Nearly all use cases will only require a single context, however.

>>> All right, but the current limit seems to be soft, on some setups it's as low as 2 contexts and after that creating the context throws an exception. This also seems to be page-universal, e.g. with a limit of two contexts, you could only have an AudioContext on two pages and the third would say "Sorry, your browser doesn't support the Web Audio API".

>>>>> * It's odd that common processing paradigms are handled natively, yet Sample Rate Conversion, which is a relatively expensive operation, is not.

>>>> I might not disagree with that point.

>>>>> The spec says that the third argument for the audio context is sample rate, but the current implementation doesn't obey the sample rate you choose there.

>>>> I'm not clear where you're seeing this in the specification document.

>>> Oh, sorry, we must have picked it up from the source code. IIRC, I've also seen it in some tutorials, using a syntax like new webkitAudioContext(something, bufferSize, sampleRate).

>>>>> However, given the setup cost for an AudioContext and the limited number of them, it would be far more efficient if you could specify the sample rate for individual JavaScriptProcessingNodes, since we're often handling sources with varying sample rates and channel counts. It should also be possible to change the sample rate on the fly.

>>>> It's complex both conceptually for the developer and for the implementation to manage many nodes all of which are running at different sample-rates.

>>> We know this. It would be easier if the API did it for us.
>>> This is a real world use case. And it doesn't have to be a required parameter, if the developer doesn't tamper with the value, the implementation can just go at the default sample rate of the context, hence it doesn't really bother anyone who doesn't need it.

>> One thing we could consider later on is basically a "vari-speed" rate-changing node which could accomplish what you want. On a Mac if you type:

>> % auval -a
>> aufc vari appl - Apple: AUVarispeed

>> The AUVarispeed is the rate-changing AudioUnit. So we could consider such a thing. But, if so, I would hope it to be a version 2 feature, since it can be confusing for basic users, since connecting nodes from different parts of the graph can be impossible (given the different data rates). But I guess there's the old expression about "giving someone enough rope to hang themselves".

> vari-speed algorithms aren't usually even much more expensive than constant speed, so why not :) libraries like SRC (Secret Rabbit Code) check for each buffer whether the ratio has changed enough to use vari-speed or constant, so even for Sinc filtered resamplers it shouldn't be a problem. As for the hanging man, I hope you aren't planning on making BufferSourceNodes and MediaElementSourceNodes a version 2 feature as well, they have differing sample rates from that of the context as well, correct? And in the history of programming, being a nanny for end developers has never been a good idea. If you give a sailor a rope, he might hang himself in it, or take his ship to your port to trade with you and make everyone prosper. But if you don't give him the rope, he can do neither. I whole-heartedly disagree with moving this to a v2 spec. :)

>>>>> * In the current implementation, there's no way to kill an AudioContext.
>>>> It should be simple enough to add a "stopAllSound()" or "teardownGraph()" method if developers find it useful. It hasn't been seen to be a limitation by anybody so far.

>>> Well it's a limitation given the limit on the number of AudioContexts, so I think a method like this would be highly useful. The AudioContext should probably adhere to other specs about DOM garbage collection as well, so if there are no references to the AudioContext, it should be collected and destroyed, so I'm not sure a method is actually needed. As far as specs are concerned, this is probably an implementation bug, violating the garbage collection rules, as IIRC the Web Audio API doesn't explicitly specify any garbage collection rules for the AudioContexts.

>>>>> MediaStreams Processing Specific
>>>>> * No main thread processing. May be a good thing however, because it's a good practice, but forcing good practices is usually a bad idea.

>>>>> Not necessarily in the scope of the Audio WG, but I'll still list them here:
>>>>> * The ability to probe what sort of an audio device we're outputting to, and changes thereof (for example, are these internal speakers, earbuds, stage monitors or a basic 5.1 home theatre setup, and when you actually plug in the earbuds).
>>>>> * The same for input devices. These would allow you to automatically do mixing, equalization and compression options for different setups.

>>>> Yes, I agree we'll need some way to query the capabilities and choose the available devices. This will be exciting!

>>> Indeed!

>>>>> There might have been some other points as well, but I can't remember right now. Hope this was helpful!

>>>>> Cheers,
>>>>> Jussi Kalliokoski

>>>> Jussi, thanks for your time in writing this up...

>>> No problem, I'm glad to!
>>> I hope you aren't taking the points I'm making here personally, we really appreciate all the hard work you've done. :)

>>> Jussi
Received on Saturday, 3 March 2012 11:19:26 UTC