- From: Mark Boas <markb@happyworm.com>
- Date: Wed, 29 Feb 2012 22:30:28 +0300
- To: Chris Rogers <crogers@google.com>
- Cc: Jussi Kalliokoski <jussi.kalliokoski@gmail.com>, public-audio@w3.org
- Message-ID: <CAMnc=uAzRfOHkxvF9_=D9+Hcd0VrxNDYCfjhMU8wyaenjFQ0=w@mail.gmail.com>
Chris,

I appreciate all the work you are putting in and I also very much appreciate Jussi's input. In fact I am really enjoying the discussion. It seems very civilised and well intentioned.

All I really wanted to say is that I think it's very important to give developers 'enough rope to hang themselves', because that is exactly the amount of rope you need to make wonderful things. I hear what you're saying about confusing basic users, but I feel there must be a way to cater to both basic and advanced users, and this is one of our missions here.

Regards,
Mark

PS I think most of us on this group are passionate about audio, otherwise we wouldn't be here, and it's great to see things moving. :)

On Wed, Feb 29, 2012 at 9:55 PM, Chris Rogers <crogers@google.com> wrote:

> On Tue, Feb 28, 2012 at 11:56 AM, Jussi Kalliokoski <jussi.kalliokoski@gmail.com> wrote:
>
>> Hey Chris,
>>
>> As I said in the beginning, don't be fooled by the tone, I just tried to capture the gist of the conversation we had last night with the team. Some of the points are overstated, but it was a brainstorm, and the slightly aggressive tone reflects our passion for these things. That said, I'm sorry that it sounds so negative and harsh, it's meant to be constructive.
>
> Hey Jussi, no worries :) I didn't take your comments badly. I know that you're very passionate about audio just like I am, and so we should have free discussions here.
>
>> On Tue, Feb 28, 2012 at 8:59 PM, Chris Rogers <crogers@google.com> wrote:
>>
>>> Hi Jussi, thanks for your comments.
>>>
>>> On Tue, Feb 28, 2012 at 7:18 AM, Jussi Kalliokoski <jussi.kalliokoski@gmail.com> wrote:
>>>
>>>> Hey guys,
>>>>
>>>> So we brainstormed a bit on this with my team yesterday, so I'm sending a summary of feedback as promised.
>>>>
>>>> A little foreword, however. As negative as some points may seem, this is exactly what I wanted to get, because we have already heard a lot of good things about the APIs, so this is purely getting it all out there. So don't be fooled by the tone, we're really excited about both of the current proposals.
>>>>
>>>> * One point that stood out is that while graph-based APIs are easily approachable to people who are not familiar with DSP, if you're doing any complex DSP and/or need to control the flow of your program, you'll end up working around the limitations of the API and eventually implementing the effects yourself. A few cases to demonstrate this point:
>>>
>>> I think we're certainly in agreement that some people want to write specialized DSP code which isn't available as built-in nodes. The JavaScriptAudioNode allows this type of custom code. I disagree slightly with the wording and strength of your statement:
>>>
>>> "if you're doing any complex DSP and/or need to control the flow of your program, you'll end up working around the limitations of the API and eventually implementing the effects yourself"
>>>
>>> I believe that the Web Audio API offers good potential for implementing complex DSP and that many applications will not need to (or want to) go down to hand-coded JavaScript DSP. So I think your assertion is a bit overstated. But the most important point is the need for custom JavaScript processing, which we both agree is a good tool.
>>
>> Agreed. Again, sorry for the wording. I think that if you had to find a constructive argument here, it would be the need for the audio processing toolkit, which was suggested later.
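A minimal sketch of the kind of custom JavaScript processing both sides are referring to, using the prefixed API as it stood in early 2012 (createJavaScriptNode and JavaScriptAudioNode were later renamed createScriptProcessor/ScriptProcessorNode); 'source' stands in for any existing AudioNode and the 0.5 gain is arbitrary:

    var context = new webkitAudioContext();
    var processor = context.createJavaScriptNode(2048, 1, 1); // bufferSize, inputs, outputs
    processor.onaudioprocess = function (event) {
      var input  = event.inputBuffer.getChannelData(0);
      var output = event.outputBuffer.getChannelData(0);
      for (var i = 0; i < input.length; i++) {
        output[i] = input[i] * 0.5; // custom DSP goes here
      }
    };
    source.connect(processor);              // 'source' is assumed to exist
    processor.connect(context.destination);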
>>>> - The one presented in the mail earlier today, where you have a game that has timed events scheduled, and then you go to the menu, and the menu has its own sounds. This means you'll have to either create multiple graphs (which seems to be currently restricted in the Chrome implementation of the Web Audio API to a limited number) or handle the flow yourself (in a buffer-based processing API you could control this kind of use case quite simply).
>>>
>>> I'm not quite sure what limitation you're highlighting here. It's quite possible to have many sub-graphs doing completely independent processing all within a single AudioContext.
>>
>> Maybe this part could be interpreted as a need for a way to pause a (sub-)graph's processing and scheduling.
>
> I think because the JS is completely in control of the scheduling (with noteOn()/noteOff() and AudioParam scheduling), as well as direct tweaking of parameter values controlling volume, it's pretty straight-forward to do this already.
>
>>>> - Let's say we have a delay effect with a filter in its feedback loop:
>>>>
>>>>     Input -----> Delay -----> Output
>>>>                    ^ <-- Filter <--^
>>>>
>>>> Again, simple to achieve in a buffer-based API, but not in a graph-based one.
>>>
>>> Really? There are examples of delays with effects in the feedback loop, like the "WaveTable synth demo". It wasn't too hard to achieve. I added controls for the dry/wet mix, the feedback, and the BPM-synchronized delay time in a very straight-forward way.
>>
>> If this is the case, I applaud you for a job well done, and apologize for not doing enough inspection before putting this out there.
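For reference, the delay-with-filtered-feedback topology in the diagram above can be wired from the built-in nodes roughly as follows (factory names as in the prefixed 2012 implementation, before createGainNode/createDelayNode were shortened; parameter values are arbitrary):

    var context  = new webkitAudioContext();
    var input    = context.createGainNode();   // connect the source into this node
    var delay    = context.createDelayNode();
    var filter   = context.createBiquadFilter();
    var feedback = context.createGainNode();

    delay.delayTime.value  = 0.25;  // seconds
    filter.frequency.value = 2000;  // darken the repeats (low-pass is the default type)
    feedback.gain.value    = 0.5;   // keep below 1.0 so the loop decays

    input.connect(context.destination);  // dry path
    input.connect(delay);
    delay.connect(context.destination);  // wet path
    delay.connect(filter);               // feedback loop: Delay -> Filter -> Gain -> Delay
    filter.connect(feedback);
    feedback.connect(delay);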
>>>> - You need to get the data through a filter, then get the FFT data for that. You'll have to go through a serious amount of boilerplate to get where you want, whereas in a buffer-based API it might have just looked like this: fft(filter(data, parameters)), and you would get it synchronously, whereas with the Web Audio API, for example, you have to do it asynchronously.
>>>
>>> I'm not quite sure what you mean. I'm sure that for some specific custom effects it would be easier to do this directly in JavaScript. But you can certainly have delays with convolution reverb in the feedback (which happens to use FFTs in the internal implementation).
>>
>> The point here is that things like FFT and filters are highly useful outside audio use cases as well, and in those use cases it's very unconventional to have to run everything through a processing graph and wait for the asynchronous processing to finish. So the problem is that we're implementing generally useful DSP functionality in the browser, but forcing a certain (graph-based) workflow onto it, which is highly uncomfortable outside live audio.
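For comparison, the graph-based route to "the FFT of a filtered signal" looks roughly like the sketch below with the API as implemented at the time (a BiquadFilterNode feeding a RealtimeAnalyserNode); the spectrum has to be polled rather than returned synchronously, which is the friction being described. 'source' is assumed to be an existing AudioNode:

    var context  = new webkitAudioContext();
    var filter   = context.createBiquadFilter();
    var analyser = context.createAnalyser();
    analyser.fftSize = 2048;

    source.connect(filter);                      // 'source' is assumed to exist
    filter.connect(analyser);
    analyser.connect(context.destination);

    var spectrum = new Float32Array(analyser.frequencyBinCount);
    function poll() {
      analyser.getFloatFrequencyData(spectrum);  // magnitudes in dB for the most recent block
      // ... use 'spectrum' here ...
      setTimeout(poll, 100);                     // poll periodically; there is no synchronous result
    }
    poll();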
>>>> - Time stretching is completely impossible to achieve in a graph-based API without a memory overflow in the blink of an eye, because you're not in control of the flow.
>>>
>>> I'm pretty sure that it's not impossible. After all, there's a crude time-stretching demo here:
>>> http://chromium.googlecode.com/svn/trunk/samples/audio/granular.html
>>>
>>> The audio fidelity is not very high, but the example code can be tweaked for improvements there which are suitable for voice time-stretching, which is one of its biggest uses.
>>
>> I'm thinking about more complex and better quality algorithms, such as Paulstretch ( http://hypermammut.sourceforge.net/paulstretch/ ), which uses FFT and friends quite nicely to achieve the time stretching with really high quality and astronomical stretching factors. Impossible might not be the right word, but you'll end up working around the graph, because you can't control the flow. For example, you can't make the time stretcher a single node in the graph, because you can't limit the incoming data, so you'd keep buffering more and more data and end up with a memory overflow quite quickly even with quite mild factors.
>
> Hyper-stretching like Paulstretch is really cool. I've played a lot with this kind of stuff with SVP at IRCAM and spent a lot of time at Apple developing such algorithms. If you have a Mac handy you can see it listed with 'auval':
>
> % auval -a
> aufc tmpt appl  -  Apple: AUTimePitch
>
> For these types of phase-vocoder algorithms, you're best off working directly in JavaScript.
>
>>>> Anyway, the common opinion seemed to be that a graph-based API should be a higher level abstraction, not the basis of all functionality.
>>>
>>> It *is* a higher-level abstraction in the Web Audio API, with one of the nodes being available for direct JavaScript processing.
>>
>> No, that's really like saying you can have a sea in the fish. The lower level access is achieved through the higher level abstraction, whereas the point here is that the higher level abstraction should be on top of the lower level API.
>
> You can have it both ways with the Web Audio API:
>
> 1. Implement JS processing as one node mixed with several others in a Web Audio graph: for example, by using a JavaScriptAudioNode as a single node mixed with other native processing, such as custom synth code in JS mixed with high-quality reverberation using a ConvolverNode, and delays with a DelayNode, etc.
> 2. Don't use any of the graph features or nodes of the Web Audio API and just create a single JavaScriptAudioNode, where you implement your own graph API directly in JS.
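A rough sketch of option 1, mixing a hand-written JS node with native nodes; the sine loop is a trivial stand-in for real synth code, and 'impulseResponseBuffer' is assumed to be an AudioBuffer decoded elsewhere:

    var context   = new webkitAudioContext();
    var synth     = context.createJavaScriptNode(1024, 1, 1); // input ignored; used as a generator
    var convolver = context.createConvolver();                // native convolution reverb
    var delay     = context.createDelayNode();                // native delay

    convolver.buffer = impulseResponseBuffer;  // assumed to be loaded and decoded already
    delay.delayTime.value = 0.3;

    var phase = 0;
    synth.onaudioprocess = function (event) {
      var out = event.outputBuffer.getChannelData(0);
      for (var i = 0; i < out.length; i++) {
        out[i] = Math.sin(phase) * 0.2;        // custom synth code goes here
        phase += 2 * Math.PI * 440 / context.sampleRate;
        if (phase > 2 * Math.PI) phase -= 2 * Math.PI;
      }
    };

    synth.connect(convolver);
    convolver.connect(delay);
    delay.connect(context.destination);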
>>>> * Another thing that sort of relates to the previous point is that it would be highly useful to have native implementations of high-volume functions that are expensive in JS. One example would be common functionality in decoders, such as clz (count leading zeroes). Also exposing native decoders would be useful, but this is already done in both APIs to some extent (reading data from <audio> and <video> is possible). Another relation to the previous point is that instead of graph-based effects, you could control the flow yourself if we offered a sort of standard library for the most common expensive DSP functionality. This library could also include native encoders.
>>>
>>> I think this notion of a library of common functions is exactly what the built-in nodes of the Web Audio API represent.
>>
>> But this enforces the graph workflow, which has its limitations, as said.
>
> I think the limitations are rather modest and that there's a huge set of compelling applications that can be built this way. For some very specialized processing, direct processing in JS can be used.
>
>>>> Web Audio API specific
>>>> * The number of simultaneous AudioContexts seems to be limited.
>>>
>>> There will always be a hard limit, but it can be fairly high. Nearly all use cases will only require a single context, however.
>>
>> All right, but the current limit seems to be soft: on some setups it's as low as 2 contexts, and after that creating a context throws an exception. This also seems to be page-universal, e.g. with a two-context limit you could only have an AudioContext on two pages, and the third would say "Sorry, your browser doesn't support the Web Audio API".
>>
>>>> * It's odd that common processing paradigms are handled natively, yet Sample Rate Conversion, which is a relatively expensive operation, is not.
>>>
>>> I might not disagree with that point.
>>>
>>>> The spec says that the third argument for the audio context is sample rate, but the current implementation doesn't obey the sample rate you choose there.
>>>
>>> I'm not clear where you're seeing this in the specification document.
>>
>> Oh, sorry, we must have picked it up from the source code. IIRC, I've also seen it in some tutorials, using a syntax like new webkitAudioContext(something, bufferSize, sampleRate).
>>
>>>> However, given the setup cost for an AudioContext and the limited number of them, it would be far more efficient if you could specify the sample rate for individual JavaScriptProcessingNodes, since we're often handling sources with varying sample rates and channel counts. It should also be possible to change the sample rate on the fly.
>>>
>>> It's complex, both conceptually for the developer and for the implementation, to manage many nodes which are all running at different sample rates.
>>
>> We know this. It would be easier if the API did it for us. This is a real world use case. And it doesn't have to be a required parameter: if the developer doesn't tamper with the value, the implementation can just go at the default sample rate of the context, hence it doesn't really bother anyone who doesn't need it.
>
> One thing we could consider later on is basically a "vari-speed" rate-changing node which could accomplish what you want. On a Mac if you type:
>
> % auval -a
> aufc vari appl  -  Apple: AUVarispeed
>
> The AUVarispeed is the rate-changing AudioUnit. So we could consider such a thing. But, if so, I would hope it to be a version 2 feature, since it can be confusing for basic users, because connecting nodes from different parts of the graph can be impossible (given the different data rates). But I guess there's the old expression about "giving someone enough rope to hang themselves".
>
>>>> * In the current implementation, there's no way to kill an AudioContext.
>>>
>>> It should be simple enough to add a "stopAllSound()" or "teardownGraph()" method if developers find it useful. It hasn't been seen to be a limitation by anybody so far.
>>
>> Well, it's a limitation given the limit on the number of AudioContexts, so I think a method like this would be highly useful. The AudioContext should probably adhere to other specs about DOM garbage collection as well, so if there are no references to the AudioContext, it should be collected and destroyed; in that case I'm not sure a method is actually needed. As far as specs are concerned, this is probably an implementation bug violating the general garbage collection rules, as IIRC the Web Audio API doesn't explicitly specify any garbage collection rules for AudioContexts.
>>
>>>> MediaStreams Processing Specific
>>>> * No main thread processing. May be a good thing however, because it's a good practice, but forcing good practices is usually a bad idea.
>>>>
>>>> Not necessarily in the scope of the Audio WG, but I'll still list them here:
>>>> * The ability to probe what sort of an audio device we're outputting to, and changes thereof (for example, are these internal speakers, earbuds, stage monitors or a basic 5.1 home theatre setup, and when you actually plug in the earbuds).
>>>> * The same for input devices. These would allow you to automatically adjust mixing, equalization and compression options for different setups.
>>>
>>> Yes, I agree we'll need some way to query the capabilities and choose the available devices. This will be exciting!
>>
>> Indeed!
>>
>>>> There might have been some other points as well, but I can't remember right now. Hope this was helpful!
>>>>
>>>> Cheers,
>>>> Jussi Kalliokoski
>>>
>>> Jussi, thanks for your time in writing this up...
>>
>> No problem, I'm glad to! I hope you aren't taking the points I'm making here personally, we really appreciate all the hard work you've done. :)
>>
>> Jussi
Received on Wednesday, 29 February 2012 19:30:57 UTC