- From: Jussi Kalliokoski <jussi.kalliokoski@gmail.com>
- Date: Tue, 28 Feb 2012 17:18:00 +0200
- To: public-audio@w3.org
- Message-ID: <CAJhzemUQC-OCyVw4QHmN-z3=EmQWrMo7NL9DtMepsyn0Fc7R7Q@mail.gmail.com>
Hey guys,

So we brainstormed on this a bit with my team yesterday, and I'm sending a summary of the feedback as promised. A little foreword, however: as negative as some points may seem, this is exactly what I wanted to get, because we have already heard a lot of good things about the APIs, so this is purely about getting it all out there. So don't be fooled by the tone, we're really excited about both of the current proposals.

* One point that stood out is that while graph-based APIs are easily approachable to people who are not familiar with DSP, if you're doing any complex DSP and/or need to control the flow of your program, you'll end up working around the limitations of the API and eventually implementing the effects yourself. A few cases to demonstrate this point:

  - The one presented in the mail earlier today, where you have a game that has timed events scheduled, and then you go to the menu, and the menu has its own sounds. This means you'll have to either create multiple graphs (which currently seems to be restricted to a limited number in the Chrome implementation of the Web Audio API) or handle the flow yourself (in a buffer-based processing API you could handle this kind of use case quite simply).

  - Let's say we have a delay effect with a filter in its feedback loop (see the first sketch below):

        Input -----> Delay -----> Output
                      ^ <-- Filter <--^

    Again, simple to achieve in a buffer-based API, but not in a graph-based one.

  - You need to run the data through a filter and then get the FFT data for the result. You'll have to go through a serious amount of boilerplate to get where you want, whereas in a buffer-based API it might have just looked like this: fft(filter(data, parameters)), and you would get the result synchronously, whereas with the Web Audio API, for example, you have to do it asynchronously.

  - Time stretching is completely impossible to achieve in a graph-based API without a memory overflow in the blink of an eye, because you're not in control of the flow.

  Anyway, the common opinion seemed to be that a graph-based API should be a higher-level abstraction, not the basis of all functionality.

* Another thing that sort of relates to the previous point is that it would be highly useful to have native functions for high-volume operations that are expensive in JS. One example would be common functionality in decoders, such as clz (count leading zeroes, see the second sketch below). Exposing native decoders would also be useful, but this is already done in both APIs to some extent (reading data from <audio> and <video> is possible). Another relation to the previous point is that instead of graph-based effects, you could control the flow yourself if we offered a sort of standard library for the most common expensive DSP functionality. This library could also include native encoders.
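To make the feedback-loop point concrete, here's a rough sketch of what the delay with a filter in its feedback path could look like when you own the inner loop. The processBuffer callback, the one-pole filter and all the names in it are made up for illustration; they're not from either proposal:

    // Per-sample loop: a delay line with a one-pole lowpass filter
    // sitting inside the feedback path. All names are illustrative.
    var delayLength = 22050;             // 0.5 s at 44.1 kHz
    var delayLine = new Float32Array(delayLength);
    var writeIndex = 0;
    var feedback = 0.5;
    var filterState = 0;                 // one-pole lowpass memory
    var filterCoeff = 0.2;

    function processBuffer(input, output) {
      for (var i = 0; i < input.length; i++) {
        var delayed = delayLine[writeIndex];
        // The filter processes the signal before it re-enters the
        // delay line, i.e. it lives inside the feedback loop.
        filterState += filterCoeff * (delayed - filterState);
        delayLine[writeIndex] = input[i] + filterState * feedback;
        output[i] = input[i] + delayed;
        writeIndex = (writeIndex + 1) % delayLength;
      }
    }

Because the whole feedback path is just a few lines of state inside the loop, changing the routing is a local edit rather than a graph rewiring.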
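And on the clz point, this is the kind of plain-JS bit-twiddling that ends up on the hot path of a decoder, which is what I mean by a standard library of expensive primitives. The function below is only an illustration of the operation, not a proposal for the API shape:

    // Count leading zeroes of a 32-bit unsigned integer in plain JS.
    // Decoders call this per symbol, so the cost adds up quickly.
    function clz32(x) {
      x = x >>> 0;                       // force unsigned 32-bit
      if (x === 0) return 32;
      var n = 0;
      if ((x & 0xFFFF0000) === 0) { n += 16; x <<= 16; }
      if ((x & 0xFF000000) === 0) { n += 8;  x <<= 8;  }
      if ((x & 0xF0000000) === 0) { n += 4;  x <<= 4;  }
      if ((x & 0xC0000000) === 0) { n += 2;  x <<= 2;  }
      if ((x & 0x80000000) === 0) { n += 1; }
      return n;
    }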
Web Audio API specific

* The number of simultaneous AudioContexts seems to be limited.

* It's odd that common processing paradigms are handled natively, yet sample rate conversion, which is a relatively expensive operation, is not. The spec says that the third argument for the audio context is the sample rate, but the current implementation doesn't obey the sample rate you choose there. However, given the setup cost of an AudioContext and the limited number of them, it would be far more efficient if you could specify the sample rate for individual JavaScriptProcessingNodes, since we're often handling sources with varying sample rates and channel counts. It should also be possible to change the sample rate on the fly.

* In the current implementation, there's no way to kill an AudioContext.

MediaStreams Processing specific

* No main thread processing. This may be a good thing, however, because it's good practice, but forcing good practices is usually a bad idea.

Not necessarily in the scope of the Audio WG, but I'll still list these here:

* The ability to probe what sort of an audio device we're outputting to, and changes thereof (for example, whether these are internal speakers, earbuds, stage monitors or a basic 5.1 home theatre setup, and when the earbuds actually get plugged in).

* The same for input devices.

These would allow you to automatically adjust mixing, equalization and compression settings for different setups.

There might have been some other points as well, but I can't remember them right now. Hope this was helpful!

Cheers,
Jussi Kalliokoski
Received on Tuesday, 28 February 2012 15:18:31 UTC