Common goals and framework

I agree with Chris Grigg that we need to reach consensus on a good starting point for an audio API. I also agree that we need to express the goals of this API clearly. But I don't think expressing those goals is going to solve the basic argument about whether audio processing should be done purely in JavaScript or whether we need to accommodate native processing. So I think our very first agreement needs to be the following:

1) Those with the preconceived notion that JavaScript is "too slow" for audio processing need to be disabused of that opinion.

2) Those who believe JavaScript will solve all our problems with "just a little optimization" need to accept that we are talking about a very wide range of hardware (articulated below), and that some of that hardware will have severe limitations with a JavaScript-only solution.

And most importantly:

3) We need to stop asking each other for examples that prove our points. That kind of argument in an early effort like this is unfair and not helpful. None of us has the killer example that will prove our point and lead us to the way forward; it's too early. There are interesting examples on both sides, but nothing that's going to convince anyone of anything. We can use those to inform us, but we're going to have to find some common ground without definitive proof. That would make things too easy :-)

I don't mean to be harsh. I am as guilty as anyone of these preconceived notions. I hope we can all try to get past them.

Now, on to a few thoughts about scope and targets:

Scope

In our discussions at Apple and with Chris Rogers, we came up with 3 motivating examples:

1) Adding filter effects. This could be something simple like volume changes, mixing or echo, or something complex like pitch changes, Doppler or chorus (see the sketch after this list).

2) Audio visualizers. This is taking audio samples, processing them with an FFT or by averaging, and displaying the result graphically.

3) Spatialized audio. This is simulating an audio source positioned in 3D space, with walls and other obstructions modifying the sound quality and placement.
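
To make the first two examples concrete, here is a minimal sketch of what purely script-based processing might look like. Everything specific here is an assumption for illustration, not a proposal: the processBlock callback name, the idea that the API hands script mono samples as Float32Arrays, and the 44.1 kHz sample rate.

    // Hypothetical per-block callback: the name, block format and sample rate
    // are assumptions for illustration only.
    var sampleRate = 44100;
    var delaySamples = Math.floor(0.25 * sampleRate);   // 250 ms echo delay
    var delayLine = new Float32Array(delaySamples);     // starts as silence
    var delayIndex = 0;

    function processBlock(input, output) {
      var sumSquares = 0;
      for (var i = 0; i < input.length; i++) {
        var dry = input[i];
        var wet = delayLine[delayIndex];                 // sample from 250 ms ago
        var sample = 0.8 * dry + 0.4 * wet;              // mix dry signal with echo
        delayLine[delayIndex] = sample;                  // feed back for repeating echoes
        delayIndex = (delayIndex + 1) % delaySamples;
        output[i] = sample;
        sumSquares += sample * sample;
      }
      // A visualizer could display this RMS level (or run an FFT over the block instead).
      return Math.sqrt(sumSquares / input.length);
    }

Whether an inner loop like this can keep up in real time on the class (1) and (2) devices described below is exactly the open question.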


Additionally, we described 3 sources of audio:

1) Streaming audio (from <audio> and <video> nodes)

2) Buffered audio. This is file-based audio that is buffered in memory so it can be played and reused efficiently.

3) Generative audio. This could come from nodes with built-in oscillators and synthesizers, or be generated from JavaScript (see the sketch below).
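
For the "generated from JavaScript" case, here is a minimal sketch of a script-driven oscillator filling one block of samples. Again, the function name and the block-filling model are assumptions for illustration; a node with a built-in oscillator would hide all of this.

    // Hypothetical generator: fills a Float32Array with a sine tone and
    // returns the phase so the next block continues without a click.
    var TWO_PI = 2 * Math.PI;

    function fillSine(buffer, frequency, sampleRate, startPhase) {
      var phase = startPhase;
      var phaseIncrement = TWO_PI * frequency / sampleRate;
      for (var i = 0; i < buffer.length; i++) {
        buffer[i] = Math.sin(phase);
        phase += phaseIncrement;
        if (phase > TWO_PI) phase -= TWO_PI;             // keep the phase bounded
      }
      return phase;
    }

    // Example: one 1024-sample block of a 440 Hz tone at 44.1 kHz.
    var block = new Float32Array(1024);
    var phase = fillSine(block, 440, 44100, 0);

Carrying the phase from block to block is the sort of bookkeeping a native oscillator node would handle for free; doing it in script is easy enough, and the question is only the cost.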


Finally, here's what I believe is the range of hardware we should be looking at (this is my opinion, slanted by Apple's product line, of course):

1) "2nd generation" smart phones. These are more recent smart phones (from the last year or so) that have a single processor with very constrained memory and disk-caching mechanisms, but some sort of GPU and other hardware that can be useful here. Importantly, these devices would not benefit much from multi-threaded CPU execution.

2) Tablets. Similar in constraints to smart phones, perhaps with a somewhat more powerful CPU. Expectations for audio may be a bit higher here because of the larger screen.

3) Midrange computers. Here is where we get into multiple, faster CPUs and probably GPUs with GPGPU capabilities, which would be very beneficial for audio processing. Many fewer constraints than the mobile devices.

4) "Big Iron". These are big machines with more than 10 CPUs, tons of memory and maybe multiple GPGPUs.

I think the most important devices here are the first two, not because my company makes them, but because they will be the most constrained environments in which we must operate. If we come up with an API that works well for (3) but very poorly on (1) and (2), I think we will have failed.

This is what I think we're up against. Comments?

-----
~Chris
cmarrin@apple.com
