Re: Web Audio API Proposal

On Jun 15, 2010, at 3:29 PM, Robert O'Callahan wrote:

> First, I apologize for the unnecessarily strident tone of my first email.
> 
> On Wed, Jun 16, 2010 at 1:36 AM, Chris Marrin <cmarrin@apple.com> wrote:
> I believe the two proposals mark the limits (simple to complex) of what is needed for an audio processing API in a web browser. And I think that is an excellent starting point. The Mozilla proposal would severely limit the types of audio processing possible on many devices, especially mobile devices. Providing native APIs for the most common processing models makes as much sense as providing filters in SVG or spline curves in Canvas.
> 
> Filters in SVG are an interesting analogy. The fixed set of primitives is very limiting, even though you can combine them in ways similar to the proposed "audio node graph" ... you often end up doing horrible hacks to get the effects that you want. Those hacks often require many primitives so performance ends up being poor. Similarly, GL moved away from the fixed-function pipeline to a much more flexible and programmable model.

It's an interesting idea to imagine an "audio processing language" for use here, but I don't think JavaScript is the appropriate candidate for such a language. OpenCL would do a good job of audio processing. Someday, perhaps, WebGL shaders will take the place of SVG filters. And perhaps in the future we will have WebCL, which can bring OpenCL capabilities to the browser. At that point, "programmable audio processing" might be possible. Until then, I think we need a set of fixed-function audio processing capabilities.

I disagree with the notion that SVG filters inevitably lead to horrible hacks. You can do very interesting things with chains of SVG filters. You can abuse them, too, and pay a hefty penalty for your sins. But you can seriously abuse GLSL as well, with similarly disastrous results. Self-restraint is always the prudent course.

> 
> Splines in 2D canvas are not as good an analogy because they are a simple abstraction that meets practically all path construction needs. I haven't seen people trying to hack around the limitations of splines.

Many of the audio nodes described in Chris' proposal are similarly simple abstractions of existing audio functionality.
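
To make that concrete, here is a rough sketch of what such a fixed-function graph might look like from script (the names are illustrative, not taken verbatim from either proposal):

    // Hypothetical node names; the shape of the graph is the point, not the API.
    var context = new AudioContext();
    var source  = context.createBufferSource();   // plays a decoded buffer
    var filter  = context.createLowPassFilter();  // fixed-function low-pass
    var gain    = context.createGainNode();       // simple volume control

    source.connect(filter);                       // source -> filter -> gain -> output
    filter.connect(gain);
    gain.connect(context.destination);
    source.noteOn(0);                             // start playback now

Each node is a thin wrapper around an operation every audio engine already implements, in much the same way a spline segment is a thin wrapper around well-understood path math.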

> 
> Importantly, these nodes DO NOT give access to the samples themselves. I believe this is important because it allows very efficient audio processing chains to be created and optimized without the need to expose the underlying details of how buffering occurs.
> 
> This is a key point. Can you describe how an implementation would exploit that restriction?

If access to the bits is not needed, the underlying audio libraries can collapse multiple filters in a chain down to one. One could even imagine audio hardware with built-in echo processing, in which case a simple echo filter could collapse down to setting a flag in the hardware. An implementation could also place a filter chain entirely on the GPU (using OpenCL). The need to access samples in the middle of that chain would significantly degrade performance.
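
Here is a minimal sketch of the kind of fusion I mean, as an implementation detail rather than API (the function is hypothetical): because script never observes the intermediate samples, two gain stages in series can be folded into a single multiply per sample, with no intermediate buffer at all.

    // Hypothetical internals of an implementation, not author-facing API.
    // Two gain nodes in series collapse to one coefficient and one pass.
    function renderGainChain(input, output, gainA, gainB) {
        var g = gainA * gainB;                 // fold the two stages up front
        for (var i = 0; i < input.length; i++)
            output[i] = input[i] * g;          // one pass, no intermediate buffer
    }

The same reasoning lets an implementation hand the whole chain to OpenCL or to dedicated hardware; the moment script can read the samples between two nodes, that freedom is gone.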

> 
> I believe section (3) is a very important one to get right. I see 2 problems with leaving FFT processing to JavaScript. First of all, an FFT is such a standard algorithm it seems like a very reasonable and obvious thing to include in an API.
> 
> Maybe so, but we could make it part of a built-in math library that works on typed arrays. Then it could be used for all kinds of applications.

That might be useful, although I suspect an FFT tailored to audio processing could be simpler than a full-featured one. I'm not familiar enough with the algorithms to know for sure.
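
To illustrate the narrower case I have in mind: for audio analysis you mostly want a power-of-two, real-input transform over a typed array, producing a magnitude spectrum. A sketch in plain JavaScript (illustrative only, and exactly the kind of per-sample work I would rather not require script to do at 48kHz):

    // Illustrative radix-2 FFT over Float32Arrays; length must be a power of two.
    function fft(real, imag) {
        var n = real.length;
        for (var i = 1, j = 0; i < n; i++) {           // bit-reversal permutation
            var bit = n >> 1;
            for (; j & bit; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) {
                var t = real[i]; real[i] = real[j]; real[j] = t;
                t = imag[i]; imag[i] = imag[j]; imag[j] = t;
            }
        }
        for (var len = 2; len <= n; len <<= 1) {       // butterflies
            var ang = -2 * Math.PI / len;
            for (var s = 0; s < n; s += len) {
                for (var k = 0; k < len / 2; k++) {
                    var wr = Math.cos(ang * k), wi = Math.sin(ang * k);
                    var a = s + k, b = a + len / 2;
                    var xr = real[b] * wr - imag[b] * wi;
                    var xi = real[b] * wi + imag[b] * wr;
                    real[b] = real[a] - xr; imag[b] = imag[a] - xi;
                    real[a] += xr;          imag[a] += xi;
                }
            }
        }
    }

    // Magnitude spectrum of one real-valued audio frame.
    function magnitudeSpectrum(frame) {
        var re = new Float32Array(frame);              // copy the samples
        var im = new Float32Array(frame.length);       // imaginary part starts at zero
        fft(re, im);
        var mags = new Float32Array(frame.length / 2);
        for (var i = 0; i < mags.length; i++)
            mags[i] = Math.sqrt(re[i] * re[i] + im[i] * im[i]);
        return mags;
    }

A general-purpose math library would need considerably more than this (arbitrary lengths, inverse transforms, windowing), which is why I suspect the audio-only case is the simpler one.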

> 
> Second, there are some implementations of JavaScript that will not be able to keep up with the processing of 48kHz stereo audio. These implementations will have to reduce the sample rate, which will make the FFT calculations less accurate. And even if an implementation is able to keep up with the data rate, it will leave very little time for any other JavaScript to run at any sort of reasonable frame rate.
> 
> So these must be platforms where "native" FFTs are much faster than any possible JS implementation. What accounts for that performance difference?

Dedicated audio hardware of varying complexity, and GPU audio processing. There's also the issue of syncing audio with other media, which is a standard part of audio processing APIs. I'm not sure you can ensure synchronization with a JS API.
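
Scheduling is the concrete case I worry about. With a fixed-function API the author can hand the implementation a deadline on the audio clock, and the engine is responsible for hitting it sample-accurately; with a pull-the-samples-through-JavaScript model, the start time depends on when the script happens to run and on how deep the buffering is. A sketch, reusing the hypothetical names from above:

    // Hypothetical scheduling call: the engine, not script, hits the deadline.
    var when = context.currentTime + 0.100;   // 100 ms from now, on the audio clock
    source.noteOn(when);                      // begin playback exactly at 'when'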

-----
~Chris
cmarrin@apple.com

Received on Tuesday, 15 June 2010 23:02:41 UTC