Re: Resolution to republish MSP as a note from Chris Rogers on 2012-08-13 (public-audio@w3.org from July to September 2012)

From: Chris Rogers <crogers@google.com>
Date: Mon, 13 Aug 2012 14:37:02 -0700
To: Srikumar Karaikudi Subramanian <srikumarks@gmail.com>
Cc: olivier Thereaux <olivier.thereaux@bbc.co.uk>, Mark Boas <markb@happyworm.com>, Jussi Kalliokoski <jussi.kalliokoski@gmail.com>, James Wei <james.wei@intel.com>, Stéphane Letz <letz@grame.fr>, Audio Working Group <public-audio@w3.org>, Matthew Paradis <matthew.paradis@bbc.co.uk>, Christopher Lowis <Chris.Lowis@bbc.co.uk>
Message-ID: <CA+EzO0kCEKGsh+Lc4A1PhB8Cwc1k3Rq+D9M3t0zXROk0=3W3WA@mail.gmail.com>

Hi Kumar, I think you summarize the state of affairs very eloquently.

On Fri, Aug 10, 2012 at 5:29 PM, Srikumar Karaikudi Subramanian <srikumarks@gmail.com> wrote:

> > * The high-level access provided by the web audio API is great and makes
> it easy to write audio processing and analysis code today, with very
> little concern for optimization.
> >
> > * The moment you want to build anything custom, the API in its current
> state is not great. I recall my team complaining that the moment you want
> to do custom processing, you have to basically wrap everything in your own
> class, and write a lot of boilerplate. [Ping ChrisL/Matt for details.]
>
>
> These two points are an excellent summary of the feedback indeed and we do
> want both.
>
> The criticism of the custom processing part has two aspects to it -
>
> 1. We cannot at the moment make a JS audio node that can look and quack
> like any other native node, disregarding efficiency. So we have to
> discriminate between JS nodes and native nodes. To solve this API problem,
> we need AudioParams, multiple inputs/outputs and dynamic lifetime support
> in JS nodes. This also helps with future proofing.
>

I think that many of these problems are solvable.

>
> 2. The timing characteristics (latency/delay) of JS audio nodes are
> inconsistent with the native nodes, which makes mixing JS audio nodes with
> native nodes problematic, even given the steady efficiency improvements
> we're seeing in JS runtimes.
>
> We can do 200 calculations per output sample consuming < 2% of a 1GFlop
> cpu (with a 2x margin). This is adequate for mixing triggered sounds.
> Glitch-free audio therefore is not a matter of efficiency, but is about
> stealing that 2% (~ 0.2ms for every 512 samples) at the right time, every
> time. This is the core technical problem that is solved by the current
> native node design. I believe JS efficiency will improve quickly enough to
> render the DSP API redundant, but whether we're going to get JS code to run
> in a timely fashion is unclear [1].
>
> Workers have been proposed as a possible answer to that, but there are
> several unknowns. Can we get RT workers? Is there enough incentive for
> browser vendors to improve communication latency between the main thread
> and workers? Will workers be ubiquitously available and performant? - i.e.
> will some constrained mobile devices want to adopt web audio but opt out of
> worker support? How do we introduce new APIs to workers that will be needed
> for the JS code? [2]
>
> We think we need JS audio nodes for custom processing. But we don't really
> need the ability to call 100% arbitrary JS code in the node's
> onaudioprocess. Can we maybe achieve enough flexibility through some
> special support that *can* be run in an RT thread or critical callback?
> Perhaps a JS subset or even WebCL? If ubiquity is a problem for WebCL,
> perhaps a limited non-blocking version of a language like Chuck? [3] Then
> we'll be able to compose programmable nodes just like native nodes with
> comparable efficiency and latency and get both high level ease of use and
> custom processing.
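
The budget quoted above (200 calculations per output sample, under 2% of a 1 GFlop CPU with a 2x margin, roughly 0.2 ms per 512 samples) can be checked with a rough calculation. This is only a sketch: the 44.1 kHz sample rate is an assumption, since the message does not state one, and the function names are made up for illustration.

```python
# Back-of-the-envelope check of the quoted CPU budget.
# SAMPLE_RATE is an assumption; the other figures come from the message.
SAMPLE_RATE = 44_100      # Hz (assumed, not stated in the message)
OPS_PER_SAMPLE = 200      # calculations per output sample
CPU_FLOPS = 1e9           # "1GFlop cpu"
MARGIN = 2.0              # "2x margin"
BLOCK_SIZE = 512          # samples per processing block


def cpu_fraction(ops_per_sample=OPS_PER_SAMPLE, sample_rate=SAMPLE_RATE,
                 margin=MARGIN, cpu_flops=CPU_FLOPS):
    """Fraction of the CPU needed, including the safety margin."""
    return ops_per_sample * sample_rate * margin / cpu_flops


def block_budget_ms(fraction, block_size=BLOCK_SIZE, sample_rate=SAMPLE_RATE):
    """Milliseconds of CPU time that `fraction` represents per block."""
    block_period_ms = block_size / sample_rate * 1000.0
    return block_period_ms * fraction


if __name__ == "__main__":
    frac = cpu_fraction()
    print(f"CPU fraction with margin: {frac:.2%}")
    print(f"Time per {BLOCK_SIZE}-sample block at 2%: {block_budget_ms(0.02):.3f} ms")
```

Under these assumptions the load stays below 2%, and 2% of one 512-sample block period is a little over 0.2 ms, which is consistent with the figures in the message. The point of the paragraph stands either way: the hard part is getting that slice of time reliably in every block, not the amount of work.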

The idea of using a specialized audio "shader" language is an attractive
one.  I'm not sure that WebCL is the right language because there are some
security issues surrounding it, and I think its future is unclear.  If we
look at some other choices, there are other audio languages that have been
developed over the decades.  You mention ChucK, but there's also Csound,
SuperCollider, Faust, MPEG-4's SAOL, and others.

In some ways SAOL (or a stripped down version of it) seems like the type of
thing we would want.  If such a language were chosen, then it would have to
be shown to be totally secure to run in a browser environment.  Going
through the process of specifying the language and writing the run-time
implementation would involve an enormous effort.

That's why the idea of processing in JavaScript is so attractive, because
it's a language that web developers know, has solved the security issues,
and has implementations on all the browsers already.  Unfortunately, it's
not as well suited to more real-time applications, but it's still very
interesting and useful I think.

Chris

Received on Monday, 13 August 2012 21:37:29 UTC