
Re: Resolution to republish MSP as a note

From: Srikumar Karaikudi Subramanian <srikumarks@gmail.com>
Date: Sat, 11 Aug 2012 08:29:47 +0800
Cc: Mark Boas <markb@happyworm.com>, Jussi Kalliokoski <jussi.kalliokoski@gmail.com>, James Wei <james.wei@intel.com>, Chris Rogers <crogers@google.com>, Stéphane Letz <letz@grame.fr>, Audio Working Group <public-audio@w3.org>, Matthew Paradis <matthew.paradis@bbc.co.uk>, Christopher Lowis <Chris.Lowis@bbc.co.uk>
Message-Id: <06776961-F57F-4281-BC57-93B567D6C070@gmail.com>
To: olivier Thereaux <olivier.thereaux@bbc.co.uk>
> * The high-level access provided by the web audio API is great and makes it easy to write audio processing and analysis code today, with very little concern for optimization.
> 
> * The moment you want to build anything custom, the API in its current state is not great. I recall my team complaining that the moment you want to do custom processing, you have to basically wrap everything in your own class, and write a lot of boilerplate. [Ping ChrisL/Matt for details.]


These two points are indeed an excellent summary of the feedback, and we do want both.

The criticism of the custom processing part has two aspects:

1. We cannot at the moment make a JS audio node that looks and quacks like any other native node, efficiency aside. So we have to discriminate between JS nodes and native nodes. To solve this API problem, we need AudioParams, multiple inputs/outputs and dynamic lifetime support in JS nodes. This also helps with future-proofing.

2. The timing characteristics (latency/delay) of JS audio nodes are inconsistent with the native nodes, which makes mixing JS audio nodes with native nodes problematic, even given the steady efficiency improvements we're seeing in JS runtimes.
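To make point 1 concrete, here is a minimal sketch, in plain JS with no Web Audio calls, of what a JS node exposing a native-node-like surface might look like. All names here (JSAudioNode, the parameter shape) are invented for illustration, not part of any spec, and the "AudioParam" stand-in is k-rate only:

```javascript
// Hypothetical sketch: a JS node with the same surface a native AudioNode
// has -- named parameters, declared input/output counts -- so graph code
// need not treat it specially.
class JSAudioNode {
  constructor(numInputs, numOutputs, paramNames, process) {
    this.numberOfInputs = numInputs;
    this.numberOfOutputs = numOutputs;
    this.parameters = {};
    for (const name of paramNames) {
      // Minimal stand-in for an AudioParam: a single k-rate value.
      this.parameters[name] = { value: 0 };
    }
    this.process = process; // (inputs, outputs, params) => void
  }
}

// A gain node built against that interface.
const gain = new JSAudioNode(1, 1, ["gain"], (inputs, outputs, params) => {
  const inBuf = inputs[0], outBuf = outputs[0];
  for (let i = 0; i < inBuf.length; i++) {
    outBuf[i] = inBuf[i] * params.gain.value;
  }
});
gain.parameters.gain.value = 0.5;

const input = Float32Array.from([1, -1, 0.25, 0.5]);
const output = new Float32Array(4);
gain.process([input], [output], gain.parameters);
// output is now [0.5, -0.5, 0.125, 0.25]
```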

We can do 200 calculations per output sample consuming < 2% of a 1 GFLOPS CPU (with a 2x margin). This is adequate for mixing triggered sounds. Glitch-free audio is therefore not a matter of efficiency, but of stealing that 2% (~0.2 ms for every 512 samples) at the right time, every time. This is the core technical problem solved by the current native node design. I believe JS efficiency will improve quickly enough to render the DSP API redundant, but whether we can get JS code to run in a timely fashion is unclear [1].
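Those figures can be checked with back-of-envelope arithmetic (the constants below are the ones from the paragraph above, nothing more):

```javascript
// Back-of-envelope check of the budget quoted above.
const sampleRate = 44100;   // Hz
const opsPerSample = 200;   // calculations per output sample
const cpuFlops = 1e9;       // a 1 GFLOPS CPU
const margin = 2;           // the 2x safety margin

// Sustained load: ~1.76%, i.e. under 2% of the CPU.
const load = (sampleRate * opsPerSample * margin) / cpuFlops;

// A 512-sample block is ~11.6 ms of audio; 2% of that is the
// ~0.2 ms of CPU time that must be stolen on schedule, every block.
const blockSeconds = 512 / sampleRate;
const budgetSeconds = blockSeconds * 0.02;
```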

Workers have been proposed as a possible answer to that, but there are several unknowns. Can we get RT workers? Is there enough incentive for browser vendors to improve communication latency between the main thread and workers? Will workers be ubiquitously available and performant? That is, will some constrained mobile devices want to adopt web audio but opt out of worker support? How do we introduce to workers the new APIs that the JS code will need? [2]

We think we need JS audio nodes for custom processing. But we don't really need the ability to call 100% arbitrary JS code in the node's onaudioprocess. Can we maybe achieve enough flexibility through some special support that *can* be run in an RT thread or critical callback? Perhaps a JS subset or even WebCL? If ubiquity is a problem for WebCL, perhaps a limited non-blocking version of a language like Chuck? [3] Then we'll be able to compose programmable nodes just like native nodes with comparable efficiency and latency and get both high level ease of use and custom processing.
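To illustrate the kind of restricted code the paragraph above speculates about, here is a one-pole lowpass written in the style such a subset might require: straight-line arithmetic over typed arrays, no allocation, no I/O, nothing that could block an RT thread. The shape (state carried in a Float32Array, a pure inner loop) is my assumption, not a proposed API:

```javascript
// Sketch of an RT-safe kernel: no allocation or blocking in the loop.
function onePoleLowpass(input, output, state, coeff) {
  let y = state[0];
  for (let i = 0; i < input.length; i++) {
    y += coeff * (input[i] - y);   // y[n] = y[n-1] + a * (x[n] - y[n-1])
    output[i] = y;
  }
  state[0] = y;                    // persist filter state across blocks
}

const x = new Float32Array(512).fill(1);  // step input
const y = new Float32Array(512);
const state = new Float32Array(1);
onePoleLowpass(x, y, state, 0.1);
// y rises smoothly from 0.1 toward 1
```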

In terms of "doesn't play well with the rest of the ecosystem", I'm unsure what "playing well" entails beyond adding node types that can fetch and send streams. Isn't such loose coupling a good thing? For example, sending a video stream to a WebGL texture needn't influence WebGL's design much beyond supporting that specific transfer through an API.

I realize I may have raised more questions than I've answered, but it seems to me that the technical factors above are the crucial ones for a design that can be extended well to adapt to the future.

This discussion is worth having. Looking forward to comments and apologies for the long post.
-Kumar

[1] To probe this, I wrote a simple graph compiler that takes a set of nodes and compiles a graph down to a single onaudioprocess function on a JS audio node (teaser: supports oversampling and arbitrary feedback loops with single sample delay). I was pleasantly surprised that I could run some simple graphs at > 50x of 44.1 kHz without glitching, but was disappointed that even at 1x the sample rate, UI activity like dragging a node around would cause the audio to break up. Bummer! The bottom-line learning was that JS is fast and has been getting faster, but not more timely. If anyone is interested, I can point to this code ... with the disclaimer that it was written in don't-worry-be-crappy-and-brainstorm mode :P
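For anyone curious what "compiling a graph down to a single function" amounts to, here is a very rough sketch under my own invented node/edge shapes: topologically order the nodes, then return one function that runs every node's kernel in that order over shared buffers. The real experiment also handled oversampling and single-sample feedback, which this sketch omits (it assumes a DAG):

```javascript
// Compile a node graph to one flat process function.
// nodes: [{ id, kernel(inputBuffers, outputBuffer) }], edges: [[fromId, toId]]
function compileGraph(nodes, edges, blockSize) {
  // Kahn's algorithm for a topological order.
  const indeg = new Map(nodes.map(n => [n.id, 0]));
  for (const [, to] of edges) indeg.set(to, indeg.get(to) + 1);
  const order = [];
  const queue = nodes.filter(n => indeg.get(n.id) === 0);
  while (queue.length) {
    const n = queue.shift();
    order.push(n);
    for (const [from, to] of edges) {
      if (from !== n.id) continue;
      indeg.set(to, indeg.get(to) - 1);
      if (indeg.get(to) === 0) queue.push(nodes.find(m => m.id === to));
    }
  }
  const buffers = new Map(nodes.map(n => [n.id, new Float32Array(blockSize)]));
  // The compiled "onaudioprocess": one loop over the sorted nodes.
  return function process() {
    for (const n of order) {
      const ins = edges.filter(([, to]) => to === n.id)
                       .map(([from]) => buffers.get(from));
      n.kernel(ins, buffers.get(n.id));
    }
    return buffers.get(order[order.length - 1].id);
  };
}

// Tiny graph: a constant source feeding a gain of 0.5.
const graphNodes = [
  { id: "src",  kernel: (ins, out) => out.fill(1) },
  { id: "gain", kernel: (ins, out) => ins[0].forEach((v, i) => out[i] = v * 0.5) },
];
const process = compileGraph(graphNodes, [["src", "gain"]], 4);
const out = process();
// out is [0.5, 0.5, 0.5, 0.5]
```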

[2] Rather than have JS audio nodes do their job in workers, it might be better to have a whole context run all of its JS nodes in a single worker. This will cut down on system resource usage and improve inter-node communication. (I haven't thought through this very much though.)

[3] This is comparable to WebGL. Mark described WebGL as a "low level api". You specify the geometry and, using *two* special purpose languages, specify how you want that geometry to show up. That's pretty high level to me! With a truly low-level API, I would be able to write my own lighting shader and fog shader systems. "Level" is thus a matter of perspective, and a high level system like Chuck (for ex.) can be flexible enough to feel low level to many users.

On 9 Aug, 2012, at 11:56 PM, olivier Thereaux <olivier.thereaux@bbc.co.uk> wrote:

> 
> On 9 Aug 2012, at 09:40, Mark Boas wrote:
> 
>> I perhaps naively assumed a blend of low and high level could work and this is why I was very happy to see the MSP included as a note and hopefully provide inspiration for the low level features.
> 
> I still believe this is the case. 
> 
>> Standards are not about vendors, they are about developers.
> 
> Sure. Notwithstanding the requests from developers who want to build libraries (useful, but a very specific view of things) the feedback we have received from developers seems to be:
> 
> * The high-level access provided by the web audio API is great and makes it easy to write audio processing and analysis code today, with very little concern for optimization.
> 
> * The moment you want to build anything custom, the API in its current state is not great. I recall my team complaining that the moment you want to do custom processing, you have to basically wrap everything in your own class, and write a lot of boilerplate. [Ping ChrisL/Matt for details.]
> 
> The demand is for both, and I suspect this group would benefit from having fewer suggestions that we should go for one XOR the other.
> 
> Olivier
Received on Saturday, 11 August 2012 00:30:26 GMT
