Re: Questioning the current direction of the Web Audio API from Srikumar Karaikudi Subramanian on 2013-10-21 (public-audio@w3.org from October to December 2013)

From: Srikumar Karaikudi Subramanian <srikumarks@gmail.com>
Date: Mon, 21 Oct 2013 11:30:57 +0530
To: Jussi Kalliokoski <jussi.kalliokoski@gmail.com>
Cc: s p <sebpiq@gmail.com>, Joseph Berkovitz <joe@noteflight.com>, "public-audio@w3.org" <public-audio@w3.org>
Message-Id: <946C9597-00FA-4570-9579-69D7E298B823@gmail.com>
> Then, given the current limitations of the ScriptProcessorNode, if I implemented just the missing parts with ScriptProcessorNodes, I would end up being in a worse place in terms of performance than if I had gone with just doing everything in a single ScriptProcessorNode, due to reasons already mentioned by Sebastien.

This is pretty much what Lonce and I concluded - if you want to use the ScriptProcessorNode, do everything in it, or use native nodes exclusively. While the latter approach is quite viable for applications like games (though perhaps not for truly experimental game audio), the former is only viable in situations where audio takes centre stage ... but at least both are *viable* to a good extent today. The ScriptProcessorNode is simply *not* something that can be used to emulate the other nodes.[1]

I would like to step back a bit and look at the "hindsight" aspect of this discussion.

Way back, we *did* have a "minimum viable product" spec'd more or less as you ask - "easy way to write sound, any sound, to the users' speakers". Before the Web Audio API's early incarnations even existed (iirc), Firefox already had such an experimental API available, and it was already getting some visibility. The problem was that it was "minimal", but not really "viable" for many applications. In fact, what we have today in the ScriptProcessorNode is perfectly equivalent functionally to what Firefox provided back then ... and we're still complaining about it.

The current broken-ness of the ScriptProcessorNode is not entirely a failure of the WG. It stems from the fact that running JS code in any of the sandboxed environments available in the browser - the main thread or workers - introduces intolerable communication latencies and potential for glitches for engaging sound. In short, these sandboxed environments suck for complete programming flexibility. Will communication with workers improve? Perhaps. Should the WG push that? Maybe. Will it happen across the board on all devices? I don't know. Would devices be able to run some efficient native audio components? Hell yes, they already do. The design flaw, from this perspective, looks like not enough native components.

The limitations of the pure JS approach reared their ugly heads very early on and persist to this day. If we coded a game with that API, detected a game event and triggered a sound now, we'd hear it tomorrow, unless we were ok with audio breaking up. If we looked around at other flexible systems such as SuperCollider / MaxMSP / PD / CSound, we'd notice that all of them had gone through building components in native code for performance and plugging them into a glue language - with some more expressive and flexible than others. If we considered the emerging devices, JS performance (compute per watt) was expected to remain low on them for a while ... and is still relatively low on ARM devices *today* with no clear future other than accelerating various aspects one by one using native code or the GPU. I personally had to make a choice back then whether to back the pure-JS API or the Web Audio API and I chose to commit to WAA simply because it had folks who were as concerned about minimizing latency and glitches as I was/am.
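
To put rough numbers on the latency-versus-glitches trade-off, here is a minimal sketch of the buffering arithmetic. The buffer sizes are the ones a ScriptProcessorNode accepts; the 44.1 kHz sample rate and the helper names are assumptions for illustration only, and the figures are a lower bound, since real output latency adds further buffering on top.

```python
# Buffer sizes (in sample frames) that a ScriptProcessorNode accepts.
BUFFER_SIZES = [256, 512, 1024, 2048, 4096, 8192, 16384]


def buffer_latency_ms(buffer_size, sample_rate=44100.0):
    """Time needed to fill one buffer, in milliseconds.

    The 44.1 kHz default is an assumed sample rate, not a fixed property
    of the API.
    """
    return 1000.0 * buffer_size / sample_rate


def latency_table(sample_rate=44100.0):
    """Map each allowed buffer size to its per-buffer latency in ms."""
    return {n: buffer_latency_ms(n, sample_rate) for n in BUFFER_SIZES}


if __name__ == "__main__":
    for size, ms in latency_table().items():
        print(f"{size:>6} frames -> {ms:6.1f} ms per buffer")
```

Under these assumptions the smallest buffer costs a little under 6 ms per callback and the largest over a third of a second. Small buffers keep a triggered sound responsive but leave the JS callback little slack before the audio breaks up; large buffers are safer but make the delay clearly audible.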

As for what might've been a *viable* product, what I might've tried first is to take the SuperCollider server, which is pretty efficient, and build a JS front end for it. You might've gotten plenty of native nodes to keep you busy and enough programming flexibility, thanks to JS.

-Kumar

[1] "Taming the ScriptProcessorNode"  http://sriku.org/blog/2013/01/30/taming-the-scriptprocessornode/

On 20 Oct, 2013, at 10:35 PM, Jussi Kalliokoski <jussi.kalliokoski@gmail.com> wrote:

> I'm a bit late for the party here, but... I'm sure no one is surprised that I very very much agree with Sebastien here. In fact I had probably been in the working group for just a few weeks before starting to complain about the (to me) completely backwards way of building this API. In my book the Web Audio API has become yet another warning example in the wall of shame for The Extensible Web Manifesto, where what people needed was the minimum viable product, i.e. easy way to write sound, any sound, to the users' speakers, and they needed it direly. Instead they got more than two years of waiting (depending on how you count of course, you could say that the WebKit implementation came sooner, but you could also say that we haven't reached v1 yet) just to get a monolithic framework that's hard to extend to their needs.
> 
> I've given the API my fair share of shots, trying to use it for both games and music, for cases where in theory the basic building blocks provided by the API should be enough (with a few hacks like looped "noise"), for example jet engine simulation as well as woodwind simulation. Every time eventually I've had to give up due to some limitation in the framework (such as circular routing not being possible without a 128 sample delay in the loop) or some of the nodes themselves. Then, given the current limitations of the ScriptProcessorNode, if I implemented just the missing parts with ScriptProcessorNodes, I would end up being in a worse place in terms of performance than if I had gone with just doing everything in a single ScriptProcessorNode, due to reasons already mentioned by Sebastien.
> 
> We were also hitting the same issues at ofmlabs. In fact, in the discussions I've had with my colleagues even outside ofmlabs, anyone who has been in longer term contact with the API shares the frustration (or maybe they're just nice to me when I'm ranting :).
> 
> All this said, I'm sure most, if not all, of us here more or less see the issues now and I'm glad we're moving to first fix the gaping awful holes in the API for v1, and for v2 move on to what we should have started with: making the ScriptProcessorNode not just an escape hatch or a complement to the native nodes, but the core of the API on which to build. Now, I know that hindsight is easy, but if we had started with just the ScriptProcessorNode two years ago and started getting developers to build around it, then optimize and build on the patterns they form, we wouldn't (despite the hard work of our editors) still have a massive backlog of ugly issues like shared memory and other things that prevent implementation in JS or similar languages.
> 
> My most sincere hope is something good has come out of all this in the form of us learning to stay away from prematurely optimized kitchen sink APIs and start with the basics in the future.
> 
> All the best,
> Jussi
> 
> 
> On Sat, Oct 19, 2013 at 8:14 PM, s p <sebpiq@gmail.com> wrote:
> > To the extent that it is a problem today, it's partly because present-day implementations are running the JS in these nodes in the main thread.
> 
> Let's suppose ScriptProcessorNode is optimized and runs in the audio thread (therefore minimizing IPC). And let's suppose 2 benchmarks, which for me summarize the important questions.
> 
> 1)
> Test1.1 is N ScriptProcessorNodes, each of them running an algorithm A. Test1.2 is one ScriptProcessorNode running N times the algorithm A.
> 
> 2)
> You have a really big graph. Test2.1 connects together native AudioNodes and/or ScriptProcessorNodes. Test2.2 implements the exact same dsp graph as a highly optimized dsp function using asm.js, running in a single ScriptProcessorNode.
> 
> In 1) do you think it is possible to bring the execution time of Test1.1 close to the execution time of Test1.2 by improving ScriptProcessorNode?
> In 2) do you think the Test2.1 will always be faster than Test2.2?
> 
> In fact ... the Test2 could already be done! I should try ...
> 
> 
> 
> 2013/10/19 Joseph Berkovitz <joe@noteflight.com>
> 
> On Oct 19, 2013, at 12:02 PM, s p <sebpiq@gmail.com> wrote:
>> 
>> And no matter if there are more nodes in the future, there is just no way all the basic building blocks for all the algorithms humans can ever conceive can be provided as AudioNodes (and that sucks. Because on basically every other platform, there is no limitation).
> 
> Of course AudioNodes can't be provided for everything. That is why extensibility is important, and ScriptProcessorNode is at present the vehicle for doing so.
> 
>> Second, if you understand that professionals need things that can't be built with basic AudioNodes, you understand that ScriptProcessorNode will be more than just an escape valve.
> 
> "Escape valve" was an understatement on my part. I completely agree that ScriptProcessorNode is essential to any professional, wide-ranging use of the API.
> 
>> Now the big problem with that is: you will need to instantiate multiple ScriptProcessorNodes in your graph, connect them with native AudioNodes, and because of the sum of the overheads of using ScriptProcessorNodes, you will end up in a situation where it is actually more performant to just put the whole dsp function into ONE single ScriptProcessorNode, re-implementing oscillators, convolutions, and the whole thing ... making native AudioNodes useless. That's what I mean by "this architecture is impossible to extend".
> 
> I don't think your analysis is correct about ScriptProcessorNodes *for all time*. To the extent that it is a problem today, it's partly because present-day implementations are running the JS in these nodes in the main thread. This can impose inter-thread communication overhead that is highly implementation-dependent. To address this issue does not (to my mind) mean changing the entire direction of the Web Audio API. It means the overhead of ScriptProcessorNodes -- or whatever succeeds them in later API versions -- must be minimized through various means.
> 
> The WG has received similar feedback regarding ScriptProcessorNodes from other parties as well including internal W3C reviewers. These reviewers have not concluded that AudioNodes are "useless"; rather, they have requested that Web Audio address its present shortcomings and made some positive proposals on how to do so.
> 
>> 
> 
> .            .       .    .  . ...Joe
> 
> Joe Berkovitz
> President
> 
> Noteflight LLC
> Boston, Mass. phone: +1 978 314 6271
> www.noteflight.com
> "Your music, everywhere"
> 
> 
> 
> 
> -- 
> Sébastien Piquemal
> 
>  ----- @sebpiq
>  ----- http://github.com/sebpiq
>  ----- http://funktion.fm
>
Received on Monday, 21 October 2013 06:01:33 UTC