Re: Questioning the current direction of the Web Audio API

On 21 Oct, 2013, at 12:15 PM, s p <sebpiq@gmail.com> wrote:

> > ScriptProcessorNode is perfectly equivalent functionally to what Firefox provided back then ... and we're still complaining about it
> 
> But if there were only this piece of functionality to work on, it would probably be much cleaner and more efficient than it is today.

I doubt it can be improved beyond where it stands without serious architectural change. "Performance" has two components - speed and latency. While we're certainly there with speed (on desktop devices at least), we're architecturally crippled today when it comes to latency if we do it all in JS ... and need to do something other than audio as well. Bear in mind that we need buffer durations to be < 350 samples for low enough latency even for games. For musical instruments, the demand is even stricter. The current native components can deliver that, while the JS pipeline cannot.
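To put rough numbers on that (assuming a 44.1 kHz context; scale proportionally for other rates), the latency contributed by a single processing buffer works out as follows:

    // Back-of-the-envelope: latency contributed by one buffer of audio.
    function bufferLatencyMs(bufferSamples, sampleRate) {
        return (bufferSamples / sampleRate) * 1000;
    }

    bufferLatencyMs(128, 44100);  // ~2.9 ms  - the render quantum of the native nodes
    bufferLatencyMs(350, 44100);  // ~7.9 ms  - roughly the ceiling for games
    bufferLatencyMs(1024, 44100); // ~23.2 ms - a "safe" ScriptProcessorNode size, too sluggish for instruments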

> > The current broken-ness of the ScriptProcessorNode [...] stems from the fact that running JS code [...] introduces intolerable communication latencies and potential for glitches for engaging sound.
> 
> Once again, that's a problem that could have been tackled if the specification had focused on it rather than reinventing the wheel.

You're right, but I believe this is more a social challenge than an engineering one (given the number of different sub-systems and the kind of security issues involved), and I have no idea how much time it would've taken. In all likelihood, had that route been taken, we might be here today without even low-latency sample triggering for games.

> Guys from RjDj (http://rjdj.me/) have compiled big Pure Data patches to a single dsp function using emscripten and asm.js, and their benchmarks showed that they get **the same performance in a ScriptProcessorNode as Pure Data on the desktop**, which to me is quite impressive.

Absolutely true. JS speed is rocking. However, it needs to be called *in time*, which has been a problem ... especially with other stuff happening. For comparison, Linux has needed kernel patches to improve audio latency despite the client code all being written in "high performance" C.
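To make the timing constraint concrete, here is a minimal sketch of the pattern such compiled code ends up in - an onaudioprocess callback driving a dsp kernel (dspProcess is a hypothetical stand-in for whatever emscripten emits):

    var ctx = new (window.AudioContext || window.webkitAudioContext)();
    // 256 is the smallest buffer size the current spec permits here.
    var node = ctx.createScriptProcessor(256, 1, 1);

    node.onaudioprocess = function (event) {
        var output = event.outputBuffer.getChannelData(0);
        // dspProcess must fill all 256 samples before the audio thread
        // needs them. Miss the deadline once - a GC pause, a layout pass -
        // and the output glitches, no matter how fast the kernel itself is.
        dspProcess(output, output.length);
    };

    node.connect(ctx.destination);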

There was some recent effort on making the ScriptProcessorNode run in 128-sample chunks, with some promising results, like desktop browsers being able to trigger thousands of events ... to which an objection was raised along the lines of "we don't want audio to flood the main thread with events". While a valid objection, this is the "social challenge" I'm talking about - that of audio fitting into the ecosystem. The current solution of native nodes is only partly for speed. The other part is so that they can do their job on another core if available.

> From my experience, I have used WebPd - which I haven't optimized at all - rather successfully in performances, without a single glitch with rather big patches (~ 200 objects) on crappy laptops.

Did it use canvas graphics too? I did a prototype called "wireup" a long time ago that also compiled graphs to JS (it could also do 1-sample delay feedback) and I was impressed by the speed. However, simple graphics operations would interfere with the audio, necessitating buffer durations well beyond 256 - they needed to be 1024 or higher to prevent glitches. What buffer durations do you use?

-Kumar

> 
> 2013/10/21 Srikumar Karaikudi Subramanian <srikumarks@gmail.com>
>> Then, given the current limitations of the ScriptProcessorNode, if I implemented just the missing parts with ScriptProcessorNodes, I would end up in a worse place in terms of performance than if I had just done everything in a single ScriptProcessorNode, for reasons already mentioned by Sebastien.
> 
> This is pretty much what Lonce and I concluded - if you want to use the ScriptProcessorNode, do everything in it, or use native nodes exclusively. While the latter approach is quite viable for applications like games (though perhaps not for truly experimental game audio), the former is only viable in situations where audio takes centre stage ... but at least both are *viable* to a good extent today. The ScriptProcessorNode is simply *not* something that can be used to emulate the other nodes.[1]
> 
> I would like to step back a bit and look at the "hindsight" aspect of this discussion.
> 
> Way back, we *did* have a "minimum viable product" spec'd more or less as you ask - an "easy way to write sound, any sound, to the users' speakers". Before the Web Audio API's early incarnations even existed (iirc), Firefox already had such an experimental API, and it was getting some visibility. The problem was that it was "minimal", but not really "viable" for many applications. In fact, what we have today in the ScriptProcessorNode is perfectly equivalent functionally to what Firefox provided back then ... and we're still complaining about it.
> 
> The current broken-ness of the ScriptProcessorNode is not entirely a failure of the WG. It stems from the fact that running JS code in any of the sandboxed environments available in the browser - the main thread or workers - introduces intolerable communication latencies and potential for glitches for engaging sound. In short, these sandboxed environments suck for complete programming flexibility. Will communication with workers improve? Perhaps. Should the WG push that? Maybe. Will it happen across the board on all devices? I don't know. Would devices be able to run some efficient native audio components? Hell yes, they already do. The design flaw, from this perspective, looks like not having enough native components.
> 
> The limitations of the pure JS approach reared their ugly head very early on and persist to this day. If we coded a game with that API, detected a game event and triggered a sound now, we'd hear it tomorrow - unless we were ok with the audio breaking up. If we looked around at other flexible systems such as SuperCollider / MaxMSP / PD / CSound, we'd notice that all of them had gone through building components in native code for performance and plugging them into a glue language - some more expressive and flexible than others. If we consider the emerging devices, JS performance (compute per watt) was expected to remain low on them for a while ... and is still relatively low on ARM devices *today*, with no clear future other than accelerating various aspects one by one using native code or the GPU. I personally had to make a choice back then whether to back the pure-JS API or the Web Audio API, and I chose to commit to WAA simply because it had folks who were as concerned about minimizing latency and glitches as I was/am.
> 
> As for what might've been a *viable* product, what I might've tried first is to take the SuperCollider server, which is pretty efficient, and build a JS front end for it. You might've gotten plenty of native nodes to keep you busy, and enough programming flexibility thanks to JS.
> 
> -Kumar
> 
> [1] "Taming the ScriptProcessorNode"  http://sriku.org/blog/2013/01/30/taming-the-scriptprocessornode/
> 
> On 20 Oct, 2013, at 10:35 PM, Jussi Kalliokoski <jussi.kalliokoski@gmail.com> wrote:
> 
>> I'm a bit late to the party here, but... I'm sure no one is surprised that I very, very much agree with Sebastien here. In fact I had probably been in the working group for just a few weeks before starting to complain about the (to me) completely backwards way of building this API. In my book the Web Audio API has become yet another warning example in the wall of shame for The Extensible Web Manifesto: what people needed was the minimum viable product, i.e. an easy way to write sound, any sound, to the users' speakers, and they needed it direly. Instead they got more than two years of waiting (depending on how you count, of course - you could say the WebKit implementation arrived sooner, but you could also say that we haven't reached v1 yet) just to get a monolithic framework that's hard to extend to their needs.
>> 
>> I've given the API my fair share of shots, trying to use it for both games and music, for cases where in theory the basic building blocks provided by the API should be enough (with a few hacks like looped "noise") - for example jet engine simulation as well as woodwind simulation. Every time I've eventually had to give up due to some limitation in the framework (such as circular routing not being possible without a 128-sample delay in the loop) or in some of the nodes themselves. Then, given the current limitations of the ScriptProcessorNode, if I implemented just the missing parts with ScriptProcessorNodes, I would end up in a worse place in terms of performance than if I had just done everything in a single ScriptProcessorNode, for reasons already mentioned by Sebastien.
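>> 
>> To illustrate the circular routing point with a sketch (assuming ctx is an AudioContext and source is some source node feeding the loop; the gain value is arbitrary): a cycle in the graph is only legal through a DelayNode, and inside a cycle the delay behaves as at least one render quantum regardless of what you request:
>> 
>>     var loopDelay = ctx.createDelay();
>>     var feedback = ctx.createGain();
>>     feedback.gain.value = 0.5;
>>     loopDelay.delayTime.value = 0; // still acts as >= 128 samples inside a cycle
>> 
>>     source.connect(loopDelay);
>>     loopDelay.connect(feedback);
>>     feedback.connect(loopDelay);   // closes the loop; a 1-sample feedback comb is impossible
>>     loopDelay.connect(ctx.destination);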
>> 
>> We were also hitting the same issues at ofmlabs. In fact, in the discussions I've had with my colleagues even outside ofmlabs, anyone who has been in longer-term contact with the API shares the frustration (or maybe they're just being nice to me when I'm ranting :).
>> 
>> All this said, I'm sure most, if not all, of us here more or less see the issues now, and I'm glad we're moving to first fix the gaping awful holes in the API for v1, and for v2 move on to what we should have started with: making the ScriptProcessorNode not just an escape hatch or a complement to the native nodes, but the core of the API to build on. Now, I know that hindsight is easy, but if we had started with just the ScriptProcessorNode two years ago, gotten developers to build around it, and then optimized and built on the patterns they form, we wouldn't (despite the hard work of our editors) still have a massive backlog of ugly issues like shared memory and other things that prevent implementation in JS or similar languages.
>> 
>> My most sincere hope is that something good has come out of all this, in the form of us learning to stay away from prematurely optimized kitchen-sink APIs and to start with the basics in the future.
>> 
>> All the best,
>> Jussi
>> 
>> 
>> On Sat, Oct 19, 2013 at 8:14 PM, s p <sebpiq@gmail.com> wrote:
>> > To the extent that it is a problem today, it's partly because present-day implementations are running the JS in these nodes in the main thread.
>> 
>> Let's suppose ScriptProcessorNode were optimized and ran in the audio thread (thereby minimizing IPC). And let's consider two benchmarks, which for me summarize the important questions.
>> 
>> 1)
>> Test1.1 is N ScriptProcessorNodes, each of them running an algorithm A. Test1.2 is one ScriptProcessorNode running the algorithm A N times.
>> 
>> 2)
>> You have a really big graph. Test2.1 connects together native AudioNodes and/or ScriptProcessorNodes. Test2.2 implements the exact same dsp graph as a highly optimized dsp function using asm.js, running in a single ScriptProcessorNode.
>> 
>> In 1), do you think it is possible to bring the execution time of Test1.1 close to that of Test1.2 by improving ScriptProcessorNode?
>> In 2), do you think Test2.1 will always be faster than Test2.2?
>> 
>> In fact ... Test2 could already be done! I should try ...
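>> 
>> A rough skeleton of Test1, with algorithmA as a stand-in for whatever dsp kernel we pick (run one variant at a time, of course):
>> 
>>     var N = 50;
>>     var ctx = new (window.AudioContext || window.webkitAudioContext)();
>>     var nodes = []; // keep references so the nodes aren't garbage collected
>> 
>>     // Test1.1: N ScriptProcessorNodes, each running algorithm A once.
>>     for (var i = 0; i < N; i++) {
>>         var node = ctx.createScriptProcessor(256, 1, 1);
>>         node.onaudioprocess = function (e) {
>>             algorithmA(e.inputBuffer.getChannelData(0),
>>                        e.outputBuffer.getChannelData(0));
>>         };
>>         node.connect(ctx.destination);
>>         nodes.push(node);
>>     }
>> 
>>     // Test1.2: one ScriptProcessorNode running algorithm A N times.
>>     var single = ctx.createScriptProcessor(256, 1, 1);
>>     single.onaudioprocess = function (e) {
>>         var input = e.inputBuffer.getChannelData(0);
>>         var output = e.outputBuffer.getChannelData(0);
>>         for (var j = 0; j < N; j++) algorithmA(input, output);
>>     };
>>     single.connect(ctx.destination);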
>> 
>> 
>> 
>> 2013/10/19 Joseph Berkovitz <joe@noteflight.com>
>> 
>> On Oct 19, 2013, at 12:02 PM, s p <sebpiq@gmail.com> wrote:
>>> 
>>> And no matter if there are more nodes in the future, there is just no way all the basic building blocks for all the algorithms humans can ever conceive can be provided as AudioNodes (and that sucks, because on basically every other platform, there is no such limitation).
>> 
>> Of course AudioNodes can't be provided for everything. That is why extensibility is important, and ScriptProcessorNode is at present the vehicle for doing so.
>> 
>>> Second, if you understand that professionals need things that can't be built with basic AudioNodes, you understand that ScriptProcessorNode will be more than just an escape valve.
>> 
>> "Escape valve" was an understatement on my part. I completely agree that ScriptProcessorNode is essential to any professional, wide-ranging use of the API.
>> 
>>> Now the big problem with that is: you will need to instantiate multiple ScriptProcessorNodes in your graph, connect them with native AudioNodes, and because of the sum of the overheads of using ScriptProcessorNodes, you will end up in a situation where it is actually more performant to just put the whole dsp function into ONE single ScriptProcessorNode, re-implementing oscillators, convolutions, and the whole thing ... making native AudioNodes useless. That's what I mean by "this architecture is impossible to extend".
>> 
>> I don't think your analysis is correct about ScriptProcessorNodes *for all time*. To the extent that it is a problem today, it's partly because present-day implementations are running the JS in these nodes in the main thread. This can impose inter-thread communication overhead that is highly implementation-dependent. To address this issue does not (to my mind) mean changing the entire direction of the Web Audio API. It means the overhead of ScriptProcessorNodes -- or whatever succeeds them in later API versions -- must be minimized through various means.
>> 
>> The WG has received similar feedback regarding ScriptProcessorNodes from other parties as well including internal W3C reviewers. These reviewers have not concluded that AudioNodes are "useless"; rather, they have requested that Web Audio address its present shortcomings and made some positive proposals on how to do so.
>> 
>>> 
>> 
>> .            .       .    .  . ...Joe
>> 
>> Joe Berkovitz
>> President
>> 
>> Noteflight LLC
>> Boston, Mass. phone: +1 978 314 6271
>> www.noteflight.com
>> "Your music, everywhere"
>> 
>> 
>> 
>> 
>> -- 
>> Sébastien Piquemal
>> 
>>  ----- @sebpiq
>>  ----- http://github.com/sebpiq
>>  ----- http://funktion.fm
>> 
> 
> 
> 
> 
> -- 
> Sébastien Piquemal
> 
>  ----- @sebpiq
>  ----- http://github.com/sebpiq
>  ----- http://funktion.fm

Received on Monday, 21 October 2013 09:24:10 UTC