Re: Welcome! from Jens Nockert on 2013-11-23 (public-webarraymath@w3.org from November 2013)

From: Jens Nockert <jens@nockert.se>
Date: Sat, 23 Nov 2013 18:52:16 +0100
To: Marcus Geelnard <mage@opera.com>
Cc: Web Array Math <public-webarraymath@w3.org>
Message-Id: <67647095-26CE-42E5-B82A-AA40D4A8A543@nockert.se>
Hello everyone!

> Since this group was created quite recently, I do not want to rush to the formal parts just yet (scope of work, decision processes, communication preferences, selecting a chair, etc). Let's wait a bit until more people have had the chance to join etc.

Thank you, Marcus, for creating the group. I totally agree that the bureaucracy can wait a bit.

But to get the discussion started, I’m going to share some thoughts on the DSP API[1], on what I think are interesting use cases for ‘array math’, and on how I think it could (and should) interact with other emerging technologies.

I know the name changed in the latest iterations, but I ‘grew up’ with it being called the DSP API, so I am going to call it that for the remainder of this mail. For those of you who didn’t, I’m looking at the spec referenced as 3 in the original mail.

> Real-time audio processing entails a few requirements that make it slightly more difficult than some other forms of data and signal processing. Especially important is low latency (typically microseconds rather than milliseconds), low CPU overhead, and for garbage collected languages such as JavaScript the GC activity must be minimal.

I think this is a use case we must support. It is important for the long-term evolution of the Web Audio API and WebGL, and in the future WebCL. A lot of the work we now try to push away from JS as much as possible could, with a low-latency, high-performance way to do vector math, be brought back to JS where it belongs.

It is also important if we want to be able to allow the ‘Web Array Math’ API inside of a ParallelArray/RiverTrail[2] context, so we can utilize that synergy to allow for new stuff we didn’t even consider when discussing this. This could be especially important for WebGL/WebCL where you might want to pre-process something in parallel before uploading to a coprocessor.

The last class of technology that we’ll have to interact with is ‘emscripten’-style software, and we need to consider how to build the API so that it is accessible to that sort of software. I’m sure even a fast memcpy could help accelerate that kind of code, and if we can interact with asm.js in some meaningful way, that’s an advantage. (But for the record, I dislike the idea behind asm.js as much as I dislike NaCl.)

> Considering the design of the Web Audio API, where the data is made available in a typed array on the JavaScript heap and come in bursts of a few hundred samples at a time, the most viable option is to do all the processing on the CPU. On the other hand, using WebGL/GLSL or other GPU-based APIs, such as WebCL, would quite likely fail to meet the latency requirements.

They would fail in general. WebCL can run on the CPU or on a dedicated audio DSP, but those are not use cases that Web Audio supports yet. You would probably need some kind of WebCL node, but I doubt that DSPs will support that within a reasonable time frame.

> ...which leaves us with the CPU load. While modern JavaScript engines are quite impressive, they usually fail to utilize the instruction level parallelism provided by SIMD instructions, which becomes the most important missing link for achieving performance levels in JavaScript on par with hand crafted native code (e.g. C++ with SIMD intrinsics).

And here we come to the magic SIMD word, which I think is key to the debate here. Every processor that we run JS on today supports SIMD, yet most don’t support SIMD in a way that JS can take advantage of. (ARMv7 NEON doesn’t support double-precision arithmetic, for example, and double is the only number type JS has.)

I think the second most important thing (and something that the specs hack around all the time) is the lack of numerical datatypes other than double, which is also something that the DSP API battled with. We’ll have to work with this too; we probably cannot fix it once and for all, but we should keep it in mind.
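
To make the "hacking around" concrete, here is a minimal TypeScript sketch of the usual workarounds for getting single-precision and 32-bit integer behavior out of a language whose only number type is double. The helper names (`toFloat32`, `toInt32`, `toUint32`) are made up for this mail and are not part of any spec.

```typescript
// Round a double to the nearest single-precision value by passing it
// through a one-element Float32Array (hypothetical helper).
function toFloat32(x: number): number {
  const cell = new Float32Array(1);
  cell[0] = x;
  return cell[0];
}

// Wrap a double to a signed 32-bit integer using the bitwise-OR idiom.
function toInt32(x: number): number {
  return x | 0;
}

// Wrap a double to an unsigned 32-bit integer using the zero-fill shift idiom.
function toUint32(x: number): number {
  return x >>> 0;
}

const third = toFloat32(1 / 3);           // no longer equal to the double 1 / 3
const wrapped = toInt32(2147483647 + 1);  // overflows to the most negative int32
const unsigned = toUint32(-1);            // reinterpreted as the largest uint32
```

Every one of these conversions is something an array math API either has to spell out per data type or leave to the typed array that holds the result.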

I think the DSP API was amazing for what it was designed for, audio DSP in the context of Web Audio, but once we expand the scope, parts of the design don’t make as much sense anymore. I’m mainly thinking about the section on Numerical Accuracy (4), the methods related to complex numbers, and the methods starting with `sample`, `sum`, `pack`, and `unpack`. (Please tell me if you disagree.)

When we don’t know what the user is trying to do anymore, numerical accuracy is paramount. Games for example could depend on IEEE 754 behavior for synchronized physics. IEEE 754 is designed to minimize the errors for people ignorant about numerics, and this is behavior I think we should try to keep. We don’t want the API to behave weirdly across browsers if we don’t know what it will be used for.
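
A small TypeScript sketch of why this matters: floating-point addition is not associative, so a sum that is free to reorder its operations (for example to keep four SIMD-style partial sums) can return a different result than a plain left-to-right loop. Both function names are hypothetical, and the input is deliberately contrived to make the difference visible.

```typescript
// Sum left to right, one element at a time.
function sumSequential(x: Float64Array): number {
  let acc = 0;
  for (let i = 0; i < x.length; i++) {
    acc += x[i];
  }
  return acc;
}

// Sum with four independent partial sums, roughly the way a 4-wide
// SIMD loop might, then combine the partial sums pairwise.
function sumFourLanes(x: Float64Array): number {
  const lanes = new Float64Array(4);
  for (let i = 0; i < x.length; i++) {
    lanes[i % 4] += x[i];
  }
  return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
}

// Large values that cancel, mixed with small ones that get rounded away
// depending on the order of the additions.
const data = new Float64Array([1e16, 1, -1e16, 1, 0, 0, 0, 0]);

const sequential = sumSequential(data); // 1
const fourLanes = sumFourLanes(data);   // 0
```

The exact answer is 2, and neither order gets it, but the point is that the two orders disagree with each other. If the spec leaves the order open, two browsers can legitimately disagree in the same way.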

> Based on those conditions, I tried to come up with a fairly minimal API that would be easy to implement in a Web client, yet bring cross platform SIMD capabilities to the Web platform. The result after a few iterations was an API that I called the "DSP API", which later matured into what is now called the "Web Array Math API".

I think we have two real options.

The low-risk version is to write a version of the `ArrayMath` part of the DSP API that supports all the useful data-types that we want to support. This could probably be done in a short amount of time and be reasonably non-controversial.
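
As a rough illustration of what "supports all the useful data-types" could mean, here is a TypeScript sketch of a single element-wise add that accepts any typed array. `addArrays` and `NumericArray` are hypothetical names for this mail, not the `ArrayMath` interface itself, and the sketch simply inherits the conversion rules of whatever typed array receives the result.

```typescript
// The typed array kinds this sketch accepts (an assumption, not a spec list).
type NumericArray =
  | Int8Array | Uint8Array | Int16Array | Uint16Array
  | Int32Array | Uint32Array | Float32Array | Float64Array;

// Element-wise dst[i] = x[i] + y[i] over the shortest of the three arrays.
// The addition happens in double precision; the store into dst then applies
// that array's own conversion (wrapping for integers, rounding for float32).
function addArrays(dst: NumericArray, x: NumericArray, y: NumericArray): void {
  const n = Math.min(dst.length, x.length, y.length);
  for (let i = 0; i < n; i++) {
    dst[i] = x[i] + y[i];
  }
}

// Same call, different data types.
const f64 = new Float64Array(2);
addArrays(f64, new Float64Array([1.5, 2.5]), new Float64Array([0.25, 0.25]));

const i16 = new Int16Array(1);
addArrays(i16, new Int16Array([32767]), new Int16Array([1])); // wraps around

const u8 = new Uint8Array(1);
addArrays(u8, new Uint8Array([250]), new Uint8Array([10]));   // wraps around
```

Whether integer overflow should wrap, saturate, or be left undefined is exactly the sort of per-type decision this version of the API would need to make explicit.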

The high-risk version is to write a short vector API (think raw SIMD) that you can use to build the API above easily. This is probably controversial and a bit more complex, since the interaction with the JS engines is at a much lower level. On the other hand, this would be the holy grail of JS performance.

A short vector API could be like ecmascript_simd[3] or the spec[4], with an API that is designed to map down to hardware _on multiple platforms_. ARM is winning one battle, x86 one and MIPS likely another.
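
To show how the array-level API could be layered on a short vector API, here is a TypeScript sketch that models a four-lane float vector in plain JS. `F32x4`, `load4`, `store4`, `add4`, and `addFloat32` are all hypothetical names; this is not the ecmascript_simd interface, only an emulation of the shape such an API might have.

```typescript
// A four-lane vector of floats, emulated as a tuple (hypothetical type).
type F32x4 = readonly [number, number, number, number];

// Load four consecutive elements starting at index i.
function load4(src: Float32Array, i: number): F32x4 {
  return [src[i], src[i + 1], src[i + 2], src[i + 3]];
}

// Store four lanes starting at index i. Writing into a Float32Array
// rounds each lane to single precision.
function store4(dst: Float32Array, i: number, v: F32x4): void {
  dst[i] = v[0];
  dst[i + 1] = v[1];
  dst[i + 2] = v[2];
  dst[i + 3] = v[3];
}

// Lane-wise addition; a real engine would map this to one SIMD instruction.
function add4(a: F32x4, b: F32x4): F32x4 {
  return [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]];
}

// Array-level add built on the vector primitives: full groups of four
// go through add4, and the leftover tail is handled one element at a time.
function addFloat32(dst: Float32Array, x: Float32Array, y: Float32Array): void {
  const n = Math.min(dst.length, x.length, y.length);
  let i = 0;
  for (; i + 4 <= n; i += 4) {
    store4(dst, i, add4(load4(x, i), load4(y, i)));
  }
  for (; i < n; i++) {
    dst[i] = x[i] + y[i];
  }
}

const out = new Float32Array(6);
addFloat32(
  out,
  new Float32Array([1, 2, 3, 4, 5, 6]),
  new Float32Array([10, 20, 30, 40, 50, 60])
);
```

In this emulation nothing is faster, of course. The performance would only come from an engine recognizing the vector type and keeping it in SIMD registers.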

There’s technically a third route that I don’t want to throw out of the window without telling you all about it. I’ll just call it the ‘NumPy/Matlab’ route; it would essentially mean implementing a similar API in JS. It is by far the hardest route and solves a different problem, but if we want a high-level API, you really can optimize the shit out of software like that. If we wanted to do actual high-performance numerics, this would be the obvious way to go, but I don’t think we actually want that.

> While the API was designed with audio signal processing in mind, it should of course be useful for other things too. For instance, you can check some early usage examples (non-accelerated when you use the JS polyfill).

Demos win the world. I think it is important that we demo early and demo often.

> Now, after some proof-of-concept testing the time has come to involve more people and take the work forward. ...which is why I created this groups.
> 
> I suspect that the first things we'll try to tackle in this group (apart from practical & formal issues) are the high level aspects of the proposed API (such as its scope, its general design, use cases and its merits and flaws compared to other similar technologies), and provided that we reach some sort of consensus we'll move on to lower level aspects of the API (such as interface design, missing/superfluous methods, precision requirements, testability, etc).

I think my stream-of-consciousness covered a few of those points.

Hugs,
Jens Nockert

[1]: https://github.com/opera-mage/webarraymath
[2]: http://wiki.ecmascript.org/doku.php?id=strawman:data_parallelism
[3]: https://github.com/johnmccutchan/ecmascript_simd
[4]: http://wiki.ecmascript.org/doku.php?id=strawman:simd_number
Received on Saturday, 23 November 2013 17:52:58 UTC