Re: DSP API proposal/draft from Jens Nockert on 2012-07-29 (public-audio@w3.org from July to September 2012)

From: Jens Nockert <jens@ofmlabs.org>
Date: Sun, 29 Jul 2012 23:50:36 +0200
To: public-audio@w3.org
Message-Id: <631E8AB4-00FB-43BF-BE87-284645D2EDA0@ofmlabs.org>
Hello,

Jussi Kalliokoski asked me to comment on the draft DSP specification, so I'll add some comments. It looks good in general, but I think it needs a bit of polish.

> I've put together a first (not yet complete) draft specification for a DSP  
> API. I would like you (the W3 WG) to have a look at it and come with both  
> high level feedback (is this a good idea to begin with?) and low level  
> feedback (see points below).

The main issue with the specification is that it rigidly specifies certain things while leaving others floating. It also specifies things in ways that seem to remove any chance of optimization on the systems that need it the most.

For example, it doesn't seem to allow for FMA (fused multiply-add, a * b + c with a single rounding) or for hardware without support for subnormals, both of which could improve performance. This doesn't matter on x86(-64), but on ARM it does, since the NEON unit does not support subnormals in hardware, which forces ARM implementations to use the slower non-SIMD VFP unit.
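
To make the two rounding issues concrete, here is a minimal sketch that emulates float32 arithmetic with Math.fround. The function names are hypothetical and not part of the draft. The "fused" variant relies on the fact that the product of two float32 values is exact in a double, so it only approximates a hardware FMA.

```typescript
// Emulate float32 arithmetic: Math.fround rounds a double to the nearest float32.
const f32 = Math.fround;

// Unfused: the product is rounded to float32 before the addition (two roundings).
function mulAddSeparate(a: number, b: number, c: number): number {
  return f32(f32(a * b) + c);
}

// Fused (emulated): the product of two float32 values is exact in a double,
// so only the final sum is rounded to float32.
function mulAddFused(a: number, b: number, c: number): number {
  return f32(a * b + c);
}

// Flush-to-zero, as on hardware without subnormal support: anything smaller
// in magnitude than the smallest normal float32 (2^-126) becomes 0.
const MIN_NORMAL_F32 = Math.pow(2, -126);
function flushSubnormal(x: number): number {
  return Math.abs(x) < MIN_NORMAL_F32 ? 0 : x;
}

const a = f32(1 + Math.pow(2, -12));
const c = f32(-(1 + Math.pow(2, -11)));
const separate = mulAddSeparate(a, a, c); // product rounds to 1 + 2^-11, so the sum is 0
const fused = mulAddFused(a, a, c);       // keeps the 2^-24 term of the product

const tiny = f32(Math.pow(2, -130));      // a float32 subnormal
const flushed = flushSubnormal(tiny);     // 0 on flush-to-zero hardware
```

A specification that demands bit-exact results would rule out one of the two multiply-add variants, and one of the two subnormal behaviours.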

> o There are three interfaces: DSP, FFT, Filter

Wouldn't it make sense to split DSP up into DSP and Vector or something? Parts of the DSP interface are quite generic (essentially a slightly weaker L1 BLAS, http://www.netlib.org/lapack/lug/node145.html).
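
As a rough illustration of what the generic part could look like, here is a sketch of a hypothetical Vector object with three L1 BLAS-style routines. The object name and method names follow BLAS conventions and are assumptions, not names from the draft.

```typescript
// Hypothetical "Vector" object: generic L1 BLAS-style routines on Float32Array.
const Vector = {
  // y <- alpha * x + y (BLAS "axpy")
  axpy(alpha: number, x: Float32Array, y: Float32Array): void {
    const n = Math.min(x.length, y.length);
    for (let i = 0; i < n; i++) y[i] = alpha * x[i] + y[i];
  },
  // Returns the sum of x[i] * y[i] (BLAS "dot")
  dot(x: Float32Array, y: Float32Array): number {
    const n = Math.min(x.length, y.length);
    let sum = 0;
    for (let i = 0; i < n; i++) sum += x[i] * y[i];
    return sum;
  },
  // x <- alpha * x (BLAS "scal")
  scal(alpha: number, x: Float32Array): void {
    for (let i = 0; i < x.length; i++) x[i] *= alpha;
  },
};

const xs = new Float32Array([1, 2, 3]);
const ys = new Float32Array([10, 20, 30]);
Vector.axpy(2, xs, ys);               // ys is now [12, 24, 36]
const dotProduct = Vector.dot(xs, ys); // 1*12 + 2*24 + 3*36 = 168
```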

> o Only 1D signal processing has been considered/included.

This could easily be extended at a later date; there's no point in complicating the API in the beginning.

>   o Only Float32Array is supported.

For the current audio APIs, floats make sense, but 16-bit integers would be interesting in the long run.

>   o Points for further discussion:
>     - Interface naming (e.g. name space bloat and collisions)?
>     - FFT & Filter are only useful as objects - should they be  
> interface-less / only live on the DSP interface in some way (to minimize  
> global name space pollution)?

It makes sense to minimize pollution. FFT isn't likely to be wanted by anyone else, but Filter is quite generic. You could also pick the more generic name DFT, and not define which method is used to calculate it.
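
A minimal sketch of that idea follows. The class name DFT, the forward() method and its return shape are all hypothetical. The body is a naive O(n^2) evaluation, chosen only to show that callers need not know which algorithm runs behind the interface.

```typescript
// Hypothetical DFT object: callers see only forward(); the algorithm is an
// implementation detail that an engine could replace with an FFT.
class DFT {
  readonly size: number;

  constructor(size: number) {
    this.size = size;
  }

  // Naive evaluation of X[k] = sum over t of x[t] * exp(-2*pi*i*k*t/N).
  forward(input: Float32Array): { re: Float32Array; im: Float32Array } {
    const n = this.size;
    const re = new Float32Array(n);
    const im = new Float32Array(n);
    for (let k = 0; k < n; k++) {
      let sumRe = 0;
      let sumIm = 0;
      for (let t = 0; t < n; t++) {
        const angle = (-2 * Math.PI * k * t) / n;
        sumRe += input[t] * Math.cos(angle);
        sumIm += input[t] * Math.sin(angle);
      }
      re[k] = sumRe;
      im[k] = sumIm;
    }
    return { re, im };
  }
}

// A unit impulse has a flat spectrum: every re[k] is 1 and every im[k] is 0.
const spectrum = new DFT(4).forward(new Float32Array([1, 0, 0, 0]));
```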

>     - I believe that 2D and 3D signal processing is currently out of scope  
> and risks bloating the API. Comments?

I agree. 2D and 3D are a lot more complex, so 1D first.

>     - We could possibly support Float64Array too, but that'd roughly double  
> the complexity (code size, testing, etc), and I don't really see any real  
> use cases for it in audio. Comments?

The current API is minimal and simple to implement, except for FFT (and to some extent the Filters). Adding support for double precision would make it _slightly_ larger, but I doubt it would make much difference. The only problem might be mixing types, where some nasty issues could arise if integer support were ever added.

> * The FFT interface
>   o This is probably the most obvious one, since it has been discussed  
> before and it's a key building block for lots of signal processing.
>   o Points for further discussion:
>     - The exact design of the interface - comments?

At some point later, there could be an interface to save execution plans, if any browser implements it in such a way that saving execution plans makes sense (linking to FFTW etc.?).

> * The Filter interface
>   o This is intended to implement both IIR and FIR filters of any order  
> (inspired by the Scilab/Matlab filter function [1]).
>   o It supports cross-block filtering state.
>   o Points for further discussion:
>     - Should we simplify the case of supporting filtering of several  
> channels with a single Filter object? E.g. have separate History objects  
> (one for each channel) that you can manage yourself - or is the
> current interface good enough (e.g. manage your own copies of history  
> arrays, or have one filter per channel)?

Having a single filter saves memory bandwidth, but just adding a stride to the parameter list allows you to use it for multi-channel audio anyhow, without packing/unpacking.
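
Here is a sketch of what a stride parameter buys, using a hypothetical firStrided() function that is not taken from the draft. It is FIR only and keeps no cross-block history, to stay short. Each call filters one channel of an interleaved buffer directly, with no deinterleaving step.

```typescript
// Hypothetical strided FIR: filters every `stride`-th sample starting at
// `offset`, so one channel of an interleaved buffer is processed as it lies.
function firStrided(
  b: Float32Array,   // FIR coefficients b[0..M]
  src: Float32Array, // interleaved input
  dst: Float32Array, // interleaved output, same layout as src
  offset: number,    // index of the channel's first sample
  stride: number     // distance between consecutive samples of that channel
): void {
  for (let i = offset, frame = 0; i < src.length; i += stride, frame++) {
    let acc = 0;
    // y[frame] = sum over k of b[k] * x[frame - k]; samples before the block count as 0.
    for (let k = 0; k < b.length && k <= frame; k++) {
      acc += b[k] * src[i - k * stride];
    }
    dst[i] = acc;
  }
}

// Interleaved stereo: L0 R0 L1 R1 L2 R2
const stereo = new Float32Array([1, 10, 2, 20, 3, 30]);
const filtered = new Float32Array(stereo.length);
const twoTapAverage = new Float32Array([0.5, 0.5]);
firStrided(twoTapAverage, stereo, filtered, 0, 2); // left channel
firStrided(twoTapAverage, stereo, filtered, 1, 2); // right channel
// filtered is now [0.5, 5, 1.5, 15, 2.5, 25]
```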

> * The DSP interface - general
>   o Benchmarks show that for most methods, we currently get a significant  
> performance gain by using native methods even for simple operations, such  
> as add(), for instance.

Any implementation that uses SIMD and single precision should be at least 2x as fast as JS even on the simplest operations; some of the more complex operations should be even faster.

>   o I anticipate that many of the methods in the DSP interface will be  
> possible to implement with native/close-to-native performance directly in  
> JS at some point in the future, but it's quite risky to make assumptions  
> about that (for Audio, we need stellar performance now).
>   o Points for further discussion:
>     - Is the interface bloated?
>     - Is it too restrictive?

Strides are important, since they save a pack/unpack roundtrip. Otherwise the interface is reasonable. I would add a suffix to the functions that operate on complex numbers though, to minimize confusion.
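
To illustrate the suffix idea, here is a sketch with a real multiply next to a complex one. The names mul and mulCplx, the "Cplx" suffix and the split real/imaginary layout are assumptions made for the example, not the draft's actual signatures.

```typescript
// Real element-wise multiply: dst[i] = x[i] * y[i].
function mul(dst: Float32Array, x: Float32Array, y: Float32Array): void {
  for (let i = 0; i < dst.length; i++) dst[i] = x[i] * y[i];
}

// Complex element-wise multiply on split real/imaginary arrays. The "Cplx"
// suffix makes the complex semantics visible at the call site.
function mulCplx(
  dstRe: Float32Array, dstIm: Float32Array,
  xRe: Float32Array, xIm: Float32Array,
  yRe: Float32Array, yIm: Float32Array
): void {
  for (let i = 0; i < dstRe.length; i++) {
    // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    const re = xRe[i] * yRe[i] - xIm[i] * yIm[i];
    const im = xRe[i] * yIm[i] + xIm[i] * yRe[i];
    dstRe[i] = re;
    dstIm[i] = im;
  }
}

const outRe = new Float32Array(1);
const outIm = new Float32Array(1);
// (1 + 2i) * (3 + 4i) = -5 + 10i
mulCplx(
  outRe, outIm,
  new Float32Array([1]), new Float32Array([2]),
  new Float32Array([3]), new Float32Array([4])
);
```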

> * The DSP interface - Math clones
>   o The Math object clones (sin, cos, sqrt, and friends) were all put on  
> the DSP object for symmetry, even though some are obviously more  
> useful/critical/important than others.

You could easily drop all of them if you want to make the interface smaller. It isn't currently orthogonal, since it doesn't support them for complex numbers, which would be slightly useful, but not worth the effort.

> * The DSP interface - Interpolation
>   o The sampleLinear() method uses non-uniform, edge-clamping sampling  
> (inspired by the Scilab/Matlab interp1 function [2]).
>   o Non-uniform sampling can be used for implementing things such as:
>     - Traditional uniform sampling.
>     - Sweeping playback rate sampling.
>     - Looped playback (supply a looping time parameter).
>     - Wave-shaper (i.e. interpolate the shape curve using the input signal).
>   o Only linear interpolation has been defined.
>   o Points for further discussion:
>     - We probably want higher order interpolation too (cubic, sinc).
>     - Lower order interpolation too? (nearest)
>     - Should we support more specialized sampling methods (e.g. uniform  
> sampling) too, to further improve performance?

You're quickly going to have a lot of interpolation methods that way; linear and Catmull-Rom are a reasonable choice. GLSL provides linear and cubic Hermite interpolation with the tangents set to zero, iirc.
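
For reference, here is a sketch of the three interpolation kernels mentioned, written as scalar helpers. The function names are hypothetical; the formulas are the standard linear, Catmull-Rom and zero-tangent cubic Hermite forms.

```typescript
// Linear interpolation between p1 and p2 for t in [0, 1].
function lerp(p1: number, p2: number, t: number): number {
  return p1 + (p2 - p1) * t;
}

// Catmull-Rom: a cubic Hermite spline whose tangents come from the neighbours,
// m1 = (p2 - p0) / 2 and m2 = (p3 - p1) / 2. It passes through p1 at t = 0
// and through p2 at t = 1.
function catmullRom(p0: number, p1: number, p2: number, p3: number, t: number): number {
  const t2 = t * t;
  const t3 = t2 * t;
  return 0.5 * (
    2 * p1 +
    (-p0 + p2) * t +
    (2 * p0 - 5 * p1 + 4 * p2 - p3) * t2 +
    (-p0 + 3 * p1 - 3 * p2 + p3) * t3
  );
}

// Cubic Hermite with both tangents set to zero: the blend weight is
// 3t^2 - 2t^3, the same polynomial GLSL's smoothstep uses.
function hermiteZeroTangent(p1: number, p2: number, t: number): number {
  const w = t * t * (3 - 2 * t);
  return p1 + (p2 - p1) * w;
}

const linearMid = lerp(0, 10, 0.25);             // 2.5
const splineMid = catmullRom(0, 1, 2, 3, 0.5);   // 1.5, since the data is a straight line
const hermiteMid = hermiteZeroTangent(0, 1, 0.5); // 0.5
```

Catmull-Rom needs four neighbouring samples per output value, against two for the other kernels, which matters at the edges of a block.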

I also wrote some comments on most of the functions in the DSP API at https://gist.github.com/3199146; most of them are minor.

-- Jens Nockert
Received on Monday, 30 July 2012 07:52:29 UTC