Re: DSP API proposal/draft from Jussi Kalliokoski on 2012-07-23 (public-audio@w3.org from July to September 2012)

From: Jussi Kalliokoski <jussi.kalliokoski@gmail.com>
Date: Mon, 23 Jul 2012 23:42:32 +0300
To: Marcus Geelnard <mage@opera.com>
Cc: public-audio@w3.org
Message-ID: <CAJhzemUxrszKoMhXJ7mtbLBgrnhyuxf9Cer2PAHw33V8XjVdig@mail.gmail.com>
On Mon, Jul 23, 2012 at 6:06 PM, Marcus Geelnard <mage@opera.com> wrote:

> Yes, I have not focused very much on complex data so far. This mostly
> has to do with:
>
> - Audio signals are usually real valued, and I wanted to cover those use
> cases first.
>

Fair enough. :)


> - There's no natural representation of complex values in ECMAScript/Typed
> Arrays (e.g. we have to choose between interleaved arrays and multiple
> arrays).
>

I think ES6 Structs will come in handy here: if you make an ArrayType of
them, I assume (bad assumptions, bad!) they'll be laid out interleaved in an
ArrayBuffer, so in my opinion it makes the most sense to assume that
vectors/matrices are interleaved.
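For concreteness, here is a minimal sketch of the two candidate layouts for the same three complex values: interleaved [re, im, re, im, ...] in one Float32Array (roughly what I'd expect an ArrayType of two-float structs to look like in memory, though that is an assumption) versus split real and imaginary arrays. The getter names are made up for illustration.

```javascript
// Three complex values: (1 - 1i), (2 - 2i), (3 - 3i).

// Interleaved layout: one Float32Array of [re0, im0, re1, im1, ...].
// Element k sits at index 2k (real part) and 2k + 1 (imaginary part).
const interleaved = Float32Array.of(1, -1, 2, -2, 3, -3);

// Split ("deinterleaved") layout: two arrays of equal length.
const real = Float32Array.of(1, 2, 3);
const imag = Float32Array.of(-1, -2, -3);

// Hypothetical accessors returning [re, im] for element k.
function getInterleaved(buf, k) {
  return [buf[2 * k], buf[2 * k + 1]];
}

function getSplit(re, im, k) {
  return [re[k], im[k]];
}

// Both layouts describe the same values; only the index arithmetic differs.
for (let k = 0; k < real.length; k++) {
  console.log(getInterleaved(interleaved, k), getSplit(real, imag, k));
}
```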


> - I wanted to keep the interface minimal at this point.
>
> I think that many methods would work quite well with complex data, as long
> as you separate the real and imaginary parts into two different arrays and
> do two passes (e.g. for filtering, addition, etc). You can go from complex
> to polar with abs(real, imag) & atan2(imag, real) if you wish, and back
> again using cos(angle)*mag & sin(angle)*mag. Also, the FFT method supports
> complex input/output.
>

Yes, this is good, which actually brings me to my next question: is there a
specific reason you chose to have the FFT work with deinterleaved arrays? As
I mentioned, it seems likely that most of the typed arrays on the web
platform will be interleaved, and all the related ones I can think of
(except for the Web Audio API) already are: Canvas ImageData, WebGL
vectors/matrices, and image and audio formats in general (with a few
oddities there, though).
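The polar round trip quoted above (abs and atan2 one way, cos/sin times the magnitude back) amounts to elementwise passes over split arrays. A minimal sketch follows; toPolar and fromPolar are hypothetical helper names, not methods from the proposal, and plain loops stand in for what would be single vectorized DSP calls.

```javascript
// Hypothetical: split complex arrays -> polar form, element by element.
// magnitude = sqrt(re^2 + im^2), angle = atan2(im, re)
function toPolar(real, imag) {
  const n = real.length;
  const mag = new Float32Array(n);
  const angle = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    mag[i] = Math.hypot(real[i], imag[i]);
    angle[i] = Math.atan2(imag[i], real[i]);
  }
  return { mag, angle };
}

// Hypothetical: polar form -> split complex arrays.
// re = cos(angle) * mag, im = sin(angle) * mag
function fromPolar(mag, angle) {
  const n = mag.length;
  const real = new Float32Array(n);
  const imag = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    real[i] = Math.cos(angle[i]) * mag[i];
    imag[i] = Math.sin(angle[i]) * mag[i];
  }
  return { real, imag };
}

const { mag, angle } = toPolar(Float32Array.of(3, 0), Float32Array.of(4, 1));
// mag is [5, 1]; angle is [atan2(4, 3), PI / 2], rounded to float32
```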


> Are there any specific methods that you feel are missing?


Aside from making the existing methods work better with interleaved data and
adding (de)interleaving methods, having complex ("imag") multiply and
friends would be great. For example, a quick vocoder (totally made-up
function signatures):

DSP.mul(inputPCM, inputPCM, windowFunc)
DSP.mul(carrierPCM, carrierPCM, windowFunc)

fft.forward(inputFFT, inputPCM)
fft.forward(carrierFFT, carrierPCM)

DSP.mul_imag(inputFFT, inputFFT, carrierFFT)

fft.inverse(outputPCM, inputFFT)

Overlap-add that and robotic voices here I come. ^^
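The complex multiply that the made-up mul_imag call stands for is small enough to sketch directly. The version below assumes interleaved [re, im, ...] spectra and the same dst-first argument order as the calls above; mulComplex and overlapAdd are hypothetical names. overlapAdd illustrates the last step, summing each processed frame into the output at its hop offset.

```javascript
// Hypothetical complex multiply on interleaved spectra ([re, im, re, im, ...]):
// (a + bi)(c + di) = (ac - bd) + (ad + bc)i
// dst may alias x or y, since each element's inputs are read before writing.
function mulComplex(dst, x, y) {
  if (x.length % 2 !== 0 || y.length !== x.length || dst.length !== x.length) {
    throw new RangeError("arrays must share the same even length");
  }
  for (let i = 0; i < x.length; i += 2) {
    const a = x[i], b = x[i + 1];
    const c = y[i], d = y[i + 1];
    dst[i] = a * c - b * d;
    dst[i + 1] = a * d + b * c;
  }
  return dst;
}

// Hypothetical overlap-add: sum a processed frame into the output signal
// starting at `offset`; samples that would fall past the end are dropped.
function overlapAdd(output, frame, offset) {
  for (let i = 0; i < frame.length && offset + i < output.length; i++) {
    output[offset + i] += frame[i];
  }
  return output;
}

// (1 + 2i)(3 + 4i) = -5 + 10i and (i)(i) = -1
const spectrum = Float32Array.of(1, 2, 0, 1);
const carrier = Float32Array.of(3, 4, 0, 1);
mulComplex(spectrum, spectrum, carrier); // in place, like the vocoder example
console.log(Array.from(spectrum)); // [-5, 10, -1, 0]

// Two 4-sample frames with a hop size of 2 overlap in the middle.
const signal = new Float32Array(6);
overlapAdd(signal, Float32Array.of(1, 1, 1, 1), 0);
overlapAdd(signal, Float32Array.of(1, 1, 1, 1), 2);
console.log(Array.from(signal)); // [1, 1, 2, 2, 1, 1]
```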


> I've thought about adding stride/offset/etc versions of all methods, but I
> fear that it would make the API complexity much higher (also from an
> implementation point of view).


It's a valid concern, but not having them limits the usefulness of the API
quite radically. IMHO it would be better to have only the stride/offset
versions than not to have them at all; the API consumer can just set the
stride to 1 and the offset to 0 when they aren't needed. But yes,
implementation-wise it still complicates things a bit.
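To make the stride/offset idea concrete, here is a minimal sketch of an elementwise multiply in which every array gets its own offset and stride, both counted in elements. mulStrided and its argument order are hypothetical. With a stride of 2 it can scale just the left channel of interleaved stereo in place, and with stride 1 and offset 0 it behaves like a plain multiply.

```javascript
// Hypothetical strided multiply, for i in [0, count):
//   dst[dOff + i*dStride] = x[xOff + i*xStride] * y[yOff + i*yStride]
function mulStrided(count, dst, dOff, dStride, x, xOff, xStride, y, yOff, yStride) {
  for (let i = 0; i < count; i++) {
    dst[dOff + i * dStride] = x[xOff + i * xStride] * y[yOff + i * yStride];
  }
  return dst;
}

// Interleaved stereo: [L0, R0, L1, R1, L2, R2]
const stereo = Float32Array.of(1, 1, 2, 2, 3, 3);
const gain = Float32Array.of(0.5, 0.5, 0.5);

// Halve only the left channel: offset 0 and stride 2 for the interleaved
// data, offset 0 and stride 1 for the gain curve.
mulStrided(3, stereo, 0, 2, stereo, 0, 2, gain, 0, 1);
console.log(Array.from(stereo)); // [0.5, 1, 1, 2, 1.5, 3]
```

The ten-argument signature also shows the complexity cost Marcus is worried about.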

>> - Add methods for deinterleaving/interleaving data (might be useful
>> anyway).
>>
>
> Yes, this is a good idea! Do you have any method signatures in mind?
> Perhaps it would be a good idea to let these "data swizzling" operators
> support other kinds of typed arrays too (e.g. to go from Uint8Array with
> interleaved RGBA data to four Float32Arrays)?


I'm not sure how valuable that would be (if we don't allow cross-type
operations everywhere, that is O.o), especially since common kinds of data
have such different representations in different types. For example, I
wouldn't know whether that conversion from Uint8 to Float32 should map the
values to 0.0 - 1.0 (like WebGL), to -1.0 - 1.0 (like audio), or just keep
them at 0.0 - 255.0. I'd be even more clueless about the reverse operation,
and quite frankly would be surprised by any outcome, heh.
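To illustrate the ambiguity, here are three reasonable Uint8 to Float32 mappings side by side; a single generic swizzle method would have to pick one, or take a parameter choosing between them. The function names are hypothetical, and the audio mapping assumes the common 8-bit unsigned PCM convention where 128 is silence, which (as a bonus ambiguity) maps 255 to 127/128 rather than exactly 1.0.

```javascript
// Hypothetical: 0..255 -> 0.0..1.0 (WebGL-style normalization)
function toUnitRange(src) {
  return Float32Array.from(src, (v) => v / 255);
}

// Hypothetical: 8-bit unsigned PCM (128 = silence) -> -1.0..127/128
function toSignedAudio(src) {
  return Float32Array.from(src, (v) => (v - 128) / 128);
}

// Hypothetical: keep the raw values 0..255 as floats
function toRawValues(src) {
  return Float32Array.from(src);
}

const bytes = Uint8Array.of(0, 128, 255);
console.log(Array.from(toSignedAudio(bytes))); // [-1, 0, 0.9921875]
console.log(Array.from(toRawValues(bytes)));   // [0, 128, 255]
```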

As far as function signatures go, one option is a variadic signature (no
need to specify length!):

deinterleave<T>(T src, T dst0, T dst1, T dst2, ..., T dstX)
interleave<T>(T dst, T src0, T src1, T src2, ..., T srcX)

That would mean that if you had an array of deinterleaved typed arrays, you
could do it like this:

DSP.interleave.apply(null, [destinationArray].concat(arrayOfTypedArrays))
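A minimal JavaScript sketch of those two variadic signatures, assuming every channel array holds at least src.length / channels (or dst.length / channels) elements. These are illustrations, not an existing DSP API; the usage below uses ES2015 spread syntax, which does the same job as the apply() call above.

```javascript
// Hypothetical: deinterleave(src, dst0, dst1, ..., dstX)
// The channel count is simply the number of destination arrays.
function deinterleave(src, ...dsts) {
  const channels = dsts.length;
  if (channels === 0) throw new RangeError("need at least one destination");
  const frames = Math.floor(src.length / channels);
  for (let c = 0; c < channels; c++) {
    const dst = dsts[c];
    for (let i = 0; i < frames; i++) dst[i] = src[i * channels + c];
  }
}

// Hypothetical: interleave(dst, src0, src1, ..., srcX)
function interleave(dst, ...srcs) {
  const channels = srcs.length;
  if (channels === 0) throw new RangeError("need at least one source");
  const frames = Math.floor(dst.length / channels);
  for (let c = 0; c < channels; c++) {
    const src = srcs[c];
    for (let i = 0; i < frames; i++) dst[i * channels + c] = src[i];
  }
}

const stereo = Float32Array.of(1, -1, 2, -2, 3, -3);
const left = new Float32Array(3);
const right = new Float32Array(3);
deinterleave(stereo, left, right); // left: [1, 2, 3], right: [-1, -2, -3]

const channelArrays = [left, right];
const back = new Float32Array(6);
interleave(back, ...channelArrays); // back: [1, -1, 2, -2, 3, -3]
```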


>
>>   * Since convolution is attached to the filter interface, it's also
>> limited to 1D signals, hence not very useful for images, etc.
>>
>
> Well, filtering, convolution, FFT and interpolation are all 1D. If we
> could agree on some common way to interpret 2D matrices in terms of typed
> arrays, we could add 2D versions of these methods (similar to the Matlab
> methods [1], [2] and [3], for instance).
>

There's already strong precedent: as I said, WebGL matrices and Canvas
ImageData are all interleaved and stored sequentially, with the matrix shape
stored elsewhere, and I hardly think that's going to change any time soon.
So my suggestion is to expand the signatures of the methods, for example
the FFT's:

FFT(uint size0, uint size1, uint size2, ..., uint sizeX)

This would already provide everything you need to run FFTs on the existing
data types on the web, for example on an ImageData:

var fft = new FFT(imgdata.width, imgdata.height)
fft.forward(imgdata.data)

Or a WebGL 3D matrix:

var fft = new FFT(matrixWidth, matrixHeight, matrixDepth)
fft.forward(matrix)
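Running an FFT over data that is simply stored in series works because the multidimensional DFT is separable: a 2D transform is a 1D transform along every row followed by one along every column, and an N-D transform repeats this once per axis. The sketch below shows that decomposition on a row-major, interleaved complex array. It is a plain radix-2 implementation that assumes power-of-two sizes, and fft1d/fft2d are hypothetical names, not the proposed FFT interface. Note that ImageData.data holds four interleaved 8-bit channels per pixel, so in practice one channel would first be copied (with a stride of 4) into the real parts.

```javascript
// Hypothetical in-place radix-2 FFT over n complex values stored interleaved
// ([re, im, re, im, ...]) in `data`. Element k of this 1D transform is the
// complex value at index offset + k * stride, so the same routine can walk a
// row (stride 1) or a column (stride = width) of a row-major matrix.
function fft1d(data, n, offset, stride) {
  if (n < 1 || (n & (n - 1)) !== 0) {
    throw new RangeError("size must be a power of two");
  }
  const at = (k) => 2 * (offset + k * stride); // array index of Re(element k)

  // Reorder the elements into bit-reversed index order.
  for (let i = 1, j = 0; i < n; i++) {
    let bit = n >> 1;
    for (; j & bit; bit >>= 1) j ^= bit;
    j ^= bit;
    if (i < j) {
      const a = at(i), b = at(j);
      [data[a], data[b]] = [data[b], data[a]];
      [data[a + 1], data[b + 1]] = [data[b + 1], data[a + 1]];
    }
  }

  // Iterative butterflies computing X[k] = sum_m x[m] * exp(-2*pi*i*k*m / n).
  for (let len = 2; len <= n; len <<= 1) {
    const step = (-2 * Math.PI) / len;
    for (let start = 0; start < n; start += len) {
      for (let k = 0; k < len / 2; k++) {
        const wr = Math.cos(step * k), wi = Math.sin(step * k);
        const a = at(start + k), b = at(start + k + len / 2);
        const tr = data[b] * wr - data[b + 1] * wi;
        const ti = data[b] * wi + data[b + 1] * wr;
        data[b] = data[a] - tr;
        data[b + 1] = data[a + 1] - ti;
        data[a] += tr;
        data[a + 1] += ti;
      }
    }
  }
  return data;
}

// Hypothetical 2D FFT of a width x height complex matrix stored row-major and
// interleaved: element (x, y) is complex index y * width + x. Rows first,
// then columns, exploiting separability.
function fft2d(data, width, height) {
  for (let y = 0; y < height; y++) fft1d(data, width, y * width, 1);
  for (let x = 0; x < width; x++) fft1d(data, height, x, width);
  return data;
}

// Usage: a 4 x 2 matrix holding a single impulse at (0, 0) transforms to all
// ones (every real part 1, every imaginary part 0, up to rounding).
const matrix = new Float64Array(2 * 4 * 2); // 8 complex values, all zero
matrix[0] = 1; // real part of element (0, 0)
fft2d(matrix, 4, 2);
```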

Cheers,
Jussi
Received on Monday, 23 July 2012 20:43:00 UTC