Re: DSP API proposal/draft from Marcus Geelnard on 2012-07-24 (public-audio@w3.org from July to September 2012)

From: Marcus Geelnard <mage@opera.com>
Date: Tue, 24 Jul 2012 11:00:08 +0200
To: "Jussi Kalliokoski" <jussi.kalliokoski@gmail.com>
Cc: public-audio@w3.org
Message-ID: <op.whx0mi0vm77heq@mage-desktop>
Den 2012-07-23 22:42:32 skrev Jussi Kalliokoski  
<jussi.kalliokoski@gmail.com>:

> On Mon, Jul 23, 2012 at 6:06 PM, Marcus Geelnard <mage@opera.com> wrote:

>> - I wanted to keep the interface minimal at this point.
>>
>> I think that many methods would work quite well with complex data, as  
>> long
>> as you separate the real and imaginary parts into two different arrays  
>> and
>> do two passes (e.g. for filtering, addition, etc). You can go from  
>> complex
>> to polar with abs(real, imag) & atan2(imag, real) if you wish, and back
>> again using cos(angle)*mag & sin(angle)*mag. Also, the FFT method  
>> supports
>> complex input/output.
>>
>
> Yes, this is good... Which brings me to my next question actually: is  
> there
> a specific reason you chose the FFT to work with deinterleaved arrays?  
> Like
> I mentioned, it seems likely that most of the typed arrays in the web
> platform will be interleaved and all related that I can think of (except
> for Web Audio API) already are: Canvas ImageData, WebGL vectors/matrices,
> image formats in general and audio formats in general (there are a few
> oddities there though).

Well, I looked at both interleaved and non-interleaved, and I'm not really  
in strong favor of either of them. I'm currently looking for a strong  
argument for going either way.

For instance, most C++ FFT libraries use interleaved data, which is only  
natural because the language supports array of structs / complex types  
natively, so for the FFT interface it might be better to use interleaved  
data (not really sure yet what the penalty for non-interleaved data is,  
though).

On the other hand, from an API perspective, interleaved data easily  
becomes awkward. For instance, how would you differentiate between a  
complex and a real-valued array? With non-interleaved data its quite  
clear: either you have an imaginary component (=complex) or you don't  
(=real).

One of the reasons for going with non-interleaved data was that the audio  
API uses non-interleaved data (for multi-channel buffers). Though a bit  
unrelated, it made me lean towards non-interleaved.


>> Are there any specific methods that you feel are missing?
>
>
> Aside from the existing methods working better with interleaved data and
> adding (de)interleaving methods, having imag multiply and friends would  
> be
> great, for example a quick vocoder (totally made up function signatures):
>
> DSP.mul(inputPCM, inputPCM, windowFunc)
> DSP.mul(carrierPCM, carrierPCM, windowFunc)
>
> fft.forward(inputFFT, inputPCM)
> fft.forward(carrierFFT, carrierPCM)
>
> DSP.mul_imag(inputFFT, inputFFT, carrierPCM)
>
> fft.inverse(outputPCM, inputFFT)
>
> Overlap-add that and robotic voices here I come. ^^

I suppose by imag multiply, you mean complex multiply ;)

True, complex multiplication is very useful, especially for FFT-based  
filtering (such as your example). Though, I suppose that your example  
could have been implemented using convolution instead?

I added complex multiplication and division to the draft.


>     - Add methods for deinterleaving/interleaving data (might be useful
>> anyway).
>>
>
> Yes, this is a good idea! Do you have any method signatures in mind?
>> Perhaps it would be a good idea to let these "data swizzling" operators
>> support other kinds of typed arrays too (e.g. to go from Uint8Array with
>> interleaved RGBA data to four Float32Array's)?
>
>
> I'm not sure how valuable that would be (if we don't allow cross-type
> operations everywhere, that is O.o), especially since common data types
> have so different presentations in different types. As an example, I
> wouldn't know if that conversion from Uint8 to Float32 made the values
> 0.0-1.0 (like WebGL) or -1.0 - 1.0 (like audio) or just kept it at 0.0 -
> 255.0. I'd be even more clueless about the reverse operation, and quite
> frankly be surprised of any outcome, heh.

Here, I think that the easiest solution is the best solution ;)

We can just use plain 1:1 mapping, and rely on WEBIDL/TypedArray format  
conversion rules. Any scaling (e.g. *32767 for going to Int16) can be done  
manually before interleaving or after deinterleaving.

I added interleave() and deinterleave() to the draft, though I used a  
maximum of four components, since that's what most SIMD instruction sets  
support well anyway (and most data of interest has <= 4 components), and  
with the offset & stride arguments you can always pack/unpack more  
components in several passes.


>> Well, filtering, convolution, FFT and interpolation are all 1D. If we
>> could agree on some common way to interpret 2D matrices in terms of  
>> typed
>> arrays, we could add 2D versions of these methods (similar to the Matlab
>> methods [1], [2] and [3], for instance).
>>
>
> There's some heavy precedence already, like I said WebGL matrices and
> Canvas ImageData are all interleaved and just in series and the matrix
> shape is stored elsewhere, and I hardly think that it's going to change  
> any
> time soon. So my suggestion is expanding the signature of the methods,  
> for
> example FFT:
>
> FFT (uint size0, uint size1, uint size2, ... uint sizeX)
>
> This would already provide all that you need for running FFTs on the
> existing data types on the web, for example for an imagedata:
>
> var fft = new FFT(imgdata.width, imgdata.height)
> fft.forward(imgdata.data)

Yes, extending the FFT interface for doing 2D would be quite easy.

IIR filtering, as you said, does not make much sense in 2D (well, it's  
probably useful, but you'd have to define a filtering direction, and it's  
not at all as commonly used as 2D FIR filters anyway). Also, the notion of  
continuous intra-buffer operation is not as commonly used for 2D signals  
(you typically have bounded images).

I think a new interface would be needed for 2D convolution. 3D (and  
higher?) convolution MIGHT be of interest, but then I think we're stepping  
into the realm of non-realtime applications, and you might actually get  
away with current JS performance.

For 2D interpolation, I think we have to come up with actual use-cases,  
because there are quite a few different ways of doing interpolation  
(uniform, non-uniform along x/y independently, non-uniform with x/y  
specified per output-element,...).

> Or a WebGL 3D matrix:
>
> var fft = new FFT(matrixWidth, matrixHeight, matrixDepth)
> fft.forward(matrix)

I think that 3D FFT (or higher order FFT:s) is of quite limited use, but I  
could be wrong. For complicated data analysis, yes, for WebGL, hmmm... I  
have a hard time imagining someone wanting to do the FFT of a 3D texture,  
for instance, especially in real time. That's typically something you'd do  
off line I guess (if at all).

All in all, I'm starting to think: What real time operations could you do  
with 2D DSP functions that you can't do with WebGL/GLSL? Filtering and  
interpolation can easily be done in GLSL, e.g. for implementing funky  
video effects for your camera app.

I think we need to gather more use cases for >= 2D applications to be able  
to design proper interfaces. The only use cases that I can think of right  
now that can't be done in WebGL but still needs to have real time  
performance are:

1) Real time camera input analysis (face recognition, feature detection,  
..?)
2) Real time video decoding of codecs that are not natively supported.

I'd say that 2) isn't really a practical solution, but more of a tech  
demo. 1) on the other hand might be useful for things like RTC, augmented  
reality, etc, but I have no clue what the critical operations or interface  
requirements for them would be.

/Marcus


-- 
Marcus Geelnard
Core Graphics Developer
Opera Software ASA
Received on Tuesday, 24 July 2012 09:00:51 UTC