
Re: DSP API proposal/draft

From: Matthew Paradis <matthew.paradis@bbc.co.uk>
Date: Tue, 24 Jul 2012 13:36:40 +0100
To: Marcus Geelnard <mage@opera.com>, Jussi Kalliokoski <jussi.kalliokoski@gmail.com>
CC: <public-audio@w3.org>
Message-ID: <CC345568.D355%matthew.paradis@bbc.co.uk>
Some high-level feedback...

This fills in a lot of the holes in the Web Audio API for more advanced
developers.  The majority of users may not be interested in these functions,
but they are the building blocks for more complex synthesis and signal
processing.  In an ideal world I would like to see this functionality
included in the Web Audio API rather than developed separately.  Having
multiple APIs that provide the building blocks for sound generation and
processing is messy and confusing (in my opinion).

Best

Matt


On 24/07/2012 10:00, "Marcus Geelnard" <mage@opera.com> wrote:

> On 2012-07-23 at 22:42:32, Jussi Kalliokoski
> <jussi.kalliokoski@gmail.com> wrote:
> 
>> On Mon, Jul 23, 2012 at 6:06 PM, Marcus Geelnard <mage@opera.com> wrote:
> 
>>> - I wanted to keep the interface minimal at this point.
>>> 
>>> I think that many methods would work quite well with complex data, as
>>> long
>>> as you separate the real and imaginary parts into two different arrays
>>> and
>>> do two passes (e.g. for filtering, addition, etc). You can go from
>>> complex
>>> to polar with abs(real, imag) & atan2(imag, real) if you wish, and back
>>> again using cos(angle)*mag & sin(angle)*mag. Also, the FFT method
>>> supports
>>> complex input/output.
>>> 
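[Editor's note: spelled out in plain JavaScript on deinterleaved Float32Arrays, the rectangular/polar round trip described above might look like the following. The helper names are made up for illustration and are not part of the draft.]

```javascript
// Convert deinterleaved complex data (re, im) to polar form (mag, phase):
// mag = abs(real, imag), phase = atan2(imag, real).
function toPolar(re, im) {
  const n = re.length;
  const mag = new Float32Array(n);
  const phase = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    mag[i] = Math.hypot(re[i], im[i]);
    phase[i] = Math.atan2(im[i], re[i]);
  }
  return { mag, phase };
}

// ...and back again: re = cos(phase)*mag, im = sin(phase)*mag.
function fromPolar(mag, phase) {
  const n = mag.length;
  const re = new Float32Array(n);
  const im = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    re[i] = Math.cos(phase[i]) * mag[i];
    im[i] = Math.sin(phase[i]) * mag[i];
  }
  return { re, im };
}
```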
>> 
>> Yes, this is good... Which brings me to my next question actually: is
>> there
>> a specific reason you chose the FFT to work with deinterleaved arrays?
>> Like
>> I mentioned, it seems likely that most of the typed arrays in the web
>> platform will be interleaved, and all related formats that I can think of
>> (except for the Web Audio API) already are: Canvas ImageData, WebGL
>> vectors/matrices, image formats in general and audio formats in general
>> (there are a few oddities there though).
> 
> Well, I looked at both interleaved and non-interleaved, and I'm not really
> in strong favor of either of them. I'm currently looking for a strong
> argument for going either way.
> 
> For instance, most C++ FFT libraries use interleaved data, which is only
> natural because the language supports arrays of structs / complex types
> natively, so for the FFT interface it might be better to use interleaved
> data (not really sure yet what the penalty for non-interleaved data is,
> though).
> 
> On the other hand, from an API perspective, interleaved data easily
> becomes awkward. For instance, how would you differentiate between a
> complex and a real-valued array? With non-interleaved data it's quite
> clear: either you have an imaginary component (=complex) or you don't
> (=real).
> 
> One of the reasons for going with non-interleaved data was that the audio
> API uses non-interleaved data (for multi-channel buffers). Though a bit
> unrelated, it made me lean towards non-interleaved.
> 
> 
>>> Are there any specific methods that you feel are missing?
>> 
>> 
>> Aside from the existing methods working better with interleaved data and
>> adding (de)interleaving methods, having imag multiply and friends would
>> be
>> great, for example a quick vocoder (totally made up function signatures):
>> 
>> DSP.mul(inputPCM, inputPCM, windowFunc)
>> DSP.mul(carrierPCM, carrierPCM, windowFunc)
>> 
>> fft.forward(inputFFT, inputPCM)
>> fft.forward(carrierFFT, carrierPCM)
>> 
>> DSP.mul_imag(inputFFT, inputFFT, carrierPCM)
>> 
>> fft.inverse(outputPCM, inputFFT)
>> 
>> Overlap-add that and robotic voices here I come. ^^
> 
> I suppose by imag multiply, you mean complex multiply ;)
> 
> True, complex multiplication is very useful, especially for FFT-based
> filtering (such as your example). Though, I suppose that your example
> could have been implemented using convolution instead?
> 
> I added complex multiplication and division to the draft.
> 
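[Editor's note: a deinterleaved, element-wise complex multiply along these lines could be sketched as below. The function name and signature are hypothetical, not the draft's; it just applies (a+bi)(c+di) = (ac-bd) + (ad+bc)i per element.]

```javascript
// Element-wise complex multiplication on deinterleaved arrays:
// (a + bi)(c + di) = (ac - bd) + (ad + bc)i
function complexMul(outRe, outIm, xRe, xIm, yRe, yIm) {
  for (let i = 0; i < outRe.length; i++) {
    const re = xRe[i] * yRe[i] - xIm[i] * yIm[i];
    const im = xRe[i] * yIm[i] + xIm[i] * yRe[i];
    outRe[i] = re; // temporaries make in-place use (out === x) safe
    outIm[i] = im;
  }
}
```

With deinterleaved data there is no ambiguity about real vs. complex input: a real-valued operand is simply one with no imaginary array.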
> 
>>> - Add methods for deinterleaving/interleaving data (might be useful
>>> anyway).
>> 
>> Yes, this is a good idea! Do you have any method signatures in mind?
>> 
>>> Perhaps it would be a good idea to let these "data swizzling" operators
>>> support other kinds of typed arrays too (e.g. to go from Uint8Array with
>>> interleaved RGBA data to four Float32Array's)?
>> 
>> I'm not sure how valuable that would be (if we don't allow cross-type
>> operations everywhere, that is O.o), especially since common data types
>> have such different representations in different types. As an example, I
>> wouldn't know whether that conversion from Uint8 to Float32 made the
>> values 0.0 - 1.0 (like WebGL) or -1.0 - 1.0 (like audio) or just kept
>> them at 0.0 - 255.0. I'd be even more clueless about the reverse
>> operation, and quite frankly be surprised by any outcome, heh.
> 
> Here, I think that the easiest solution is the best solution ;)
> 
> We can just use plain 1:1 mapping, and rely on WEBIDL/TypedArray format
> conversion rules. Any scaling (e.g. *32767 for going to Int16) can be done
> manually before interleaving or after deinterleaving.
> 
> I added interleave() and deinterleave() to the draft, though I used a
> maximum of four components, since that's what most SIMD instruction sets
> support well anyway (and most data of interest has <= 4 components), and
> with the offset & stride arguments you can always pack/unpack more
> components in several passes.
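[Editor's note: with offset & stride arguments the pack/unpack loops are short. A minimal sketch with hypothetical names, using the plain 1:1 value mapping described above (any scaling, e.g. *32767, would be a separate pass):]

```javascript
// Extract every stride-th element, starting at offset, from an
// interleaved buffer (e.g. one channel of RGBA pixel data).
// Values are copied 1:1; the typed arrays' own conversion rules apply.
function deinterleave(dst, src, offset, stride) {
  for (let i = 0, j = offset; j < src.length; i++, j += stride) {
    dst[i] = src[j];
  }
}

// Scatter a planar array back into an interleaved buffer.
function interleave(dst, src, offset, stride) {
  for (let i = 0, j = offset; j < dst.length; i++, j += stride) {
    dst[j] = src[i];
  }
}
```

Unpacking RGBA into four planes is then four passes: `deinterleave(r, rgba, 0, 4)`, `deinterleave(g, rgba, 1, 4)`, and so on; data with more than four components can likewise be handled in several passes, as noted above.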
> 
> 
>>> Well, filtering, convolution, FFT and interpolation are all 1D. If we
>>> could agree on some common way to interpret 2D matrices in terms of
>>> typed
>>> arrays, we could add 2D versions of these methods (similar to the Matlab
>>> methods [1], [2] and [3], for instance).
>>> 
>> 
>> There's some heavy precedent already: like I said, WebGL matrices and
>> Canvas ImageData are all interleaved and laid out in series, with the
>> matrix shape stored elsewhere, and I hardly think that's going to change
>> any time soon. So my suggestion is expanding the signature of the
>> methods, for example FFT:
>> 
>> FFT (uint size0, uint size1, uint size2, ... uint sizeX)
>> 
>> This would already provide all that you need for running FFTs on the
>> existing data types on the web, for example for an ImageData:
>> 
>> var fft = new FFT(imgdata.width, imgdata.height)
>> fft.forward(imgdata.data)
> 
> Yes, extending the FFT interface for doing 2D would be quite easy.
> 
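[Editor's note: the 2D extension is easy because the 2D DFT is separable — transform each row of the row-major buffer, then each column. A naive O(n^2)-per-transform sketch of that indexing, on deinterleaved complex data; names are made up, and a real implementation would of course use an FFT rather than a direct DFT.]

```javascript
// Naive in-place 1D DFT over strided, deinterleaved complex data.
function dft(re, im, offset, stride, n) {
  const outRe = new Float64Array(n), outIm = new Float64Array(n);
  for (let k = 0; k < n; k++) {
    for (let t = 0; t < n; t++) {
      const ang = (-2 * Math.PI * k * t) / n;
      const c = Math.cos(ang), s = Math.sin(ang);
      const j = offset + t * stride;
      outRe[k] += re[j] * c - im[j] * s;
      outIm[k] += re[j] * s + im[j] * c;
    }
  }
  for (let k = 0; k < n; k++) {
    const j = offset + k * stride;
    re[j] = outRe[k];
    im[j] = outIm[k];
  }
}

// 2D DFT of a width x height image stored row-major in a flat array:
// transform every row (stride 1), then every column (stride = width).
function dft2d(re, im, width, height) {
  for (let y = 0; y < height; y++) dft(re, im, y * width, 1, width);
  for (let x = 0; x < width; x++) dft(re, im, x, width, height);
}
```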
> IIR filtering, as you said, does not make much sense in 2D (well, it's
> probably useful, but you'd have to define a filtering direction, and it's
> not at all as commonly used as 2D FIR filters anyway). Also, the notion of
> continuous intra-buffer operation is not as commonly used for 2D signals
> (you typically have bounded images).
> 
> I think a new interface would be needed for 2D convolution. 3D (and
> higher?) convolution MIGHT be of interest, but then I think we're stepping
> into the realm of non-realtime applications, and you might actually get
> away with current JS performance.
> 
> For 2D interpolation, I think we have to come up with actual use-cases,
> because there are quite a few different ways of doing interpolation
> (uniform, non-uniform along x/y independently, non-uniform with x/y
> specified per output-element,...).
> 
>> Or a WebGL 3D matrix:
>> 
>> var fft = new FFT(matrixWidth, matrixHeight, matrixDepth)
>> fft.forward(matrix)
> 
> I think that 3D FFT (or higher-order FFTs) is of quite limited use, but I
> could be wrong. For complicated data analysis, yes, for WebGL, hmmm... I
> have a hard time imagining someone wanting to do the FFT of a 3D texture,
> for instance, especially in real time. That's typically something you'd do
> off line I guess (if at all).
> 
> All in all, I'm starting to think: What real time operations could you do
> with 2D DSP functions that you can't do with WebGL/GLSL? Filtering and
> interpolation can easily be done in GLSL, e.g. for implementing funky
> video effects for your camera app.
> 
> I think we need to gather more use cases for >= 2D applications to be able
> to design proper interfaces. The only use cases that I can think of right
> now that can't be done in WebGL but still need to have real time
> performance are:
> 
> 1) Real time camera input analysis (face recognition, feature detection,
> ..?)
> 2) Real time video decoding of codecs that are not natively supported.
> 
> I'd say that 2) isn't really a practical solution, but more of a tech
> demo. 1) on the other hand might be useful for things like RTC, augmented
> reality, etc, but I have no clue what the critical operations or interface
> requirements for them would be.
> 
> /Marcus
> 

-- 
Matthew Paradis
Systems Engineer, Internet Research and Future Services.
BBC Research and Development, Central Labs, 7th Floor, Henry Wood House,
3-6 Langham Place, London W1B 3DF.
020 776 51243 | matthew.paradis@bbc.co.uk | bbc.co.uk/rd


Received on Tuesday, 24 July 2012 12:38:03 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 24 July 2012 12:38:03 GMT