# Re: Simplifying specing/testing/implementation work

From: Marcus Geelnard <mage@opera.com>
Date: Thu, 19 Jul 2012 21:03:38 +0000
Message-ID: <20120719210338.0lheeius96h8oc0w@staff.opera.com>
To: Jussi Kalliokoski <jussi.kalliokoski@gmail.com>

```
Quoting Jussi Kalliokoski <jussi.kalliokoski@gmail.com>:

> Obviously SIMD code is faster than addition in JS now, for example. And
> yes, IIR filter is a type of a convolution, but I don't think it's possible
> to write an efficient IIR filter algorithm using a convolution engine --
> after all, a convolution engine should be designed to deal with FIRs.

You can split the IIR filter into two parts: an FIR part and a
recursive part. If you skip the recursive part, you get plain
convolution.

> Not to mention that common IIR filters have 4 (LP, HP, BP, N) kernels,

I was thinking more along the lines of using the more generic form of
the filter function (see the Matlab filter() function [1]), which
supports everything from first-order IIR filters to FIR filters with
thousands of taps (or more, obviously).

> And as far as IIR filter performance goes, I think SIMD
> instructions offer very little usefulness in IIR algorithms, since they're
> so linear.

Actually, I made a simple benchmark on my AMD x2, 64-bit Linux, using
an IIR filter function identical to the filter function in the
BiquadFilterNode (excluding buffer edge handling).

JavaScript code:

function filter(y, x, b, a) {
    var b0 = b[0], b1 = b[1], b2 = b[2];
    var a1 = a[0], a2 = a[1];
    var len = x.length;
    for (var k = 2; k < len; ++k)
        y[k] = b0 * x[k] + b1 * x[k-1] + b2 * x[k-2] - a1 * y[k-1] - a2 * y[k-2];
}

C++ code:

void filter(float* y, const float* x, unsigned len, const float* b,
            const float* a)
{
    float b0 = b[0], b1 = b[1], b2 = b[2];
    float a1 = a[0], a2 = a[1];

    // FIR part
    for (unsigned k = 2; k < len; ++k)
        y[k] = b0 * x[k] + b1 * x[k-1] + b2 * x[k-2];

    // Recursive part
    for (unsigned k = 2; k < len; ++k)
        y[k] -= a1 * y[k-1] + a2 * y[k-2];
}

Note: In the C++ version I split the FIR and recursive parts into two
separate loops because somehow g++ managed to optimize that better
than doing a single loop (looking at the assembler output, it actually
SIMD:ified the FIR part for me - impressive!). Also, it illustrates
that you can do the FIR part first, with optimizations (e.g. FFT-based
convolution or SIMD), and then append the recursive part as a second
pass.

Benchmark results (using 128-sample buffers; Ms/s = megasamples per second):

C++:    529.4 Ms/s
Chrome:  22.0 Ms/s
FF:      13.0 Ms/s
Opera:    9.0 Ms/s

So, we actually get a >20x speedup when going from the fastest JS
engine to native C++ code (g++ -O3 -ffast-math). I think that's
motivation enough to implement it as a native function ;)

Regards,

Marcus

[1] http://www.mathworks.se/help/techdoc/ref/filter.html

--
Marcus Geelnard
Core Graphics Developer
Opera Software ASA
```
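The generic filter form Marcus refers to can be sketched in JavaScript as a Direct Form II Transposed implementation following the Matlab filter(b, a, x) convention. This is a minimal illustrative sketch, not code from the mail; the name `filterGeneric` and the zero-padding/normalization details are assumptions.

```javascript
// Sketch of a Matlab-style filter(b, a, x), Direct Form II Transposed.
// Implements: a[0]*y[n] = b[0]*x[n] + ... + b[nb]*x[n-nb]
//                         - a[1]*y[n-1] - ... - a[na]*y[n-na]
// Assumes a[0] != 0, per the Matlab convention.
function filterGeneric(b, a, x) {
  var n = Math.max(b.length, a.length);  // filter order + 1
  var a0 = a[0];
  // Normalize coefficients by a[0] and zero-pad both to a common length.
  var bn = [], an = [];
  for (var i = 0; i < n; ++i) {
    bn.push((b[i] || 0) / a0);
    an.push((a[i] || 0) / a0);
  }
  var z = new Array(n - 1).fill(0);  // delay-line state
  var y = new Array(x.length);
  for (var k = 0; k < x.length; ++k) {
    var xk = x[k];
    var yk = bn[0] * xk + (n > 1 ? z[0] : 0);
    // Shift the state, folding in feed-forward and feedback terms.
    for (var i = 0; i < n - 2; ++i)
      z[i] = bn[i + 1] * xk + z[i + 1] - an[i + 1] * yk;
    if (n > 1)
      z[n - 2] = bn[n - 1] * xk - an[n - 1] * yk;
    y[k] = yk;
  }
  return y;
}
```

With b = [b0, b1, b2] and a = [1, a1, a2] this reduces to the biquad in the mail; with a = [1] it degenerates to plain FIR convolution, matching the point that an IIR filter contains an FIR part.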
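The Ms/s (megasamples per second) figures can be reproduced in spirit with a harness like the following: run the biquad over many 128-sample buffers and divide total samples by elapsed time. This is a hypothetical sketch, not Marcus's actual benchmark; the coefficient values are arbitrary.

```javascript
// Biquad identical in shape to the JS filter in the mail.
function biquad(y, x, b0, b1, b2, a1, a2) {
  for (var k = 2; k < x.length; ++k)
    y[k] = b0 * x[k] + b1 * x[k-1] + b2 * x[k-2] - a1 * y[k-1] - a2 * y[k-2];
}

// Returns an approximate throughput in megasamples per second.
function benchmark(iterations) {
  var x = new Float32Array(128), y = new Float32Array(128);
  for (var i = 0; i < x.length; ++i)
    x[i] = Math.random() * 2 - 1;  // white-noise input
  var t0 = Date.now();
  for (var i = 0; i < iterations; ++i)
    biquad(y, x, 0.2, 0.4, 0.2, -0.5, 0.1);  // arbitrary coefficients
  var seconds = (Date.now() - t0) / 1000;
  return (iterations * 128) / (seconds * 1e6);
}
```

Note that Date.now() has millisecond resolution, so the iteration count must be large enough for the elapsed time to be measurable.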
Received on Thursday, 19 July 2012 21:12:23 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 21:50:01 UTC