# Re: Simplifying specing/testing/implementation work

From: Marcus Geelnard <mage@opera.com>
Date: Thu, 19 Jul 2012 21:03:38 +0000
Message-ID: <20120719210338.0lheeius96h8oc0w@staff.opera.com>
To: Jussi Kalliokoski <jussi.kalliokoski@gmail.com>

```Citerar Jussi Kalliokoski <jussi.kalliokoski@gmail.com>:

> Obviously SIMD code is faster than addition in JS now, for example. And
> yes, IIR filter is a type of a convolution, but I don't think it's possible
> to write an efficient IIR filter algorithm using a convolution engine --
> after all, a convolution engine should be designed to deal with a FIRs.

You can split the IIR filter into two parts: an FIR part and a
recursive part. If you skip the recursive part, you get plain
convolution.

> Not to mention that common IIR filters have 4 (LP, HP, BP, N) kernels,

I was thinking more along the lines of using the more generic form of
the filter function (see the Matlab filter() function [1]), which
supports everything from first order IIR filters to FIR-filters with
1000s of taps (or more, obviously).

> And as far as IIR filter performance goes, I think SIMD
> instructions offer very little usefulness in IIR algorithms, since they're
> so linear.

Actually, I made a simple benchmark on my AMD x2, 64-bit Linux, using
an IIR filter function identical to the filter function in the

JavaScript code:

function filter (y, x, b, a) {
var b0 = b[0], b1 = b[1], b2 = b[2];
var a1 = a[0], a2 = a[2];
var len = x.length;
for (var k = 2; k < len; ++k)
y[k] = b0 * x[k] + b1 * x[k-1] + b2 * x[k-2] - a1 * y[k-1] - a2 * y[k-2];
}

C++ code:

void filter(float* y, const float* x, unsigned len, const float* b,
const float* a)
{
float b0 = b[0], b1 = b[1], b2 = b[2];
float a1 = a[0], a2 = a[2];

// FIR part
for (unsigned k = 2; k < len; ++k)
y[k] = b0 * x[k] + b1 * x[k-1] + b2 * x[k-2];

// Recursive part
for (unsigned k = 2; k < len; ++k)
y[k] -= a1 * y[k-1] + a2 * y[k-2];
}

Note: In the C++ version I split the FIR and recursive parts into two
separate loops because somehow g++ managed to optimize that better
than doing a single loop (looking at the assembler output, it actually
SIMD:ified the FIR part for me - impressive!). Also, it illustrates
that you can do the FIR part first, with optimizations (e.g. FFT-based
convolution or SIMD), and then append the recursive part as a second
pass.

Benchmark results (using 128 samples long buffers):

C++: 529.4 Ms/s
Chrome:  22.0 Ms/s
FF:  13.0 Ms/s
Opera:   9.0 Ms/s

So, we actually get a >20x speedup when going from the fastest JS
engine to native C++ code (g++ -O3 -ffast-math). I think that's
motivation enough to implement it as a native function ;)

Regards,

Marcus

[1] http://www.mathworks.se/help/techdoc/ref/filter.html

--
Marcus Geelnard
Core Graphics Developer
Opera Software ASA
```
Received on Thursday, 19 July 2012 21:12:23 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 21:50:01 UTC