Re: DSP API overlapping arrays + inv() and neg() from Marcus Geelnard on 2012-11-14 (public-audio@w3.org from October to December 2012)

From: Marcus Geelnard <mage@opera.com>
Date: Wed, 14 Nov 2012 14:28:57 +0100
To: "Jussi Kalliokoski" <jussi.kalliokoski@gmail.com>, "Jens Nockert" <jens@nockert.se>
Cc: public-audio@w3.org
Message-ID: <op.wnrmej0dm77heq@mage-speeddemon>
Hi Jens!

Sorry for the late reply...


Den 2012-11-06 11:21:30 skrev Jens Nockert <jens@nockert.se>:

>
> On 6 Nov 2012, at 08:48, Marcus Geelnard <mage@opera.com> wrote:
>
>> That might be something worth pursuing. At some point the API only  
>> supported >>the source-is-destination paradigm, and I encountered the  
>> inv/neg problem too. >>Are you suggesting that there should be another  
>> signature, (dest/src1, src2), >>in addition to the current (dest, src1,  
>> src2) signature, or should we drop the >>latter from the API?
>
> The idea me and Jussi discussed was to allow for both, but disallow (or  
> make undefined) overlap in all cases where dst is >not exactly the same  
> view as either src1, src2 or src3.

Ok, so just more signatures. Essentially convenience alternatives, such  
that for instance DSP.sub(a, b) could be implemented as:

DSP.sub(a, b) = function {
   DSP.sub(a, a, b);
};

I don't mind that, but on the other hand I didn't consider it important to  
do those things in the first iteration of the spec (simplicity tends to  
make things easier).

BTW, I'm not a big fan of undefined behavior in this context (i.e.  
something living on the Web), though I recognize that it might save a few  
cycles for really short arrays. I think it's better to disallow  
overlapping operations.


>
>> Keep in mind, there are a few disadvantages of the  
>> source-is-destination >>paradigm:
>>
>> 1) Order matters, as you pointed out. May require additional operations  
>> to >>cover all use-cases.
>
> Yes, but these operations are probably already useful. Neg and inv could  
> be useful operations, but they could also wait, >they can easily be  
> implemented in terms of other operations.

True - but let's wait... Inv could be a valuable addition from a  
performance perspective, though.

>
>> 2) In many situations, you need to do an explicit memory-copy  
>> operation, for >>instance:
>>
>>  a = b - c   becomes:
>>
>>  a.set(b);
>>  DSP.sub(a, c);
>
> Since I think 3-operand versions still should be allowed, the more  
> problematic situation if you do not allow any overlapping >arguments at  
> all is
>
>  a = b - a

I think that could be allowed as DSP.sub(a, b, a). I don't see any  
practical implementation that would not allow that, right? And I think  
that the spec should allow for trivial operations (i.e. independent  
per-element operations) to use one or more of the source arguments as the  
destination argument. So for instance, DSP.pow(a,a,a) should be allowed  
imo.


>
> but that can be solved with the madd operation
>
> DSP.madd(a, -1.0f, b) /* a = (a * -1.0f) + b, */
>
> if we also allow for scalars to replace a Float32Array in all arguments,  
> instead of only the last argument.
>
> I think this is a good idea, and my SM implementation of the DSP part of  
> the API already supports it, the cost in extra >complexity in my code is  
> zero, but in an optimised implementation it would require some extra  
> branches probably.

There's probably a limit where branching becomes impractical for short  
arrays. I tried to keep that at a minimum by keeping the number of  
possible signatures to a minimum (essentially, every signature requires an  
extra branch), and of course more signatures mean more individually  
optimized loops too.


>
> Division is a bit more complex,
>
> a = b / a
>
> since a * (1.0f / b) is not the same as a / b.
>
>> 3) Some operations (Filter & FFT) can't be done in-place, so you'd  
>> either have >>to use a different paradigm for those operations (and  
>> solve the overlapping->>problem anyway - possibly using exceptions), or  
>> use internal destination >>buffers and do an extra memory-copy  
>> operation.
>
> I think people want to do these in-place, but then the implementation  
> should have internal buffers and do the extra copy >that is sometimes  
> required automatically. But I think the FFT interface may grow a lot  
> more complex with time (see FFTW for >example), so we might want to keep  
> it quite minimal in an initial version.

We could allow for in-place operation, but out-of-place should give better  
performance so I'd prefer good support for it.


>
>> My idea with the current spec would be to disallow overlapping  
>> operation, and >>for some operations support source & destination to be  
>> the same. If violated, >>throw an exception.
>
> Checking and throwing an exception could be expensive for very short  
> vectors, couldn't the order of evaluation just be >undefined? Then most  
> weird types of overlap would be disallowed, and if the implementations  
> use threads, then it will be >'enforced' at runtime from the beginning.

I think the easiest way (for now) is to disallow overlap. You'd typically  
have to do two range checks (upper/lower) for every source argument  
(checking against the destination argument). If I'm not mistaking, I think  
you should be able to optimize for the correct case (no overlap) an make  
sure that you get good branch prediction for that case (close to zero  
cycles overhead).


>
>> We also have to consider the case where the the second source argument  
>> (if >>any) overlaps with the destination argument, so for instance in  
>> DSP.sub(a, b, >>c), a, b, and c may all be overlapping (e.g. thanks to  
>> TypedArray.subarray).
>
> Or DSP.madd, which is even worse
>
> -- Jens Nockert



-- 
Marcus Geelnard
Core graphics developer
Opera Software ASA
Received on Wednesday, 14 November 2012 13:29:38 UTC