- From: Marcus Geelnard <mage@opera.com>
- Date: Wed, 14 Nov 2012 14:28:57 +0100
- To: "Jussi Kalliokoski" <jussi.kalliokoski@gmail.com>, "Jens Nockert" <jens@nockert.se>
- Cc: public-audio@w3.org
- Message-ID: <op.wnrmej0dm77heq@mage-speeddemon>
Hi Jens!
Sorry for the late reply...
On 2012-11-06 11:21:30, Jens Nockert <jens@nockert.se> wrote:
>
> On 6 Nov 2012, at 08:48, Marcus Geelnard <mage@opera.com> wrote:
>
>> That might be something worth pursuing. At some point the API only
>> supported the source-is-destination paradigm, and I encountered the
>> inv/neg problem too. Are you suggesting that there should be another
>> signature, (dest/src1, src2), in addition to the current (dest, src1,
>> src2) signature, or should we drop the latter from the API?
>
> The idea me and Jussi discussed was to allow for both, but disallow (or
> make undefined) overlap in all cases where dst is not exactly the same
> view as either src1, src2 or src3.
Ok, so just more signatures. Essentially convenience alternatives, such
that for instance DSP.sub(a, b) could be implemented as:
function sub(a, b) {
  DSP.sub(a, a, b);  // two-operand form forwards to (dest, src1, src2)
}
I don't mind that, but on the other hand I didn't consider it important to
do those things in the first iteration of the spec (simplicity tends to
make things easier).
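To make the "just more signatures" idea concrete, here is a minimal sketch of one function name serving both forms. The names sub3 and sub are hypothetical stand-ins, not part of the proposed API, and the plain loop is only there so the example runs on its own:

```javascript
// Hypothetical stand-in for the three-operand form: dst[i] = x[i] - y[i].
function sub3(dst, x, y) {
  for (let i = 0; i < dst.length; i++) {
    dst[i] = x[i] - y[i];
  }
}

// One name, two signatures: sub(dst, x, y) and sub(dstAndX, y).
function sub(dst, x, y) {
  if (y === undefined) {
    // Two-operand call: the first argument is destination and first source.
    sub3(dst, dst, x);
  } else {
    sub3(dst, x, y);
  }
}

const a = new Float32Array([5, 7, 9]);
const b = new Float32Array([1, 2, 3]);
sub(a, b); // a = a - b
```

The extra test on the argument list is the per-signature branch that each additional convenience form would cost.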
BTW, I'm not a big fan of undefined behavior in this context (i.e.
something living on the Web), though I recognize that it might save a few
cycles for really short arrays. I think it's better to disallow
overlapping operations.
>
>> Keep in mind, there are a few disadvantages of the
>> source-is-destination paradigm:
>>
>> 1) Order matters, as you pointed out. May require additional operations
>> to cover all use-cases.
>
> Yes, but these operations are probably already useful. Neg and inv could
> be useful operations, but they could also wait, they can easily be
> implemented in terms of other operations.
True - but let's wait... Inv could be a valuable addition from a
performance perspective, though.
>
>> 2) In many situations, you need to do an explicit memory-copy
>> operation, for instance:
>>
>> a = b - c becomes:
>>
>> a.set(b);
>> DSP.sub(a, c);
>
> Since I think 3-operand versions still should be allowed, the more
> problematic situation if you do not allow any overlapping arguments at
> all is
>
> a = b - a
I think that could be allowed as DSP.sub(a, b, a). I don't see any
practical implementation that would not allow that, right? And I think
that the spec should allow for trivial operations (i.e. independent
per-element operations) to use one or more of the source arguments as the
destination argument. So for instance, DSP.pow(a,a,a) should be allowed
imo.
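As a rough illustration of why the exact-same-view case is harmless for independent per-element operations while a shifted view is not, consider the sketch below. The sub function is a hypothetical plain-loop stand-in for DSP.sub, not the real implementation:

```javascript
// Hypothetical stand-in for DSP.sub(dst, x, y): dst[i] = x[i] - y[i].
function sub(dst, x, y) {
  const n = Math.min(dst.length, x.length, y.length);
  for (let i = 0; i < n; i++) {
    // Element i is read before element i is written, so dst may be the
    // very same view as x or y without changing the result.
    dst[i] = x[i] - y[i];
  }
}

const a = new Float32Array([1, 2, 3]);
const b = new Float32Array([10, 20, 30]);
sub(a, b, a); // a = b - a, giving [9, 18, 27]

// A shifted view of the same buffer is a different story: the loop
// overwrites elements that later iterations still need to read.
const buf = new Float32Array([1, 2, 3, 4]);
const zeros = new Float32Array(3);
sub(buf.subarray(1, 4), buf.subarray(0, 3), zeros);
// buf is now [1, 1, 1, 1], not the [1, 1, 2, 3] a true copy would give.
```

The second call is the kind of partial overlap that I would prefer to disallow rather than leave undefined.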
>
> but that can be solved with the madd operation
>
> DSP.madd(a, -1.0f, b) /* a = (a * -1.0f) + b, */
>
> if we also allow for scalars to replace a Float32Array in all arguments,
> instead of only the last argument.
>
> I think this is a good idea, and my SM implementation of the DSP part of
> the API already supports it, the cost in extra complexity in my code is
> zero, but in an optimised implementation it would require some extra
> branches probably.
There's probably a limit where branching becomes impractical for short
arrays. I tried to keep that at a minimum by keeping the number of
possible signatures to a minimum (essentially, every signature requires an
extra branch), and of course more signatures mean more individually
optimized loops too.
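A small sketch of what I mean, assuming a hypothetical mul(dst, x, y) where y may be either a scalar or a Float32Array. The function and its behaviour are illustrative only:

```javascript
// Hypothetical mul(dst, x, y): y is either a number or a Float32Array.
function mul(dst, x, y) {
  const n = dst.length;
  if (typeof y === "number") {
    // Signature 1: array * scalar, with its own loop.
    for (let i = 0; i < n; i++) {
      dst[i] = x[i] * y;
    }
  } else {
    // Signature 2: array * array, with a second loop.
    for (let i = 0; i < n; i++) {
      dst[i] = x[i] * y[i];
    }
  }
}

const out = new Float32Array(3);
mul(out, new Float32Array([1, 2, 3]), 2);
mul(out, new Float32Array([1, 2, 3]), new Float32Array([4, 5, 6]));
```

Allowing a scalar in every argument position would multiply the number of such branches and loops, which is where the cost for short arrays comes from.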
>
> Division is a bit more complex,
>
> a = b / a
>
> since a * (1.0f / b) is not the same as a / b.
>
>> 3) Some operations (Filter & FFT) can't be done in-place, so you'd
>> either have to use a different paradigm for those operations (and
>> solve the overlapping-problem anyway - possibly using exceptions), or
>> use internal destination buffers and do an extra memory-copy
>> operation.
>
> I think people want to do these in-place, but then the implementation
> should have internal buffers and do the extra copy that is sometimes
> required automatically. But I think the FFT interface may grow a lot
> more complex with time (see FFTW for example), so we might want to keep
> it quite minimal in an initial version.
We could allow for in-place operation, but out-of-place should give better
performance so I'd prefer good support for it.
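To show where the extra copy comes from, here is a sketch using a simple FIR filter. The names fir and firInPlace are hypothetical, and the filter is a naive direct-form loop, not how a real implementation would do it:

```javascript
// Out-of-place FIR: dst[n] = sum over k of taps[k] * src[n - k].
// Samples before the start of src are treated as zero.
function fir(dst, src, taps) {
  for (let n = 0; n < dst.length; n++) {
    let acc = 0;
    for (let k = 0; k < taps.length && k <= n; k++) {
      acc += taps[k] * src[n - k];
    }
    dst[n] = acc;
  }
}

// In-place wrapper: earlier inputs are needed after they would have been
// overwritten, so the input is copied to a temporary buffer first.
function firInPlace(a, taps) {
  const copy = new Float32Array(a); // extra allocation and memory copy
  fir(a, copy, taps);
}

const x = new Float32Array([1, 2, 3, 4]);
const y = new Float32Array(4);
const taps = new Float32Array([0.5, 0.5]);
fir(y, x, taps);     // out-of-place: no copy needed
firInPlace(x, taps); // in-place: same result, one extra copy
```

The out-of-place call avoids the temporary buffer entirely, which is why I would like it to be well supported.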
>
>> My idea with the current spec would be to disallow overlapping
>> operation, and for some operations support source & destination to be
>> the same. If violated, throw an exception.
>
> Checking and throwing an exception could be expensive for very short
> vectors, couldn't the order of evaluation just be undefined? Then most
> weird types of overlap would be disallowed, and if the implementations
> use threads, then it will be 'enforced' at runtime from the beginning.
I think the easiest way (for now) is to disallow overlap. You'd typically
have to do two range checks (upper/lower) for every source argument
(checking against the destination argument). If I'm not mistaken, I think
you should be able to optimize for the correct case (no overlap) and make
sure that you get good branch prediction for that case (close to zero
cycles overhead).
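A minimal sketch of such a check, assuming the arguments are typed-array views and that an exactly identical view is allowed. The names partiallyOverlaps and checkedSub are hypothetical, and RangeError is just one possible exception type:

```javascript
// True when the two views share memory without covering the same range.
function partiallyOverlaps(dst, src) {
  if (dst.buffer !== src.buffer) return false;
  if (dst.byteLength === 0 || src.byteLength === 0) return false;
  const dStart = dst.byteOffset;
  const dEnd = dStart + dst.byteLength;
  const sStart = src.byteOffset;
  const sEnd = sStart + src.byteLength;
  // Exactly the same range: source-is-destination, which is allowed.
  if (dStart === sStart && dEnd === sEnd) return false;
  // The two range checks (lower and upper) against the destination.
  return sStart < dEnd && dStart < sEnd;
}

// Hypothetical checked version of sub(dst, x, y).
function checkedSub(dst, x, y) {
  if (partiallyOverlaps(dst, x) || partiallyOverlaps(dst, y)) {
    throw new RangeError("source overlaps destination");
  }
  for (let i = 0; i < dst.length; i++) {
    dst[i] = x[i] - y[i];
  }
}

const buf = new Float32Array(8);
const lo = buf.subarray(0, 4);
const hi = buf.subarray(4, 8);
const mid = buf.subarray(2, 6);
checkedSub(lo, lo, hi); // fine: lo is used as both destination and source
// checkedSub(lo, mid, hi) would throw, since mid overlaps lo partially.
```

In the common no-overlap case all the comparisons fall through the same way on every call, so the branches should predict well.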
>
>> We also have to consider the case where the second source argument
>> (if any) overlaps with the destination argument, so for instance in
>> DSP.sub(a, b, c), a, b, and c may all be overlapping (e.g. thanks to
>> TypedArray.subarray).
>
> Or DSP.madd, which is even worse
>
> -- Jens Nockert
--
Marcus Geelnard
Core graphics developer
Opera Software ASA
Received on Wednesday, 14 November 2012 13:29:38 UTC