- From: Marcus Geelnard <mage@opera.com>
- Date: Wed, 14 Nov 2012 14:28:57 +0100
- To: "Jussi Kalliokoski" <jussi.kalliokoski@gmail.com>, "Jens Nockert" <jens@nockert.se>
- Cc: public-audio@w3.org
- Message-ID: <op.wnrmej0dm77heq@mage-speeddemon>
Hi Jens! Sorry for the late reply... Den 2012-11-06 11:21:30 skrev Jens Nockert <jens@nockert.se>: > > On 6 Nov 2012, at 08:48, Marcus Geelnard <mage@opera.com> wrote: > >> That might be something worth pursuing. At some point the API only >> supported >>the source-is-destination paradigm, and I encountered the >> inv/neg problem too. >>Are you suggesting that there should be another >> signature, (dest/src1, src2), >>in addition to the current (dest, src1, >> src2) signature, or should we drop the >>latter from the API? > > The idea me and Jussi discussed was to allow for both, but disallow (or > make undefined) overlap in all cases where dst is >not exactly the same > view as either src1, src2 or src3. Ok, so just more signatures. Essentially convenience alternatives, such that for instance DSP.sub(a, b) could be implemented as: DSP.sub(a, b) = function { DSP.sub(a, a, b); }; I don't mind that, but on the other hand I didn't consider it important to do those things in the first iteration of the spec (simplicity tends to make things easier). BTW, I'm not a big fan of undefined behavior in this context (i.e. something living on the Web), though I recognize that it might save a few cycles for really short arrays. I think it's better to disallow overlapping operations. > >> Keep in mind, there are a few disadvantages of the >> source-is-destination >>paradigm: >> >> 1) Order matters, as you pointed out. May require additional operations >> to >>cover all use-cases. > > Yes, but these operations are probably already useful. Neg and inv could > be useful operations, but they could also wait, >they can easily be > implemented in terms of other operations. True - but let's wait... Inv could be a valuable addition from a performance perspective, though. > >> 2) In many situations, you need to do an explicit memory-copy >> operation, for >>instance: >> >> a = b - c becomes: >> >> a.set(b); >> DSP.sub(a, c); > > Since I think 3-operand versions still should be allowed, the more > problematic situation if you do not allow any overlapping >arguments at > all is > > a = b - a I think that could be allowed as DSP.sub(a, b, a). I don't see any practical implementation that would not allow that, right? And I think that the spec should allow for trivial operations (i.e. independent per-element operations) to use one or more of the source arguments as the destination argument. So for instance, DSP.pow(a,a,a) should be allowed imo. > > but that can be solved with the madd operation > > DSP.madd(a, -1.0f, b) /* a = (a * -1.0f) + b, */ > > if we also allow for scalars to replace a Float32Array in all arguments, > instead of only the last argument. > > I think this is a good idea, and my SM implementation of the DSP part of > the API already supports it, the cost in extra >complexity in my code is > zero, but in an optimised implementation it would require some extra > branches probably. There's probably a limit where branching becomes impractical for short arrays. I tried to keep that at a minimum by keeping the number of possible signatures to a minimum (essentially, every signature requires an extra branch), and of course more signatures mean more individually optimized loops too. > > Division is a bit more complex, > > a = b / a > > since a * (1.0f / b) is not the same as a / b. > >> 3) Some operations (Filter & FFT) can't be done in-place, so you'd >> either have >>to use a different paradigm for those operations (and >> solve the overlapping->>problem anyway - possibly using exceptions), or >> use internal destination >>buffers and do an extra memory-copy >> operation. > > I think people want to do these in-place, but then the implementation > should have internal buffers and do the extra copy >that is sometimes > required automatically. But I think the FFT interface may grow a lot > more complex with time (see FFTW for >example), so we might want to keep > it quite minimal in an initial version. We could allow for in-place operation, but out-of-place should give better performance so I'd prefer good support for it. > >> My idea with the current spec would be to disallow overlapping >> operation, and >>for some operations support source & destination to be >> the same. If violated, >>throw an exception. > > Checking and throwing an exception could be expensive for very short > vectors, couldn't the order of evaluation just be >undefined? Then most > weird types of overlap would be disallowed, and if the implementations > use threads, then it will be >'enforced' at runtime from the beginning. I think the easiest way (for now) is to disallow overlap. You'd typically have to do two range checks (upper/lower) for every source argument (checking against the destination argument). If I'm not mistaking, I think you should be able to optimize for the correct case (no overlap) an make sure that you get good branch prediction for that case (close to zero cycles overhead). > >> We also have to consider the case where the the second source argument >> (if >>any) overlaps with the destination argument, so for instance in >> DSP.sub(a, b, >>c), a, b, and c may all be overlapping (e.g. thanks to >> TypedArray.subarray). > > Or DSP.madd, which is even worse > > -- Jens Nockert -- Marcus Geelnard Core graphics developer Opera Software ASA
Received on Wednesday, 14 November 2012 13:29:38 UTC