
Re: Added direct JavaScript processing to WebKit prototype

From: Chris Rogers <crogers@google.com>
Date: Sat, 7 Aug 2010 14:10:01 -0700
Message-ID: <AANLkTime8cz6i_oLi0+qccqnJb+dENXrXe4pb=bsLVt4@mail.gmail.com>
To: moallem@mit.edu
Cc: Chris Marrin <cmarrin@apple.com>, public-xg-audio@w3.org, Corban Brook <corbanbrook@gmail.com>
When discussing native versus JS performance, FFTs are just one convenient
place to make comparisons.  But, of course, there are many other types of
audio processing algorithms not involving FFTs or spectral analysis at all,
each with different performance characteristics.  You gave an example of
using low-order filters instead of HRTF convolution for spatialization.
 This is a real-world trade-off between rendering quality and speed, and one
which is offered, for example, in Apple's OpenAL implementation.
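To make the trade-off concrete, here is a minimal pure-JS sketch of the cheap
approach: equal-power panning plus a one-pole low-pass standing in for full
HRTF convolution.  All names are illustrative, not part of any proposed API:

```javascript
// Hypothetical sketch: cheap spatialization via equal-power panning plus a
// one-pole low-pass on the far ear, instead of full HRTF convolution.
function spatialize(mono, azimuth, cutoffCoeff) {
  // azimuth in [-1, 1]: -1 = hard left, +1 = hard right
  const theta = (azimuth + 1) * Math.PI / 4;  // map to [0, pi/2]
  const gainL = Math.cos(theta);
  const gainR = Math.sin(theta);
  const left = new Float32Array(mono.length);
  const right = new Float32Array(mono.length);
  let state = 0;                              // one-pole filter memory
  for (let i = 0; i < mono.length; i++) {
    // crude head-shadow cue: low-pass the ear facing away from the source
    state += cutoffCoeff * (mono[i] - state);
    left[i]  = gainL * (azimuth > 0 ? state : mono[i]);
    right[i] = gainR * (azimuth < 0 ? state : mono[i]);
  }
  return { left, right };
}
```

This is a few multiplies per sample instead of hundreds of large FFTs per
second, which is exactly where the quality-versus-speed choice gets made.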

There will certainly be many applications where pure JS processing will be
acceptable.  Several factors are important to consider when comparing with
native processing:
*Hardware Scalability*
As far as I know, we haven't measured CPU performance with pure JS
processing on phone and tablet devices yet, but we should get some numbers
here.  Roughly speaking though, we should not expect these devices to have
as much headroom as desktop class machines.

*Graphics and other Game Logic (Physics)*
For games which are doing a significant amount of canvas or WebGL drawing
and which may be doing other non-trivial things in JS such as running
physics engines, there will be significantly fewer JS resources left for
audio processing.  Conversely, doing any significant audio processing in JS
will cause dropped frames in the graphics rendering, producing less smooth
animations on the page than native audio processing would.

*Latency*
Does the application have low-latency requirements?  For some applications
the time delay between when a key is pressed or a mouse event is processed
and a sound is heard is important.  For audio-intensive applications, it
will likely be necessary to increase buffering to avoid audible glitches.
 This will increase latency.  Here's an interesting link talking about Flash
10 audio latency:
http://joeberkovitz.com/blog/2008/10/15/controlling-audio-latency-in-flash-10/
This is interesting because Flash uses ActionScript which is very similar to
JavaScript.  Native audio processing can achieve the very best possible
latency and will not suffer from these problems.
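As a rough illustration (my own numbers, not from the article above), latency
grows linearly with the amount of buffering:

```javascript
// Back-of-the-envelope latency estimate: each buffer of audio queued ahead
// of the hardware adds bufferSize / sampleRate seconds of delay.
function latencyMs(bufferSize, sampleRate, numBuffers) {
  return (bufferSize * numBuffers / sampleRate) * 1000;
}

// Two 4096-frame buffers at 44.1kHz: ~186 ms -- too slow to feel responsive.
// Two 256-frame buffers at 44.1kHz:  ~12 ms  -- closer to what native can do.
```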
*High-level API*
The canvas drawing API has high-level functions for drawing things such as
circles, lines, and gradients.  Although it's possible to use an ImageData
and poke pixels directly into the bitmap to achieve the same effect, it
seems like the higher-level drawing APIs are very useful to have.
 Similarly, in audio there are fundamental operations which are very common
(mixing, filtering, panning, delay and other linear effects, etc.), and
providing direct and simple APIs for these common operations also seems very
useful to me.
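For example, even the most common operation of all, mixing, takes an explicit
inner loop when done by hand in JS; a native mix/gain node would hide exactly
this (a sketch, not a proposed API):

```javascript
// What a high-level "mix" operation would wrap: summing N sources
// sample-by-sample, each with its own gain.
function mix(sources, gains) {
  const out = new Float32Array(sources[0].length);
  for (let s = 0; s < sources.length; s++) {
    const src = sources[s], g = gains[s];
    for (let i = 0; i < out.length; i++) {
      out[i] += g * src[i];
    }
  }
  return out;
}
```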

==============================================================================
*Some Use Cases:*

Here are two examples where I think native processing would be attractive:

*3D Games*
For games, implementing a room effect (algorithmic reverberation or
convolution), and spatializing multiple moving sources (whether with
convolution or other techniques), sound cones, distance effects, occlusion
and obstruction effects per-source is costly in terms of CPU usage.  Even
the most basic algorithmic reverberation effects in pure JS are very
expensive and are not of the same quality or versatility as native
convolution effects.  Also, latency is often important in these types of
applications.
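A back-of-the-envelope estimate (my own figures) shows why direct convolution
reverb is out of reach for pure JS:

```javascript
// Rough cost of direct (non-FFT) convolution reverb: one multiply-add per
// impulse-response sample, for every output sample.
function directConvMACsPerSecond(irSeconds, sampleRate) {
  const irLength = irSeconds * sampleRate;  // IR length in samples
  return irLength * sampleRate;             // MACs per second of audio
}

// A 2-second IR at 44.1kHz works out to roughly 3.9 billion multiply-adds
// per second of rendered audio -- hence FFT-based convolution in native code.
```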

*DAW Applications*
For DAW (digital-audio workstation) type applications, once you start to
consider multiple audio sources, with multiple insert effects on each
source, send effects, submixes, parameter automation with de-zippering,
etc., the computational demands can become fairly significant.
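De-zippering, for instance, means slewing a parameter toward its target
instead of jumping it (which clicks); the usual approach is a one-pole
smoother per parameter, per sample (illustrative sketch, my own names):

```javascript
// De-zippering sketch: exponentially approach the target gain each sample
// rather than applying it instantly.
function smoothGain(samples, targetGain, coeff) {
  const out = new Float32Array(samples.length);
  let g = 0;                          // current (smoothed) gain
  for (let i = 0; i < samples.length; i++) {
    g += coeff * (targetGain - g);    // one-pole slew toward the target
    out[i] = g * samples[i];
  }
  return out;
}
```

Multiply that per-sample work by every automated parameter on every track and
the total adds up quickly.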


Here are two examples where I think direct JS processing would be
attractive:

*Custom DSP Effects*
Unusual and interesting custom audio processing can be done directly in JS.
 It's also a good test-bed for prototyping new algorithms. This is an
extremely rich area.
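For example, a "bitcrusher" that quantizes each sample to a coarse bit depth
is a few lines of JS, but unlikely to appear in any fixed set of native nodes
(names are illustrative):

```javascript
// Custom effect sketch: quantize each sample to 2^bits levels for a lo-fi,
// "crushed" sound.
function bitcrush(input, bits) {
  const levels = Math.pow(2, bits);
  const out = new Float32Array(input.length);
  for (let i = 0; i < input.length; i++) {
    out[i] = Math.round(input[i] * levels) / levels;
  }
  return out;
}
```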

*Educational Applications*
JS processing is ideal for illustrating concepts in computer music synthesis
and processing, such as Corban's example of the decomposition of a square
wave into its harmonic components, FM synthesis techniques, etc.
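That square-wave decomposition amounts to summing odd harmonics sin(nx)/n,
scaled by 4/pi (a sketch of the idea, not Corban's actual code):

```javascript
// Build a square wave from its Fourier series: the sum over odd harmonics n
// of sin(n * phase) / n, scaled by 4/pi, converges to +/-1.
function squareFromHarmonics(phase, numHarmonics) {
  let sum = 0;
  for (let k = 0; k < numHarmonics; k++) {
    const n = 2 * k + 1;              // odd harmonics only
    sum += Math.sin(n * phase) / n;
  }
  return (4 / Math.PI) * sum;
}
```

Sweeping numHarmonics from 1 upward makes the ripple (Gibbs phenomenon) and
the convergence directly audible and visible, which is exactly the kind of
demonstration JS processing is good for.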


==============================================================================
*Hybrid Approaches*
Because of the modular nature of the AudioNode approach, it's easily
possible to combine both native processing and direct JS processing.  Some
applications may benefit from using both approaches at the same time and can
get the best of both worlds.

Best Regards,
Chris




On Sat, Aug 7, 2010 at 8:37 AM, Ted Moallem <ted.moallem@gmail.com> wrote:

> We are discussing FFT analysis as though it were the one solution for
> all of our spectral processing needs.  The algorithm is convenient
> when the "overhead" is tolerable, but considering the variety of
> web-capable platforms out there, the future of web audio might best be
> served by steering developers toward more efficient signal processing
> schemes, tailored to task requirements.  For example, does 3D
> spatialization require hundreds of large-sized FFT's per second, or
> can it be accomplished using low-order Butterworth filters with
> time-varying coefficients?  (If anyone knows the answer, please feel
> free to chime in.)  In any case, rather than considering whether or
> not javascript can handle the requirements of a modern-day first
> person shooter game (or peaceful bird-watcher game), I regard
> javascript audio API as a challenge to work within reasonable
> constraints where possible, to meet our web audio needs without
> monopolizing all of a device's resources.
>
> [descends from soapbox]
>
> -ted
>
>
>
>
> On Sat, Aug 7, 2010 at 3:25 AM, Chris Rogers <crogers@google.com> wrote:
> >> > Very nice. Overhead on my machine is very low (20%) and I think at
> least
> >> > half that overhead is WebGL rendering. It would be nice to duplicate
> the
> >> > functionality of the Realtime Analyzer demo so we can understand the
> >> > difference in overhead between doing FFT's in JS vs native code.
> >> >
> >> > -----
> >> > ~Chris
> >> > cmarrin@apple.com
> >> >
> >> > Sure, I can do that.  I know that the Mozilla folks have already done
> >> > this and found the JS FFT performance to be acceptable for realtime
> >> > analysis.  Where the FFT overhead gets quite a lot heavier is in
> >> > panning/spatialization and convolution where there are hundreds of
> larger
> >> > sized FFTs per second.
> >>
> >> It would be useful to have an apples-to-apples comparison. What sample
> >> rate does the Realtime Analyzer demo use? It would be nice to do a test
> of
> >> 48KHz stereo, just to see how much we can stress it in JS.
> >>
> >> -----
> >> ~Chris
> >> cmarrin@apple.com
> >
> > It normally runs at 44.1KHz.  I think they were getting results like
> 0.4ms
> > per size 1024 FFT, which is not very heavy for doing a fairly standard
> > real-time analysis.  When I get some time, I can try to see what results
> we
> > get for JSC and V8 in WebKit.  I imagine we'll see something similar.
> > Chris
> >
>
>
>
> --
> ________________________________
>
> Theodore Moallem
> moallem@mit.edu
> 646-872-0283
>
> Sensory Communication Group
> Research Laboratory of Electronics at MIT
> Graduate Program in Speech & Hearing Bioscience and Technology
> Harvard-MIT Division of Health Sciences and Technology
>
>
> --
>
> "My instincts always tend to revolt against helplessness of any kind."   --
> NK
>
Received on Saturday, 7 August 2010 21:10:48 GMT
