Re: New proposal for fixing race conditions from Marcus Geelnard on 2013-07-23 (public-audio@w3.org from July to September 2013)

From: Marcus Geelnard <mage@opera.com>
Date: Tue, 23 Jul 2013 23:52:59 +0200
To: Chris Wilson <cwilso@google.com>
Cc: Ehsan Akhgari <ehsan.akhgari@gmail.com>, "Robert O'Callahan" <robert@ocallahan.org>, Jer Noble <jer.noble@apple.com>, Russell McClellan <russell@motu.com>, WG <public-audio@w3.org>
Message-ID: <CAL8YEv6w3SQaowdfc=4rrpPBHha-SqqUUnpo-u+Z4TmNtCB+aA@mail.gmail.com>
On Tue, Jul 23, 2013 at 10:10 PM, Chris Wilson <cwilso@google.com> wrote:

> On Tue, Jul 23, 2013 at 11:00 AM, Marcus Geelnard <mage@opera.com> wrote:
>
>> If you're talking about pre-rendering sound into an AudioBuffer (in a way
>> that can't be done using an OfflineAudioContext), I doubt that memcpy will
>> do much harm. Again (if this is the case), could you please provide an
>> example?
>>
>
> OK.  I want to load an audio file, perform some custom analysis on it
> (e.g. determine average volume), perform some custom (offline) processing
> on the buffer based on that analysis (e.g. soft limiting), and then play
> the resulting buffer.
>
> If I understand it, under ROC's original proposal, this would result in
> the entire buffer being copied one extra time (other than the initial
> AudioBuffer creation by decodeAudioData); under Jer's recent proposal, I
> would have to copy it twice.  "I doubt that memcpy will do much harm" is a
> bit of an odd argument to make; as you yourself said, "I don't think that
> 'it's usually not a problem' is a strong enough argument."  I don't see
> the inherent raciness as a shortcoming we have to paper over; this isn't a
> design flaw, it's a memory-efficient design.  The audio system should have
> efficient access to audio buffers, and it needs to function in a decoupled
> way in order to provide glitch-free audio when at all possible.
>
>
OK, so here's my view of it: for audio processing, there are very few
situations where memcpy is a performance bottleneck.

1) The speed / time issue (I think we agree here already, but here are some
raw numbers anyway if anyone is still in doubt).

I did a quick test on my desktop computer, and I can memcpy 60 seconds of
48 kHz stereo float32 audio in 1 ms. If memory serves me right, my Tegra 2
phone is about 30 times slower, so it would take roughly 30 ms, and less
than 50 ms in any event. I strongly doubt that would even be noticeable in
your use case.
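The figures above can be checked with a little arithmetic. The TypeScript sketch below (helper names are hypothetical) computes the size of 60 seconds of 48 kHz stereo float32 audio and the copy throughput implied by copying it in 1 ms; the 30x slowdown for the Tegra 2 phone is the rough estimate quoted above, not a measurement.

```typescript
// Size in bytes of uncompressed float32 PCM audio.
function float32PcmBytes(seconds: number, sampleRate: number, channels: number): number {
  // Float32Array.BYTES_PER_ELEMENT is 4.
  return seconds * sampleRate * channels * Float32Array.BYTES_PER_ELEMENT;
}

// Throughput in bytes per second implied by copying `bytes` in `millis` ms.
function impliedThroughput(bytes: number, millis: number): number {
  return (bytes * 1000) / millis;
}

const bufferBytes = float32PcmBytes(60, 48_000, 2);          // 23,040,000 bytes, about 23 MB
const desktopThroughput = impliedThroughput(bufferBytes, 1); // about 23 GB/s
const desktopCopyMillis = 1;
const phoneCopyMillis = desktopCopyMillis * 30;              // estimated 30x slower: about 30 ms
console.log(bufferBytes, desktopThroughput, phoneCopyMillis);
```

In other words, the desktop result corresponds to copying roughly 23 GB per second, and even a 30x slower device stays well under the 50 ms bound mentioned above.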

For reference, a simple normalization loop (i.e. find the maximum
amplitude, then scale all samples by 1 / maxAmplitude) over 60 seconds of
sound takes more than 500 ms on my desktop computer in Chromium.

In other words, copying the buffer twice (about 2 ms) would amount to less
than 0.4% of the total processing time, and that's for a very trivial
operation; for more complex processing, the memcpy would be even less
noticeable.
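For anyone who wants to reproduce the comparison, a peak-normalization loop of the kind described above might look like the following. This is a sketch assuming a single channel of samples in a `Float32Array`; the function names are hypothetical and this is not the exact code that was timed. The second helper shows the overhead arithmetic: two 1 ms copies against 500 ms of processing gives 0.4%, and since the processing actually took more than 500 ms, the real share is lower.

```typescript
// Peak normalization: find the largest absolute sample value, then scale
// every sample by 1 / peak so the loudest sample reaches +/-1.0.
function normalizeInPlace(samples: Float32Array): void {
  let peak = 0;
  for (let i = 0; i < samples.length; i++) {
    const magnitude = Math.abs(samples[i]);
    if (magnitude > peak) peak = magnitude;
  }
  if (peak === 0) return; // silence: nothing to scale
  const gain = 1 / peak;
  for (let i = 0; i < samples.length; i++) {
    samples[i] *= gain;
  }
}

// Copy time as a percentage of processing time.
function copyOverheadPercent(copyMillis: number, processMillis: number): number {
  return (copyMillis / processMillis) * 100;
}

const samples = Float32Array.from([0.5, -0.25, 0.125]);
normalizeInPlace(samples);                        // samples is now [1, -0.5, 0.25]
const overhead = copyOverheadPercent(2 * 1, 500); // about 0.4 (percent)
```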

2) The memory doubling issue.

I'm trying to come up with a worst-case scenario here, but I think that the
most reasonable situation is that you do something like this:

a) Load & decode sound into an audio buffer (1x memory).
b) Copy the audio buffer into float32 typed arrays (2x memory).
c) Process the data (still 2x memory).
d) Copy the arrays back to the audio buffer (still 2x memory).
e) Drop the reference to the typed arrays -> GC (back to 1x memory).

...and then you repeat this process for every sound you wish to process. In
other words, at any one time you'll likely have at most 1x memory plus the
memory of the buffer currently being processed, which should amount to
*less* than 2x the total memory used for audio buffers. In "steady state"
(after processing is done), you'll be back to 1x memory, so it's only a
temporary peak.
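Steps a) to e) could look roughly like the sketch below. It is written under stated assumptions: `ChannelCopyBuffer` is a hypothetical minimal interface modeled on the `copyFromChannel`/`copyToChannel` methods that `AudioBuffer` has in current versions of the Web Audio API, `processViaCopies` is a hypothetical helper, and step a) (decoding) is assumed to have already produced the buffer.

```typescript
// Minimal subset of the Web Audio AudioBuffer interface used here
// (hypothetical name; a real AudioBuffer structurally satisfies it).
interface ChannelCopyBuffer {
  readonly numberOfChannels: number;
  readonly length: number; // sample frames per channel
  copyFromChannel(destination: Float32Array, channelNumber: number): void;
  copyToChannel(source: Float32Array, channelNumber: number): void;
}

// Steps b) to e): copy every channel out, process the private copies,
// copy the results back, and let the copies be garbage collected.
function processViaCopies(
  buffer: ChannelCopyBuffer,
  process: (channels: Float32Array[]) => void,
): void {
  // b) Copy out into fresh typed arrays: memory use temporarily doubles.
  const channels: Float32Array[] = [];
  for (let ch = 0; ch < buffer.numberOfChannels; ch++) {
    const copy = new Float32Array(buffer.length);
    buffer.copyFromChannel(copy, ch);
    channels.push(copy);
  }
  // c) Process the copies; the audio thread never sees these arrays.
  process(channels);
  // d) Copy the results back into the buffer.
  channels.forEach((data, ch) => buffer.copyToChannel(data, ch));
  // e) `channels` is unreachable after return and can be collected.
}
```

All channels are copied before processing (rather than one at a time) so that operations needing the whole signal, such as finding a peak across both stereo channels, still work. In a browser, `buffer` would be the AudioBuffer produced by `decodeAudioData`.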

True, this *might* be a problem, but if you're creating an app that even
comes close to using 50% of your available memory for audio buffers alone,
I'd say that you're in deep trouble anyway (I think it will be very hard to
make such an app work cross-platform).
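The reasoning behind the 50% figure is simple accounting: if buffers are processed one at a time, the peak is the steady-state total plus the largest single buffer. Since no buffer is larger than the total, the peak never exceeds 2x the steady state (it equals 2x only when there is a single buffer), so the temporary copy can only exceed the available memory if audio buffers already use more than half of it. A small sketch with hypothetical names and illustrative buffer sizes:

```typescript
// Memory accounting when buffers are processed one at a time via copies.
interface MemoryProfile {
  steadyBytes: number; // all audio buffers, no copies alive
  peakBytes: number;   // all audio buffers plus a copy of the largest one
}

function copyWorkflowMemory(bufferSizes: number[]): MemoryProfile {
  const steadyBytes = bufferSizes.reduce((sum, size) => sum + size, 0);
  const largest = bufferSizes.reduce((max, size) => Math.max(max, size), 0);
  return { steadyBytes, peakBytes: steadyBytes + largest };
}

// Illustrative example: one 60 s and two 15 s stereo clips, 48 kHz float32.
const profile = copyWorkflowMemory([23_040_000, 5_760_000, 5_760_000]);
const peakRatio = profile.peakBytes / profile.steadyBytes; // about 1.67, below 2x
```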

In fact, here's another thought: with Jer's proposal, an implementation is
no longer forced to use flat Float32 arrays internally, meaning that it
would be possible for an implementation to use various compression
techniques for the AudioBuffers. For instance, on a low-memory device you
could use 16-bit integers instead of 32-bit floats for the audio buffers,
which would *save* memory compared to the current design (which prevents
these kinds of memory optimizations).

I'm not saying that the latter is the way to go, but my experience with
development for mobile devices is that it's quite nice to have the option
to sacrifice quality for performance at times. E.g. using 16-bit graphics
instead of 32-bit graphics can be a valid choice, if it gives you 2x the
rendering performance and saves you 50% of the memory, especially on a
device that can't display all the 16M colors anyway. The same could go for
audio.
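To illustrate the kind of internal representation an opaque AudioBuffer would permit, here is a hypothetical sketch of storing samples as 16-bit integers and expanding them back to float32 on read. Nothing here is a proposed API or any browser's actual implementation; the scale factor of 32767 and the clamping to [-1, 1] are assumptions of the sketch.

```typescript
// Quantize float samples to 16-bit integers, clamping to [-1, 1] first.
function toInt16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    out[i] = Math.round(clamped * 32767);
  }
  return out;
}

// Expand 16-bit integers back to floats in [-1, 1] when the data is read.
function toFloat32(samples: Int16Array): Float32Array {
  const out = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    out[i] = samples[i] / 32767;
  }
  return out;
}

// One minute of 48 kHz stereo: 23,040,000 bytes as float32, 11,520,000 as int16.
const minute = new Float32Array(60 * 48_000 * 2);
const packedBytes = toInt16(minute).byteLength;
```

The trade-off is the one described above: half the memory, at the cost of a quantization error of at most about half a step (1 / 65534) per sample for in-range input.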

Oh, and with regard to the "it's usually not a problem" analogy - point
taken. However, from my point of view we're talking about a few corner
cases where we'll see a slight (temporary) increase in memory consumption
(in the normal use case, there'll be no difference) vs breaking the way the
Web works (as has been explained in various ways by different people on
this list).

I guess this is what is dividing the group into two camps right now.
Personally, I'd much rather go with the potential memory increase than
breaking typed arrays (i.e. I'm in the "let's keep the Web webby" camp).

/Marcus
Received on Tuesday, 23 July 2013 21:53:28 UTC