Re: Proposal for fixing race conditions from Marcus Geelnard on 2013-07-17 (public-audio@w3.org from July to September 2013)

From: Marcus Geelnard <mage@opera.com>
Date: Wed, 17 Jul 2013 23:28:43 +0200
To: Ehsan Akhgari <ehsan.akhgari@gmail.com>
Cc: Jer Noble <jer.noble@apple.com>, "robert@ocallahan.org" <robert@ocallahan.org>, "K. Gadd" <kg@luminance.org>, Olivier Thereaux <Olivier.Thereaux@bbc.co.uk>, WG <public-audio@w3.org>
Message-ID: <CAL8YEv4X2bn3f2iYQTc4Xt2+ciGcAieMoj81SgEc0HO5hE_2Bg@mail.gmail.com>

Just a few comments:

0) First, let me reiterate that I think that it's unacceptable for us to
move forward with a specification that allows for "shared mutable state
without locks" (as Jens Nockert so concisely put it). I really think that
we have to (and should be able to) find a solution to this.

1) memcpy is really, really fast on any modern CPU architecture (you'll
find it's *the* most optimized routine, both in software and in hardware).
Having hand-optimized graphics rasterization loops in ARM assembler, I've
learned that it's impossible to get even close to its speed, even when
doing only trivial work such as adding a constant value to a buffer.

My conclusion is that *in general*, the performance hit of a memcpy will be
dwarfed by any other audio processing that is being done by the audio
engine (true, there are corner cases where you might have to copy really
large buffers, but I'm talking *in general*).

So, I think that it's important to show *real* performance issues with
memcpy, rather than just saying that it's a performance hit every time we
have to do a memcpy.
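
As a starting point for that kind of measurement, here is a rough sketch
that times a plain buffer copy from script. It assumes that
Float32Array.prototype.set is a fair stand-in for memcpy, which is only an
approximation, and averageCopyMs is a made-up helper name rather than
anything from the Web Audio API. Date.now() is coarse, so the iteration
count needs to be high enough for the total to be measurable.

```typescript
// Assumption: TypedArray.set() approximates a raw memcpy closely enough
// for a first-order estimate. averageCopyMs is a hypothetical helper.
function averageCopyMs(source: Float32Array, iterations: number): number {
  const target = new Float32Array(source.length);
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    // One full copy of the source buffer per iteration.
    target.set(source);
  }
  // Average wall-clock milliseconds spent per copy.
  return (Date.now() - start) / iterations;
}
```

The number only means something when compared with the time spent on the
actual audio processing of the same buffer in the same engine.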

2) There are examples of other high performance APIs that rely heavily on
memory copying into "immutable" objects, for the sake of enabling otherwise
impossible performance optimizations. Both OpenAL and OpenGL use this
technique. In the case of OpenAL, an audio buffer is filled with data using
an explicit copy operation. In the case of OpenGL, graphics data are
"uploaded" into texture objects. Developers are aware of the cost of the
upload/copy operation, but it enables super fast performance (hardware
accelerated and/or optimized internal data format) once the data has been
committed.

In OpenGL, it's also possible to modify sub-regions of a large texture
object by uploading only a sub-area of a texture. I could imagine a similar
solution in the Web Audio API, e.g. for editing small regions of a 5-minute
audio clip. For instance, the AudioBuffer interface could be extended with
a method along the lines of:

    void updateChannelData(unsigned long channel, Float32Array data,
                           unsigned long srcOffset, unsigned long length,
                           unsigned long dstOffset);

...that would make the necessary modification of the sub-region in the
AudioBuffer.
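
To make the idea concrete, here is a minimal sketch of how such a method
could behave. CopyingAudioBuffer and copyChannelData are hypothetical names
invented for illustration; this is not the real AudioBuffer interface, and
the bounds-checking behaviour is an assumption. The point is only that
samples are copied into private storage, so later writes to the caller's
array cannot race with the audio engine.

```typescript
// Hypothetical sketch: CopyingAudioBuffer is NOT the real AudioBuffer.
class CopyingAudioBuffer {
  // Private storage; callers never get a reference to these arrays.
  private readonly channels: Float32Array[] = [];
  readonly length: number;

  constructor(numberOfChannels: number, length: number) {
    this.length = length;
    for (let i = 0; i < numberOfChannels; i++) {
      this.channels.push(new Float32Array(length));
    }
  }

  // Copies `length` samples starting at data[srcOffset] into the given
  // channel, starting at dstOffset. Only the sub-region is touched.
  updateChannelData(channel: number, data: Float32Array, srcOffset: number,
                    length: number, dstOffset: number): void {
    const target = this.channels[channel];
    if (target === undefined) {
      throw new RangeError("channel out of range");
    }
    if (srcOffset < 0 || dstOffset < 0 || length < 0 ||
        srcOffset + length > data.length ||
        dstOffset + length > target.length) {
      throw new RangeError("sub-region out of bounds");
    }
    target.set(data.subarray(srcOffset, srcOffset + length), dstOffset);
  }

  // Returns a copy of the channel, never the internal array itself.
  copyChannelData(channel: number): Float32Array {
    const source = this.channels[channel];
    if (source === undefined) {
      throw new RangeError("channel out of range");
    }
    return source.slice();
  }
}
```

The cost of an edit is then proportional to the size of the sub-region
rather than to the size of the whole clip.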

3) The very concept of non-shared buffer data opens up some very
interesting possibilities. For instance, it would be possible for an
implementation to convert the Float32 data to a more optimal internal
representation (e.g. fixed point), which could give a tremendous
performance boost on some devices.
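
As an illustration of what such a conversion might look like, here is a
sketch that turns Float32 samples into signed 16-bit fixed point at copy
time. The choice of a 16-bit format with a scale factor of 32767, and the
helper names toFixed16 and fromFixed16, are assumptions made for the
example; an implementation would be free to pick any internal format,
precisely because script can never observe the stored data directly.

```typescript
// Assumption: signed 16-bit fixed point with a scale of 32767. The names
// toFixed16 and fromFixed16 are hypothetical, chosen for this example.
const FIXED16_SCALE = 32767;

// Converts samples in the nominal range [-1, 1] to 16-bit fixed point,
// clamping anything outside that range.
function toFixed16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  samples.forEach((sample, i) => {
    const clamped = Math.max(-1, Math.min(1, sample));
    out[i] = Math.round(clamped * FIXED16_SCALE);
  });
  return out;
}

// Converts back to Float32, e.g. when script asks for a copy of the data.
function fromFixed16(fixed: Int16Array): Float32Array {
  const out = new Float32Array(fixed.length);
  fixed.forEach((value, i) => {
    out[i] = value / FIXED16_SCALE;
  });
  return out;
}
```

The conversion is lossy, so whether it is acceptable depends on the
precision the application needs.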

So, I personally haven't come up with any *real* problems with immutable
objects that can't be solved in some way. I'm quite confident that it's the
right way forward, especially given the proven track record of other
performance-enabling APIs that are designed around the principle of
non-shared buffer data.

With that said, I'm not opposed to a solution that partially relies on
neutering, as long as it solves the data race issue, but I think that an
immutable buffer solution would be even neater.

Regards,

  Marcus

On Wed, Jul 17, 2013 at 7:04 PM, Ehsan Akhgari <ehsan.akhgari@gmail.com> wrote:

> On Wed, Jul 17, 2013 at 11:57 AM, Jer Noble <jer.noble@apple.com> wrote:
>
>> That is also a crucial point which I believe is getting lost in this
>> conversation.  These performance concerns are only about how efficiently we
>> can run legacy code while maintaining the current API.  If we're willing to
>> break API backwards compatibility by making AudioBuffers immutable for
>> example, and avoid using Float32Arrays anywhere except as an argument to
>> the AudioBuffer constructor, then we'll avoid all of the memcpy costs in
>> the normal usage of the API, except when creating an AudioBuffer.  So, it's
>> not as if our only two alternatives are not fixing the race conditions at
>> all, or memcpy'ing data on every call into the API.
>>
>>
>> Generally that is true, but "except when creating an AudioBuffer" is a
>> very large caveat. There are some use cases (e.g., generated audio
>> playback) which would hit this edge case.
>>
>
> The only way to avoid the memcpy when creating an AudioBuffer would be to
> neuter the argument passed to it, but that will make the Float32Array
> passed to it unusable from that point on, which may or may not be
> acceptable.  We can also design an asynchronous AudioBuffer creation API so
> that the memcpy can be implemented off the main thread (like how
> decodeAudioData currently works.)
>
> We need to come to terms with the fact that there is no free lunch.  We
> just cannot design an API which never memcpy's anything and is free from
> data race conditions.  Instead of discussing how terrible memcpy's are, we
> need to discuss how we can avoid doing them in the use cases which we
> expect to be common, or how to diminish their performance impact if they're
> impossible to avoid.  This is where having practical, performance-sensitive
> test cases which examine those scenarios comes in handy.
>
> --
> Ehsan
> <http://ehsanakhgari.org/>
>
Received on Wednesday, 17 July 2013 21:29:12 UTC