- From: Jussi Kalliokoski <jussi.kalliokoski@gmail.com>
- Date: Sat, 27 Jul 2013 17:29:33 +0300
- To: Chris Wilson <cwilso@google.com>
- Cc: Marcus Geelnard <mage@opera.com>, Ehsan Akhgari <ehsan.akhgari@gmail.com>, "Robert O'Callahan" <robert@ocallahan.org>, Jer Noble <jer.noble@apple.com>, Russell McClellan <russell@motu.com>, WG <public-audio@w3.org>
- Message-ID: <CAJhzemVMOkddf0pbN0L9Bnxt46PkysKK+-y4h4RKLk70tzGh9g@mail.gmail.com>
On Fri, Jul 26, 2013 at 7:23 PM, Chris Wilson <cwilso@google.com> wrote:

> On Wed, Jul 24, 2013 at 4:06 AM, Jussi Kalliokoski <jussi.kalliokoski@gmail.com> wrote:
>
>> On Tue, Jul 23, 2013 at 11:10 PM, Chris Wilson <cwilso@google.com> wrote:
>>
>>> OK. I want to load an audio file, perform some custom analysis on it (e.g. determine average volume), perform some custom (offline) processing on the buffer based on that analysis (e.g. soft limiting), and then play the resulting buffer.
>>
>> This is a symptom of another problem with the API. In this scenario your biggest problem with the API is far from the copy happening here; instead it is that the method for decoding audio has the wrong input and output for most cases. What decodeAudioData currently assumes is that you have a binary buffer containing the encoded audio data and you want a high-level construct representing the audio data (an AudioBuffer) out of it. Your case (and a common case anyway), however, is that you have a URL to an audio resource and you want a list of Float32Arrays out. Why does decodeAudioData (async) return an AudioBuffer in the first place?
>
> Hmm. Well, the channels being handled as an array of Float32Arrays would be less structurally obvious. Since it is resampled to the AudioContext rate anyway, the metadata is less interesting - although I'd ideally like our decoding to have more metadata expressing the internals, rather than less (e.g. original sample rate, any tempo tags, etc.).

Let's keep our use cases in check here. decodeAudioData is completely unsuitable for any scenario where you'd do anything with the metadata. In all of the scenarios I can think of where you'd want to use the metadata, you'd also want a much lower-level construct, e.g. a streaming decoder.
decodeAudioData is designed for one-shot samples, convolution kernels and oscillator waveforms - but then again, for these use cases it's too low-level a construct (because you have to do the XHR manually). If you're worried about performance and memory usage, decodeAudioData is usable only for buffers a few seconds in length, and resources like that hardly ever have any relevant metadata in them.

Admittedly, returning a Float32Array from the decode operation isn't ideal for the use cases where you, for example, want to assign the data to an array buffer, but it allows the use case of modifying the data before creating an AudioBuffer out of it, without the cost of a memcpy and double memory.

On another note, I'd like to point out that the discussion has advanced to the point where we have provided suggestions that for most cases rival - and in some situations potentially outperform - the current designs, both in terms of CPU and memory usage, without imposing race conditions; yet you still continue to take the stance that we should just define extensions to the other parts of the web platform just to allow racy things in our API. IMO a lot of the proposals would improve the usability of the API as well - for example, Marcus' suggestion to use arrays of Float32Arrays directly for the AudioProcessingEvent.

I can understand if your argument is backwards compatibility, but that's not the argument you seem to be making.

Cheers,
Jussi
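[Editor's note: for concreteness, Chris's use case above (average-volume analysis followed by soft limiting) can be sketched directly on Float32Array channel data, as it would look if the decode operation returned a list of Float32Arrays. This is a minimal illustrative sketch, not part of any proposed API; the function names (averageVolume, softLimit, processChannels) and the threshold heuristic are invented for illustration.]

```javascript
// Hedged sketch: analysis + in-place processing on raw Float32Array
// channels, with no AudioBuffer round-trip (and hence no extra memcpy).
// All names here are illustrative, not from the Web Audio API.

function averageVolume(channel) {
  // Mean absolute sample value across the channel.
  let sum = 0;
  for (let i = 0; i < channel.length; i++) {
    sum += Math.abs(channel[i]);
  }
  return sum / channel.length;
}

function softLimit(channel, threshold) {
  // In-place tanh-style soft limiter: quiet samples pass through nearly
  // unchanged, louder samples are squashed so output stays within
  // (-threshold, threshold).
  for (let i = 0; i < channel.length; i++) {
    channel[i] = threshold * Math.tanh(channel[i] / threshold);
  }
  return channel;
}

function processChannels(channels) {
  // Process each decoded channel in place; the result could then be
  // wrapped in an AudioBuffer for playback without copying the data.
  for (const channel of channels) {
    const avg = averageVolume(channel);
    const threshold = Math.min(1, avg * 4); // illustrative heuristic
    softLimit(channel, threshold);
  }
  return channels;
}
```

Because the channels are modified in place before any AudioBuffer is constructed, this is exactly the "modify the data before creating an AudioBuffer out of it" pattern described above.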
Received on Saturday, 27 July 2013 14:30:00 UTC