Re: Integer PCM sample formats to Web Audio API?

On Fri, Jan 17, 2014 at 2:24 AM, Marcus Geelnard <mage@opera.com> wrote:

> I think that most would agree that the most important reason for having
> Float32 in the mixer and in the audio processing nodes is to make the graph
> behave correctly for complex configurations, and to guarantee high quality
> audio processing (alternate HDR sample formats such as 24-bit integers
> would just be silly on a Web platform).
>

Yes, the high dynamic range (along with a fixed precision that encompasses
the human hearing range) is the critical bit for avoiding processing issues.


> In fact, the original implementation shared the memory between the audio
> engine and the JS heap (I believe this is still the case in Blink today),
> meaning that it really *had* to use Float32 internally (since that's what's
> exposed in JS land). And if memory serves me correctly, one of the reasons
> for that design was to save memory, since a single copy of the audio buffer
> costs less than having both an internal copy and a copy on the JS heap.
>

Well, it "had" to use float32 simply because it was exposed that way.  It
could have been exposed as int16 in the AudioBuffer, and Float32 in
AudioProcessingEvent.


> So, when discussing Float32 vs Int16 etc, please keep in mind the use
> cases where an AudioBuffer is used for accessing and possibly also
> modifying audio data by using the getChannelData method on the AudioBuffer,
> such as:
>
> * ScriptProcessorNode / AudioProcessingEvent
>

I believe there's already a suggestion on the table to replace AudioBuffer
there with Float32Array.

> There has already been a suggestion brought forward by ROC (i.e. allow the
> use of Int16 internally), that should solve the most urgent memory issues.
> If that suggestion does not solve the problems at hand, please provide more
> information.
>

+1.  I'd still like to better understand the conversion impact.
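
To make the conversion step concrete, here is a rough sketch of what an int16-to-float32 expansion might look like if getChannelData had to hand back a Float32Array from int16 storage. The function name int16ToFloat32 and the divide-by-32768 scaling are my assumptions for illustration (a common convention, not something taken from the spec or from any implementation):

```typescript
// Hypothetical helper: expand int16 PCM samples into float32 in [-1, 1).
// The 1/32768 scale factor is an assumed convention, not mandated anywhere
// in this thread.
function int16ToFloat32(input: Int16Array): Float32Array {
  const output = new Float32Array(input.length);
  for (let i = 0; i < input.length; i++) {
    output[i] = input[i] / 32768;
  }
  return output;
}

// The converted copy occupies twice the bytes of the int16 original.
const pcm = new Int16Array([-32768, 0, 16384]);
const floats = int16ToFloat32(pcm);
const byteRatio = floats.byteLength / pcm.byteLength;
```

The per-sample work is a single multiply or divide, so the cost that matters is more likely the extra copy and its memory than the arithmetic.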

The open questions, to me, are 1) how does the data get EXPOSED then (i.e.
does getChannelData still return a float32array, and force conversion), 2)
if it is exposed in int16 or similar, how far down that rabbit hole do we
go (int8, int24?, int32), and 3) I will point out again that the 2x bloat
from converting int16 to float32 is potentially much less of a problem
than the cost of resampling (loading a 22kHz sample into a 96kHz audio
context would cause a >4x bloat).
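
As a back-of-the-envelope check on point 3, the arithmetic can be sketched as below. The helper name bufferBytes, the 10-second duration, and the exact 22050 Hz source rate are assumptions picked for illustration; only the 96kHz context rate and the int16/float32 widths come from the discussion above:

```typescript
// Hypothetical helper: bytes needed for one channel of audio.
function bufferBytes(
  durationSec: number,
  sampleRate: number,
  bytesPerSample: number
): number {
  return Math.ceil(durationSec * sampleRate) * bytesPerSample;
}

const durationSec = 10; // assumed clip length

const int16AtSource = bufferBytes(durationSec, 22050, 2);   // as decoded, int16
const float32AtSource = bufferBytes(durationSec, 22050, 4); // converted, not resampled
const float32AtContext = bufferBytes(durationSec, 96000, 4); // converted and resampled

const formatBloat = float32AtSource / int16AtSource;    // int16 -> float32
const rateBloat = float32AtContext / float32AtSource;   // 22.05kHz -> 96kHz
const totalBloat = float32AtContext / int16AtSource;    // both combined
```

With these numbers the format conversion doubles the footprint, while the resample alone multiplies it by 96000/22050, which is a bit over 4x; the two compound if both are applied.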

Received on Friday, 17 January 2014 17:11:27 UTC