Re: Integer PCM sample formats to Web Audio API? from Raymond Toy on 2014-01-15 (public-audio@w3.org from January to March 2014)

From: Raymond Toy <rtoy@google.com>
Date: Wed, 15 Jan 2014 13:36:24 -0800
To: Katelyn Gadd <kg@luminance.org>
Cc: Chris Wilson <cwilso@google.com>, Marcus Geelnard <mage@opera.com>, Paul Adenot <padenot@mozilla.com>, Jukka Jylänki <jujjyl@gmail.com>, "public-audio@w3.org" <public-audio@w3.org>
Message-ID: <CAE3TgXEmzs6EkhLqizJsSUq4bn5SBS3W4Q9KAs=Y_D5T9p6SsQ@mail.gmail.com>

>
>
>
>>
>> The AudioContext's sampleRate is not set to a defined number, but in
>> practice the sampleRate is set to the audio output sample rate - that is,
>> the AudioDestinationNode's native rate - since that's where the clock is
>> coming from.  The point is that the entire audio context is run in a single
>> rate, to minimize resampling.
>>
>
> Arguably the choice to mix in 32-bit float should be equivalent to a
> choice to mix in 44khz or 48khz. It shouldn't have to influence the source
> format of audio data any more than the output rate would require source
> audio to be stored at that rate. This is the point I'm trying to make: both
> bitness and sample rate are important controls to have over source audio.
>

But you, as the web audio app author, can't control the sample rate of the
device the app is playing on.  Even on the same machine, the owner could
have set the local audio device to 44.1 kHz today, but tomorrow they
might change it to 88 kHz. Should your app suddenly sound funny the next
day? If avoiding resampling is important, you'd have to supply all of your
assets at several different sampling rates. (Which might be a good idea
anyway, if you can afford the storage and bandwidth.)
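
To make the resampling point concrete, here is a minimal sketch of what
happens to a 44.1 kHz asset when the device rate changes. It uses plain
linear interpolation purely for illustration; real implementations use
better resamplers, and resampleLinear and the variable names are
hypothetical, not part of any API.

```typescript
// Illustrative only: linear interpolation stands in for whatever
// resampler an implementation actually uses.
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number
): Float32Array {
  if (input.length === 0 || fromRate === toRate) {
    return new Float32Array(input); // nothing to do, return a copy
  }
  const outLength = Math.round((input.length * toRate) / fromRate);
  const output = new Float32Array(outLength);
  const step = fromRate / toRate; // input frames advanced per output frame
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const i0 = Math.min(Math.floor(pos), input.length - 1);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    output[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return output;
}

// One second of audio authored at 44.1 kHz.
const oneSecondAt44k = new Float32Array(44100);
// The same asset on a device running at 48 kHz, then at 88 kHz.
const onDevice48k = resampleLinear(oneSecondAt44k, 44100, 48000);
const onDevice88k = resampleLinear(oneSecondAt44k, 44100, 88000);
```

The asset still lasts one second in each case, but every sample has been
recomputed, which is the cost you avoid only by shipping the asset at the
device's rate.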


>
>
>>
>>>> Having such an option in the API gives the implementation an opportunity
>>>> to save memory when memory is scarce, but it's not necessarily forced to do
>>>> so.
>>>>
>>>
>>> The whole point is to force the implementation to save memory. An
>>> application that runs out of memory 80% of the time is not appreciably
>>> better than one that does so 100% of the time - end users will consider
>>> both unusable.
>>>
>>
>> Given all the other factors that may change memory usage in the web
>> platform, I'm not sure why this one feature will solve that problem.  Or
>> even come close.  Again, I'm not saying I see no reason to look closely at
>> this; I'm just saying that I don't think this is as big a slam dunk as you
>> appear to, and I think there are notable situations when it is better to
>> NOT store that data in int16, and there will be
>>
>
> What situations are these? I find it hard to imagine a scenario where
> software playback is going to benefit tremendously from using 2x the memory
> to store sample data. Certainly there are huge advantages to *mixing* in
> floating-point; are you arguing that making the mixer slightly faster
> merits using double the memory (and thus, double the memory bandwidth, if
> not more - memory bandwidth being especially precious on mobile platforms)?
> Must the floating-point version of said buffer be the de-facto storage
> format even though it is merely a minor mixing efficiency optimization?
>
> I am dubious about the tremendous cost implied by converting from int16 to
> float32 in the mixer, also. It's a trivial, common operation, and depending
> on architecture I would expect it could pay for itself in reduced memory
> bandwidth usage and more efficient use of L1/L2/L3 caches. Have you
> benchmarked this? Do you have test cases that demonstrate a tremendous
> performance win by using float32 for everything versus int16 or int8?
>
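
For reference, the int16-to-float32 conversion and the 2x storage figure
discussed in the quoted text can be sketched roughly as follows. Dividing
by 32768 is one common convention and is an assumption here, as are the
helper and variable names.

```typescript
// Hypothetical helper: map 16-bit PCM samples to float32 in [-1, 1).
// Scaling by 1/32768 is an assumed convention, not taken from the thread.
function int16ToFloat32(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    out[i] = pcm[i] / 32768;
  }
  return out;
}

const pcmSamples = Int16Array.of(-32768, -16384, 0, 16384, 32767);
const floatSamples = int16ToFloat32(pcmSamples);

// Same number of samples, but 4 bytes each instead of 2.
const int16Bytes = pcmSamples.byteLength;
const float32Bytes = floatSamples.byteLength;
```

The conversion itself is one multiply per sample; the disagreement in the
thread is about where the converted data lives afterwards, not about
whether the loop is expensive.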

It's not just a minor efficiency optimization.  Some of the algorithms are
quite complex, so implementing them in fixed-point would be particularly
painful. A biquad filter with essentially unlimited peaking can cause a
signal that was fine to suddenly either clip or wrap around, causing
terrible glitching.  Yes, this is solvable, but int16 makes the internal
implementation much, much more difficult, more prone to bugs, and subject
to many more round-off effects.
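
A small sketch of the clip-or-wrap problem, under simplifying assumptions:
a plain gain of 4 stands in for a filter's resonant peak, and all names
are hypothetical. It only shows what happens to one intermediate value in
each representation.

```typescript
// A gain of 4 (roughly +12 dB) stands in for a filter's resonant peak.
const PEAK_GAIN = 4;

// int16 storage without saturation: the result wraps modulo 2^16.
const wrappedSample = new Int16Array(1);
wrappedSample[0] = 20000 * PEAK_GAIN; // 80000 does not fit in 16 bits

// int16 with saturation: no wrap, but the peak is flattened for good.
function saturateInt16(value: number): number {
  return Math.max(-32768, Math.min(32767, Math.round(value)));
}
const clippedSample = saturateInt16(20000 * PEAK_GAIN);

// float32: the intermediate value just exceeds 1.0, and a later gain
// stage can bring it back into range.
const floatSample = new Float32Array(1);
floatSample[0] = (20000 / 32768) * PEAK_GAIN;
const restoredSample = floatSample[0] * 0.25;

console.log(wrappedSample[0], clippedSample, floatSample[0], restoredSample);
```

In the int16 cases the information is already gone by the time the next
node sees the sample; in the float case the overshoot is carried along
until something downstream decides what to do with it.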

For controlled applications, fixed-point is ok.  For something as open as
WebAudio, floating-point is a natural choice.

I know Chris Rogers wanted to be able to use WebAudio for pro-audio
applications where (I think) 24-bit or floating-point samples are the norm.
Forcing int16 internally prevents this completely.

Received on Wednesday, 15 January 2014 21:42:34 UTC