Re: Integer PCM sample formats to Web Audio API? from Raymond Toy on 2014-01-17 (public-audio@w3.org from January to March 2014)

From: Raymond Toy <rtoy@google.com>
Date: Fri, 17 Jan 2014 14:50:27 -0800
To: Marcus Geelnard <mage@opera.com>
Cc: Chris Wilson <cwilso@google.com>, Katelyn Gadd <kg@luminance.org>, "public-audio@w3.org" <public-audio@w3.org>
Message-ID: <CAE3TgXGoPGzm8jU-kx9wtTSMS0Lf0mUZxjvfiJtCe5ThCkbGNQ@mail.gmail.com>
Thanks Marcus and Chris.  I now see what you're getting at. Yes, this is an
option.  Somehow, though, I think that if you need low memory, you also
have low CPU, so you lose no matter what. :-)

I think Blink's interpolator for an AudioBufferSourceNode is just a linear
interpolator. There will be some artifacts if you have to upsample (or
downsample) too much. This will have to be fixed if we go this way.

Ray
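
For readers following the interpolation point above, here is a minimal sketch of what resampling a buffer with a plain linear interpolator could look like. It is not Blink's actual code; the function name `resampleLinear` and its parameters are made up for illustration, and it assumes a single channel of Float32 samples.

```typescript
// Hypothetical helper (not from any browser source): resample one channel
// of PCM from srcRate to dstRate using plain linear interpolation.
function resampleLinear(
  input: Float32Array,
  srcRate: number,
  dstRate: number
): Float32Array {
  if (input.length === 0) {
    return new Float32Array(0);
  }
  // Number of source samples advanced per output sample.
  const step = srcRate / dstRate;
  // Cover the span from the first to the last input sample.
  const outLength = Math.floor((input.length - 1) / step) + 1;
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;                           // fractional read position
    const i0 = Math.floor(pos);                     // sample at or before pos
    const i1 = Math.min(i0 + 1, input.length - 1);  // next sample, clamped
    const frac = pos - i0;
    // Weighted average of the two neighbouring samples.
    output[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return output;
}
```

No low-pass filtering is done here, which is one source of the artifacts mentioned above: large rate ratios leave imaging (when upsampling) or aliasing (when downsampling) that a better reconstruction filter would suppress.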


On Fri, Jan 17, 2014 at 2:42 PM, Marcus Geelnard <mage@opera.com> wrote:

> The AudioBufferSourceNode already has the capability to play back the
> AudioBuffer at any sample rate - it would be transparent to the user.
>
> As far as I understand, the main reason for resampling up front is to
> lessen the requirements on the reconstruction filter/interpolator: by doing
> more work in decodeAudioData you can do less work in AudioBufferSourceNode,
> and still achieve good quality (at least for the typical use cases when the
> sample is played back at a pitch close to its original pitch).
>
> If we skip the resampling step, I think we have to choose between a more
> costly interpolator (eats CPU cycles) or slightly reduced audio quality
> (depending on what combination of resampler and interpolation algorithms
> are used).
>
> One option here could be to let the user decide on a per sample basis
> which matters the most: audio quality or memory footprint.
>
> /Marcus
>
>
> On Friday, 17 January 2014, Chris Wilson <cwilso@google.com> wrote:
>
>> The goal would be to not resample and store at 48kHz, but still be able to
>> play back with high quality in that case.  As Marcus said, that would be
>> harder. (Although is upsampling as costly as downsampling in this case?)
>>
>> On Jan 17, 2014 1:46 PM, "Raymond Toy" <rtoy@google.com> wrote:
>>
>>>
>>>
>>>
>>> On Fri, Jan 17, 2014 at 1:31 PM, Marcus Geelnard <mage@opera.com> wrote:
>>>
>>>> 2014/1/17, Chris Wilson <cwilso@google.com>:
>>>> > On Fri, Jan 17, 2014 at 2:24 AM, Marcus Geelnard <mage@opera.com> wrote:
>>>> >
>>>> >> So, when discussing Float32 vs Int16 etc, please keep in mind the use
>>>> >> cases where an AudioBuffer is used for accessing and possibly also
>>>> >> modifying audio data by using the getChannelData method on the
>>>> >> AudioBuffer, such as:
>>>> >>
>>>> >> * ScriptProcessorNode / AudioProcessingEvent
>>>> >>
>>>> >
>>>> > I believe there's already a suggestion on the table to replace
>>>> > AudioBuffer there with Float32Array.
>>>>
>>>> I'm all for that. I think it would be natural to consider that option
>>>> when spec'ing the new worker-based script processor.
>>>>
>>>> >
>>>> >> There has already been a suggestion brought forward by ROC (i.e. allow
>>>> >> the use of Int16 internally), that should solve the most urgent memory
>>>> >> issues. If that suggestion does not solve the problems at hand, please
>>>> >> provide more information.
>>>> >>
>>>> >
>>>> > +1.  I'd still like to better understand the conversion impact.
>>>> >
>>>>
>>>> If I can find the time I'll try and make some kind of benchmark of a
>>>> simple int16->float32 format converter.
>>>>
>>>> > The open questions, to me, are 1) how does the data get EXPOSED then
>>>> > (i.e. does getChannelData still return a float32array, and force
>>>> > conversion),
>>>>
>>>> I would prefer to keep it as Float32, at least for now. I see little
>>>> value in handing over integers to any kind of JS processing. The
>>>> implication would probably be that if you use getChannelData, you'll
>>>> force a conversion of the internal format to Float32.
>>>>
>>>> > 2) if it is exposed in int16 or similar, how far down that rabbit hole
>>>> > do we go (int8, int24?, int32), and
>>>>
>>>> IMO the added value of such an addition would not justify the API
>>>> complexity cost, plus it could easily be a slippery slope.
>>>>
>>>> > 3) I will point out again that the 2x bloat from converting int16 to
>>>> > float32 is potentially much less of a problem than the sample rate
>>>> > resampling (loading a 22kHz sample into a 96kHz audio context would
>>>> > cause a >4x bloat).
>>>> >
>>>>
>>>> +1   It may be slightly trickier to drop the resampling step though,
>>>> since it could come with a quality penalty. I suggest that we give
>>>> that issue some attention.
>>>>
>>>> Do we want to make it possible to opt out of the automatic resampling
>>>> step?
>>>>
>>>
>>> What would this mean?  Say the audio context is 48 kHz and you have a
>>> 22.05 kHz audio sample. So you don't want the sample to be resampled
>>> automatically to 48 kHz?  Then what happens to the audio when you connect
>>> to a bunch of nodes?
>>>
>>> Ray
>>>
>>>
>>>>
>>>> /Marcus
>>>>
>>>>
>>>
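
The int16->float32 conversion and the memory figures discussed in the quoted thread can be sketched roughly as follows. This is an illustration only, not the benchmark Marcus mentions: the helper names `int16ToFloat32` and `bufferBytes` are hypothetical, and dividing by 32768 is just one common scaling convention, assumed here.

```typescript
// Hypothetical helper: convert signed 16-bit PCM to floats in [-1, 1).
// Assumption: scale by 1/32768; other conventions exist.
function int16ToFloat32(samples: Int16Array): Float32Array {
  const out = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    out[i] = samples[i] / 32768;
  }
  return out;
}

// Hypothetical helper: bytes needed for one channel of `frames` source
// frames once resampled from srcRate to dstRate and stored with the given
// sample width.
function bufferBytes(
  frames: number,
  srcRate: number,
  dstRate: number,
  bytesPerSample: number
): number {
  return Math.ceil((frames * dstRate) / srcRate) * bytesPerSample;
}

// One second of mono audio recorded at 22.05 kHz:
const asInt16 = bufferBytes(22050, 22050, 22050, 2);      // 44100 bytes
const asFloat32 = bufferBytes(22050, 22050, 22050, 4);    // 88200 bytes (2x)
const resampled96k = bufferBytes(22050, 22050, 96000, 4); // 384000 bytes
```

With these numbers the format change alone doubles the footprint, while resampling from 22.05 kHz to 96 kHz multiplies it by about 4.35 on top of that, which matches the ">4x" figure in the thread.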
Received on Friday, 17 January 2014 22:50:55 UTC