Re: Integer PCM sample formats to Web Audio API? from Marcus Geelnard on 2014-01-18 (public-audio@w3.org from January to March 2014)

From: Marcus Geelnard <mage@opera.com>
Date: Sat, 18 Jan 2014 12:45:20 +0100
To: Raymond Toy <rtoy@google.com>
Cc: Chris Wilson <cwilso@google.com>, Katelyn Gadd <kg@luminance.org>, "public-audio@w3.org" <public-audio@w3.org>
Message-ID: <CAL8YEv5tRO5W4SmRnQJ8=a4YdM4K0uvOA2CmJ4qhrvy7jNxYYg@mail.gmail.com>
...another question that we'll have to answer is: Can we do
resampling, and still preserve integer formats, or does resampling
imply conversion to Float32?

I'm a bit worried about two things:

1) The resampling filter might introduce peaks that exceed the [-1,1]
range, which would be clipped when stored in an integer format. That
could sound really bad.

2) Resampling a low sample-rate sample (e.g. 11 kHz -> 96 kHz) while
using a low bit resolution (e.g. 8-bit) will typically introduce
audible quantization noise (there's a lot of new high frequency
"silence" that mostly consists of a quantization error noise floor).
This might not be a noticeable problem for 16-bit samples, but
nevertheless something worth considering.
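
As a rough illustration of both concerns, here is a minimal Python sketch.
It is not any browser's implementation: the helper names (float_to_int,
int_to_float, max_quantization_error) are made up for this example, and the
1.2 overshoot value is an arbitrary assumption standing in for a peak that a
resampling filter might produce.

```python
def float_to_int(sample: float, bits: int) -> int:
    """Quantize a float sample (nominal range [-1, 1]) to a signed integer.

    Values outside the representable range are clipped, which is what
    storing a resampled signal in an integer buffer would have to do.
    """
    full_scale = 2 ** (bits - 1)
    value = round(sample * full_scale)
    return max(-full_scale, min(full_scale - 1, value))


def int_to_float(value: int, bits: int) -> float:
    """Convert a signed integer sample back to a float in [-1, 1)."""
    return value / 2 ** (bits - 1)


def max_quantization_error(bits: int, steps: int = 1000) -> float:
    """Worst round-trip error over a sweep of in-range values (no clipping)."""
    worst = 0.0
    for i in range(steps):
        x = -0.9 + 1.8 * i / (steps - 1)
        err = abs(x - int_to_float(float_to_int(x, bits), bits))
        worst = max(worst, err)
    return worst


# Concern 1: a hypothetical filter overshoot above full scale is clipped.
overshoot = 1.2
stored = int_to_float(float_to_int(overshoot, 16), 16)
print("stored value:", stored, "clipping error:", overshoot - stored)

# Concern 2: the rounding error (the noise floor) is far larger at 8 bits.
print("8-bit worst error: ", max_quantization_error(8))
print("16-bit worst error:", max_quantization_error(16))
```

The worst-case rounding error is bounded by half a quantization step, i.e.
1/256 of full scale at 8 bits versus 1/65536 at 16 bits, which is why the
noise floor is mainly a worry for the low bit depths.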

/Marcus


2014/1/18 Marcus Geelnard <mage@opera.com>:
> Hi Ray,
>
> 2014/1/17 Raymond Toy <rtoy@google.com>:
>> Thanks Marcus and Chris.  I now see what you're getting at. Yes, this is an
>> option.  Somehow, though, I think that if you need low memory, you also have
>> low CPU, so you lose no matter what. :-)
>>
>
> Exactly, which is why I think it may be necessary to lessen the audio
> quality requirements in case we don't resample, in order to enable low
> CPU loads. As long as we spec that behavior, I think it would be fine,
> provided the user is given the option to select whether to go with
> the high quality or the low memory code path.
>
>
>> I think Blink's interpolator for an AudioBufferSourceNode is just a linear
>> interpolator. There will be some artifacts if you have to upsample (or
>> downsample) too much. This will have to be fixed if we go this way.
>>
>
> If we skip resampling entirely, for all samples, yes, we'd have to
> require a higher quality interpolator, which would inevitably mean
> higher CPU loads.
>
> Otherwise (i.e. if resampling could be done selectively), I think that
> it would not be THAT much of a deal. I believe that most
> consumer-level audio systems in use today don't do offline resampling
> (instead they typically offer a selection between linear, cubic or
> Nth-order sinc interpolation, for instance).
>
> If we're talking sound effects in a game, I believe a fairly simple
> interpolator would suffice (as long as the author is aware of the
> implications, and can make an informed decision).
>
> /Marcus
>
>
>> Ray
>>
>>
>> On Fri, Jan 17, 2014 at 2:42 PM, Marcus Geelnard <mage@opera.com> wrote:
>>>
>>> The AudioBufferSourceNode already has the capability to play back the
>>> AudioBuffer at any sample rate - it would be transparent to the user.
>>>
>>> As far as I understand, the main reason for resampling up front is to
>>> lessen the requirements on the reconstruction filter/interpolator: by doing
>>> more work in decodeAudioData you can do less work in AudioBufferSourceNode,
>>> and still achieve good quality (at least for the typical use cases when the
>>> sample is played back at a pitch close to its original pitch).
>>>
>>> If we skip the resampling step, I think we have to choose between a more
>>> costly interpolator (eats CPU cycles) or slightly reduced audio quality
>>> (depending on what combination of resampler and interpolation algorithms are
>>> used).
>>>
>>> One option here could be to let the user decide on a per sample basis
>>> which matters the most: audio quality or memory footprint.
>>>
>>> /Marcus
>>>
>>>
>>> On Friday, 17 January 2014, Chris Wilson <cwilso@google.com> wrote:
>>>
>>>> The goal would be to not resample and store at 48kHz, but still be able
>>>> to play back with high quality in that case.  As Marcus said, that would be
>>>> harder. (Although is upsampling as costly as downsampling in this case?)
>>>>
>>>> On Jan 17, 2014 1:46 PM, "Raymond Toy" <rtoy@google.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jan 17, 2014 at 1:31 PM, Marcus Geelnard <mage@opera.com> wrote:
>>>>>>
>>>>>> 2014/1/17, Chris Wilson <cwilso@google.com>:
>>>>>> > On Fri, Jan 17, 2014 at 2:24 AM, Marcus Geelnard <mage@opera.com>
>>>>>> > wrote:
>>>>>> >
>>>>>> >> So, when discussing Float32 vs Int16 etc, please keep in mind the
>>>>>> >> use
>>>>>> >> cases where an AudioBuffer is used for accessing and possibly also
>>>>>> >> modifying audio data by using the getChannelData method on the
>>>>>> >> AudioBuffer,
>>>>>> >> such as:
>>>>>> >>
>>>>>> >> * ScriptProcessorNode / AudioProcessingEvent
>>>>>> >>
>>>>>> >
>>>>>> > I believe there's already a suggestion on the table to replace
>>>>>> > AudioBuffer
>>>>>> > there with Float32Array.
>>>>>>
>>>>>> I'm all for that. I think it would be natural to consider that option
>>>>>> when speccing the new worker-based script processor.
>>>>>>
>>>>>> >
>>>>>> >> There has already been a suggestion brought forward by ROC (i.e.
>>>>>> >> allow the
>>>>>> >> use of Int16 internally), that should solve the most urgent memory
>>>>>> >> issues.
>>>>>> >> If that suggestion does not solve the problems at hand, please
>>>>>> >> provide
>>>>>> >> more
>>>>>> >> information.
>>>>>> >>
>>>>>> >
>>>>>> > +1.  I'd still like to better understand the conversion impact.
>>>>>> >
>>>>>>
>>>>>> If I can find the time I'll try and make some kind of benchmark of a
>>>>>> simple int16->float32 format converter.
>>>>>>
>>>>>> > The open questions, to me, are 1) how does the data get EXPOSED then
>>>>>> > (i.e.
>>>>>> > does getChannelData still return a float32array, and force
>>>>>> > conversion),
>>>>>>
>>>>>> I would prefer to keep it as Float32, at least for now. I see little
>>>>>> value in handing over integers to any kind of JS processing. The
>>>>>> implication would probably be that if you use getChannelData, you'll
>>>>>> force a conversion of the internal format to Float32.
>>>>>>
>>>>>> > 2)
>>>>>> > if it is exposed in int16 or similar, how far down that rabbit hole
>>>>>> > do we
>>>>>> > go (int8, int24?, int32), and
>>>>>>
>>>>>> IMO the added value of such an addition would not justify the API
>>>>>> complexity cost, plus it could easily be a slippery slope.
>>>>>>
>>>>>> > 3) I will point out again that the 2x bloat
>>>>>> > from converting int16 to float32 is potentially much less of a
>>>>>> > problem
>>>>>> > than the sample rate resampling (loading a 22kHz sample into a 96kHz
>>>>>> > audio
>>>>>> > context would cause a >4x bloat).
>>>>>> >
>>>>>>
>>>>>> +1   It may be slightly trickier to drop the resampling step though,
>>>>>> since it could come with a quality penalty. I suggest that we give
>>>>>> that issue some attention.
>>>>>>
>>>>>> Do we want to make it possible to opt out from the automatic resampling
>>>>>> step?
>>>>>
>>>>>
>>>>> What would this mean?  Say the audio context is 48 kHz and you have a
>>>>> 22.05 kHz audio sample. So you don't want the sample to be resampled
>>>>> automatically to 48 kHz?  Then what happens to the audio when you connect to
>>>>> a bunch of nodes?
>>>>>
>>>>> Ray
>>>>>
>>>>>>
>>>>>>
>>>>>> /Marcus
>>>>>>
>>>>>
>>
Received on Saturday, 18 January 2014 11:45:47 UTC