Re: Integer PCM sample formats to Web Audio API?

Hi Ray,

2014/1/17 Raymond Toy <rtoy@google.com>:
> Thanks Marcus and Chris.  I now see what you're getting at. Yes, this is an
> option.  Somehow, though, I think that if you need low memory, you also have
> low CPU, so you lose no matter what. :-)
>

Exactly, which is why I think it may be necessary to relax the audio
quality requirements when we don't resample, in order to keep CPU load
low. I think that would be fine as long as we spec that behavior and
give the user the option to choose between the high-quality code path
and the low-memory code path.


> I think Blink's interpolator for an AudioBufferSourceNode is just a linear
> interpolator. There will be some artifacts if you have to upsample (or
> downsample) too much. This will have to be fixed if we go this way.
>

If we skip resampling entirely, for all samples, then yes, we'd have to
require a higher-quality interpolator, which would inevitably mean a
higher CPU load.
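
For reference, here is a minimal sketch of a plain linear interpolator of the kind Ray mentions. It is an illustration only, not Blink's code. The function name and the convention that the read position advances by playback_rate * source_rate / context_rate per output frame are assumptions made for the example.

```python
from typing import List

def play_linear(samples: List[float], source_rate: float, context_rate: float,
                playback_rate: float = 1.0) -> List[float]:
    """Render `samples` at `context_rate` using linear interpolation.

    Hypothetical helper: `step` is how far the read position advances
    through the source for each output frame.
    """
    step = playback_rate * source_rate / context_rate
    out: List[float] = []
    pos = 0.0
    last = len(samples) - 1
    while pos <= last:
        i = int(pos)
        frac = pos - i
        # At the final sample there is no right neighbour, so hold it.
        nxt = samples[i + 1] if i < last else samples[i]
        out.append(samples[i] + frac * (nxt - samples[i]))
        pos += step
    return out
```

For example, `play_linear([0.0, 1.0, 0.0], 1, 2)` upsamples by 2 and returns `[0.0, 0.5, 1.0, 0.5, 0.0]`. Each output sample costs only two reads and one multiply-add. The artifacts appear when the rate ratio is far from 1, because linear interpolation filters out images and aliases poorly.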

Otherwise (i.e. if resampling could be done selectively), I don't think
it would be THAT big a deal. I believe that most consumer-level audio
systems in use today don't do offline resampling (instead they typically
offer a choice of linear, cubic or Nth-order sinc interpolation, for
instance).
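
As an example of the middle option in that list, a cubic interpolator reads four neighbouring samples instead of two. The sketch below uses the Catmull-Rom form. It is illustrative only: the function name is hypothetical, `samples` is assumed non-empty, `pos` is assumed non-negative, and indices past either edge are simply clamped.

```python
from typing import List

def cubic_at(samples: List[float], pos: float) -> float:
    """Catmull-Rom cubic interpolation of `samples` at fractional index `pos`.

    Hypothetical helper; assumes a non-empty list and pos >= 0.
    """
    last = len(samples) - 1
    i = int(pos)
    t = pos - i

    def s(k: int) -> float:
        # Clamp out-of-range indices to the nearest edge sample.
        return samples[min(max(k, 0), last)]

    p0, p1, p2, p3 = s(i - 1), s(i), s(i + 1), s(i + 2)
    # Horner form of 0.5 * (2p1 + (p2 - p0)t + (2p0 - 5p1 + 4p2 - p3)t^2
    #                       + (3(p1 - p2) + p3 - p0)t^3)
    return p1 + 0.5 * t * (p2 - p0 + t * (2 * p0 - 5 * p1 + 4 * p2 - p3
                                          + t * (3 * (p1 - p2) + p3 - p0)))
```

Each output sample needs four reads and a few more multiply-adds than the linear case. That is the CPU-versus-quality trade discussed above. A windowed-sinc interpolator pushes further in the same direction.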

If we're talking sound effects in a game, I believe a fairly simple
interpolator would suffice (as long as the author is aware of the
implications, and can make an informed decision).

/Marcus


> Ray
>
>
> On Fri, Jan 17, 2014 at 2:42 PM, Marcus Geelnard <mage@opera.com> wrote:
>>
>> The AudioBufferSourceNode already has the capability to play back the
>> AudioBuffer at any sample rate - it would be transparent to the user.
>>
>> As far as I understand, the main reason for resampling up front is to
>> lessen the requirements on the reconstruction filter/interpolator: by doing
>> more work in decodeAudioData you can do less work in AudioBufferSourceNode,
>> and still achieve good quality (at least for the typical use cases when the
>> sample is played back at a pitch close to its original pitch).
>>
>> If we skip the resampling step, I think we have to choose between a more
>> costly interpolator (eats CPU cycles) or slightly reduced audio quality
>> (depending on what combination of resampler and interpolation algorithms are
>> used).
>>
>> One option here could be to let the user decide on a per-sample (i.e.
>> per-buffer) basis which matters most: audio quality or memory footprint.
>>
>> /Marcus
>>
>>
>> On Friday, 17 January 2014, Chris Wilson <cwilso@google.com> wrote:
>>
>>> The goal would be to avoid resampling to (and storing at) 48kHz, but
>>> still be able to play back with high quality in that case. As Marcus
>>> said, that would be harder. (Although is upsampling as costly as
>>> downsampling in this case?)
>>>
>>> On Jan 17, 2014 1:46 PM, "Raymond Toy" <rtoy@google.com> wrote:
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Jan 17, 2014 at 1:31 PM, Marcus Geelnard <mage@opera.com> wrote:
>>>>>
>>>>> 2014/1/17, Chris Wilson <cwilso@google.com>:
>>>>> > On Fri, Jan 17, 2014 at 2:24 AM, Marcus Geelnard <mage@opera.com> wrote:
>>>>> >
>>>>> >> So, when discussing Float32 vs Int16 etc, please keep in mind the use
>>>>> >> cases where an AudioBuffer is used for accessing and possibly also
>>>>> >> modifying audio data by using the getChannelData method on the
>>>>> >> AudioBuffer, such as:
>>>>> >>
>>>>> >> * ScriptProcessorNode / AudioProcessingEvent
>>>>> >>
>>>>> >
>>>>> > I believe there's already a suggestion on the table to replace
>>>>> > AudioBuffer there with Float32Array.
>>>>>
>>>>> I'm all for that. I think it would be natural to consider that option
>>>>> when speccing the new worker-based script processor.
>>>>>
>>>>> >
>>>>> >> There has already been a suggestion brought forward by ROC (i.e.
>>>>> >> allow the use of Int16 internally) that should solve the most
>>>>> >> urgent memory issues. If that suggestion does not solve the problems
>>>>> >> at hand, please provide more information.
>>>>> >>
>>>>> >
>>>>> > +1.  I'd still like to better understand the conversion impact.
>>>>> >
>>>>>
>>>>> If I can find the time I'll try and make some kind of benchmark of a
>>>>> simple int16->float32 format converter.
>>>>>
>>>>> > The open questions, to me, are 1) how does the data get EXPOSED then
>>>>> > (i.e. does getChannelData still return a Float32Array, and force
>>>>> > conversion),
>>>>>
>>>>> I would prefer to keep it as Float32, at least for now. I see little
>>>>> value in handing over integers to any kind of JS processing. The
>>>>> implication would probably be that if you use getChannelData, you'll
>>>>> force a conversion of the internal format to Float32.
>>>>>
>>>>> > 2) if it is exposed in int16 or similar, how far down that rabbit
>>>>> > hole do we go (int8, int24?, int32), and
>>>>>
>>>>> IMO the added value of such an addition would not justify the API
>>>>> complexity cost, plus it could easily be a slippery slope.
>>>>>
>>>>> > 3) I will point out again that the 2x bloat from converting int16 to
>>>>> > float32 is potentially much less of a problem than the resampling
>>>>> > (loading a 22kHz sample into a 96kHz audio context would cause a >4x
>>>>> > bloat).
>>>>> >
>>>>>
>>>>> +1. It may be slightly trickier to drop the resampling step, though,
>>>>> since it could come with a quality penalty. I suggest that we give
>>>>> that issue some attention.
>>>>>
>>>>> Do we want to make it possible to opt out of the automatic resampling
>>>>> step?
>>>>
>>>>
>>>> What would this mean?  Say the audio context is 48 kHz and you have a
>>>> 22.05 kHz audio sample. So you don't want the sample to be resampled
>>>> automatically to 48 kHz?  Then what happens to the audio when you connect to
>>>> a bunch of nodes?
>>>>
>>>> Ray
>>>>
>>>>>
>>>>>
>>>>> /Marcus
>>>>>
>>>>
>
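
The int16->float32 conversion that Marcus offers to benchmark in the quoted thread is essentially a per-sample scale. Below is a minimal sketch using only the Python standard library. It assumes little-endian signed 16-bit PCM and the common divide-by-32768 convention; other scalings exist. The function name is hypothetical.

```python
import array
import sys

def int16_to_float32(pcm: bytes) -> array.array:
    """Convert little-endian signed 16-bit PCM to floats in [-1.0, 1.0).

    Hypothetical helper; len(pcm) must be a multiple of 2.
    """
    ints = array.array("h")  # C signed short: 16-bit on common platforms
    ints.frombytes(pcm)
    if sys.byteorder == "big":
        ints.byteswap()  # input is little-endian; fix up on big-endian hosts
    # Typecode "f" is a C float, i.e. 32-bit on common platforms.
    return array.array("f", (s / 32768.0 for s in ints))
```

Each 2-byte input sample becomes a 4-byte float, which is where the 2x footprint mentioned in the thread comes from.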

Received on Saturday, 18 January 2014 11:29:34 UTC