- From: Marcus Geelnard <mage@opera.com>
- Date: Sat, 18 Jan 2014 12:45:20 +0100
- To: Raymond Toy <rtoy@google.com>
- Cc: Chris Wilson <cwilso@google.com>, Katelyn Gadd <kg@luminance.org>, "public-audio@w3.org" <public-audio@w3.org>
...another question that we'll have to answer is: Can we do resampling, and still preserve integer formats, or does resampling imply conversion to Float32?

I'm a bit worried about two things:

1) The resampling filter might introduce peaks that exceed the [-1,1] range, which would be clipped when storing in an integer format. That could sound really bad.

2) Resampling a low sample-rate sample (e.g. 11 kHz -> 96 kHz) while using a low bit resolution (e.g. 8-bit) will typically introduce audible quantization noise (there's a lot of new high frequency "silence" that mostly consists of a quantization error noise floor). This might not be a noticeable problem for 16-bit samples, but nevertheless something worth considering.

/Marcus

2014/1/18 Marcus Geelnard <mage@opera.com>:
> Hi Ray,
>
> 2014/1/17 Raymond Toy <rtoy@google.com>:
>> Thanks Marcus and Chris. I now see what you're getting at. Yes, this is an
>> option. Somehow, though, I think that if you need low memory, you also have
>> low CPU, so you lose no matter what. :-)
>>
>
> Exactly, which is why I think it may be necessary to lessen the audio
> quality requirements in case we don't resample, in order to enable low
> CPU loads. As long as we spec that behavior, I think it would be fine
> - as long as the user is given the option to select whether to go with
> the high quality or the low memory code path.
>
>
>> I think Blink's interpolator for an AudioBufferSourceNode is just a linear
>> interpolator. There will be some artifacts if you have to upsample (or
>> downsample) too much. This will have to be fixed if we go this way.
>>
>
> If we skip resampling entirely, for all samples, yes, we'd have to
> require a higher quality interpolator, which would inevitably mean
> higher CPU loads.
>
> Otherwise (i.e. if resampling could be done selectively), I think that
> it would not be THAT much of a deal. I believe that most
> consumer-level audio systems in use today don't do offline resampling
> (instead they typically offer a selection between linear, cubic or
> N:th order sinc interpolation, for instance).
>
> If we're talking sound effects in a game, I believe a fairly simple
> interpolator would suffice (as long as the author is aware of the
> implications, and can make an informed decision).
>
> /Marcus
>
>
>> Ray
>>
>>
>> On Fri, Jan 17, 2014 at 2:42 PM, Marcus Geelnard <mage@opera.com> wrote:
>>>
>>> The AudioBufferSourceNode already has the capability to play back the
>>> AudioBuffer at any sample rate - it would be transparent to the user.
>>>
>>> As far as I understand, the main reason for resampling up front is to
>>> lessen the requirements on the reconstruction filter/interpolator: by doing
>>> more work in decodeAudioData you can do less work in AudioBufferSourceNode,
>>> and still achieve good quality (at least for the typical use cases when the
>>> sample is played back at a pitch close to its original pitch).
>>>
>>> If we skip the resampling step, I think we have to choose between a more
>>> costly interpolator (eats CPU cycles) or slightly reduced audio quality
>>> (depending on what combination of resampler and interpolation algorithms are
>>> used).
>>>
>>> One option here could be to let the user decide on a per sample basis
>>> which matters the most: audio quality or memory footprint.
>>>
>>> /Marcus
>>>
>>>
>>> On Friday, 17 January 2014, Chris Wilson <cwilso@google.com> wrote:
>>>
>>>> The goal would be to not resample and store at 48kHz, but still be able
>>>> to play back with high quality in that case. As Marcus said, that would be
>>>> harder. (Although is upsampling as costly as downsampling in this case?)
>>>>
>>>> On Jan 17, 2014 1:46 PM, "Raymond Toy" <rtoy@google.com> wrote:
>>>>>
>>>>> On Fri, Jan 17, 2014 at 1:31 PM, Marcus Geelnard <mage@opera.com> wrote:
>>>>>>
>>>>>> 2014/1/17, Chris Wilson <cwilso@google.com>:
>>>>>> > On Fri, Jan 17, 2014 at 2:24 AM, Marcus Geelnard <mage@opera.com>
>>>>>> > wrote:
>>>>>> >
>>>>>> >> So, when discussing Float32 vs Int16 etc, please keep in mind the use
>>>>>> >> cases where an AudioBuffer is used for accessing and possibly also
>>>>>> >> modifying audio data by using the getChannelData method on the
>>>>>> >> AudioBuffer, such as:
>>>>>> >>
>>>>>> >> * ScriptProcessorNode / AudioProcessingEvent
>>>>>> >>
>>>>>> >
>>>>>> > I believe there's already a suggestion on the table to replace
>>>>>> > AudioBuffer there with Float32Array.
>>>>>>
>>>>>> I'm all for that. I think it would be natural to consider that option
>>>>>> when specing the new worker-based script processor.
>>>>>>
>>>>>> >
>>>>>> >> There has already been a suggestion brought forward by ROC (i.e.
>>>>>> >> allow the use of Int16 internally), that should solve the most
>>>>>> >> urgent memory issues.
>>>>>> >> If that suggestion does not solve the problems at hand, please
>>>>>> >> provide more information.
>>>>>> >>
>>>>>> >
>>>>>> > +1. I'd still like to better understand the conversion impact.
>>>>>> >
>>>>>>
>>>>>> If I can find the time I'll try and make some kind of benchmark of a
>>>>>> simple int16->float32 format converter.
>>>>>>
>>>>>> > The open questions, to me, are 1) how does the data get EXPOSED then
>>>>>> > (i.e. does getChannelData still return a float32array, and force
>>>>>> > conversion),
>>>>>>
>>>>>> I would prefer to keep it as Float32, at least for now. I see little
>>>>>> value in handing over integers to any kind of JS processing. The
>>>>>> implication would probably be that if you use getChannelData, you'll
>>>>>> force a conversion of the internal format to Float32.
>>>>>>
>>>>>> > 2) if it is exposed in int16 or similar, how far down that rabbit hole
>>>>>> > do we go (int8, int24?, int32), and
>>>>>>
>>>>>> IMO the added value of such an addition would not justify the API
>>>>>> complexity cost, plus it could easily be a slippery slope.
>>>>>>
>>>>>> > 3) I will point out again that the 2x bloat
>>>>>> > from converting int16 to float32 is potentially much less of a
>>>>>> > problem than the sample rate resampling (loading a 22kHz sample into
>>>>>> > a 96kHz audio context would cause a >4x bloat).
>>>>>> >
>>>>>>
>>>>>> +1 It may be slightly trickier to drop the resampling step though,
>>>>>> since it could come with a quality penalty. I suggest that we give
>>>>>> that issue some attention.
>>>>>>
>>>>>> Do we want to make it possible to opt out from the automatic resampling
>>>>>> step?
>>>>>
>>>>>
>>>>> What would this mean? Say the audio context is 48 kHz and you have a
>>>>> 22.05 kHz audio sample. So you don't want the sample to be resampled
>>>>> automatically to 48 kHz? Then what happens to the audio when you connect to
>>>>> a bunch of nodes?
>>>>>
>>>>> Ray
>>>>>
>>>>>>
>>>>>>
>>>>>> /Marcus
>>>>>>
>>>>>
>>
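
To make the clipping concern and the memory figures discussed in this thread concrete, here is a minimal TypeScript sketch. It is only an illustration under stated assumptions: the helper names (int16ToFloat32, float32ToInt16, bufferBytes) are hypothetical and not part of the Web Audio API, scaling by 32768 is just one common convention, and nothing here reflects how any browser actually stores or converts AudioBuffer data.

```typescript
// Hypothetical helpers for illustration only; not part of the Web Audio API.

// Convert 16-bit signed PCM to Float32 in the range [-1, 1).
// Assumption: samples are scaled by 1/32768 (one common convention).
function int16ToFloat32(input: Int16Array): Float32Array {
  return Float32Array.from(input, (v) => v / 32768);
}

// Convert Float32 samples back to 16-bit PCM.
// Any value outside [-1, 1] (for example an overshoot produced by a
// resampling filter) is hard-clipped, which is the audible problem
// raised in point 1) of the message.
function float32ToInt16(input: Float32Array): Int16Array {
  return Int16Array.from(input, (v) => {
    const clamped = Math.max(-1, Math.min(1, v));
    const scaled = Math.round(clamped * 32768);
    // +1.0 maps to 32768, which does not fit in int16, so clamp again.
    return Math.max(-32768, Math.min(32767, scaled));
  });
}

// Bytes needed to store one channel of audio.
function bufferBytes(
  durationSeconds: number,
  sampleRate: number,
  bytesPerSample: number
): number {
  return Math.ceil(durationSeconds * sampleRate) * bytesPerSample;
}

// Example: one second of mono audio, stored as-is versus resampled
// to a 96 kHz context and converted to Float32.
const originalBytes = bufferBytes(1, 22050, 2);
const resampledBytes = bufferBytes(1, 96000, 4);
console.log(originalBytes, resampledBytes, resampledBytes / originalBytes);
```

With these assumptions, one second of 22.05 kHz Int16 audio takes 44100 bytes, while the same second resampled to 96 kHz as Float32 takes 384000 bytes. The sample-rate change alone accounts for a factor of about 4.35, consistent with the ">4x" figure quoted above, and the Float32 conversion doubles it again.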
Received on Saturday, 18 January 2014 11:45:47 UTC