Re: Decoding audio w/o an AudioContext object

Quoting Chris Rogers <crogers@google.com>:

> On Thu, Aug 16, 2012 at 11:29 PM, Marcus Geelnard <mage@opera.com> wrote:
>
>> Quoting Chris Rogers <crogers@google.com>:
>>
>>  On Wed, Aug 15, 2012 at 11:22 PM, Marcus Geelnard <mage@opera.com> wrote:
>>>
>>>  Hi!
>>>>
>>>> AudioContext provides two methods for decoding audio (both the
>>>> synchronous
>>>> createBuffer and the asynchronous decodeAudioData), and people will quite
>>>> likely want to use these methods for decoding audio files without
>>>> actually
>>>> wanting to play them using an AudioContext.
>>>>
>>>> Is there anything preventing us from allowing users to do things like:
>>>>
>>>> function decoded(data) {
>>>>   // Do stuff
>>>> }
>>>>
>>>> AudioContext.decodeAudioData(rawData, decoded);
>>>>
>>>>
>>>>
>>>> Also, both versions of the createBuffer method could be treated
>>>> similarly.
>>>>
>>>> Any opinions?
>>>>
>>>>
>>> Hi Marcus, one reason that the methods are based on the AudioContext is
>>> that the audio data needs to be decoded *and* sample-rate converted to
>>> the correct sample-rate for the AudioContext.
>>>
>>
>> Why do you need to do that? You can just as well compensate for it in the
>> AudioBufferSourceNode: playbackRate' = playbackRate * buffer.sampleRate /
>> ctx.sampleRate.
>>
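
(To spell out the compensation I mean, here is a minimal sketch; it
assumes 'buffer' has kept its native sample-rate after decoding:)

function playCompensated(ctx, buffer, playbackRate) {
  var source = ctx.createBufferSource();
  source.buffer = buffer;
  // Scale the requested rate by the ratio of the two sample-rates:
  source.playbackRate.value =
      playbackRate * buffer.sampleRate / ctx.sampleRate;
  source.connect(ctx.destination);
  source.noteOn(0);  // start playing immediately
}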
>
> Hi Marcus, sorry; Chris asked me some more questions that I answered, but I
> see they were off-list:
>
> Chris Wilson:
> So does decodeAudioData() always do that sample rate conversion to the
> sample data before placing into the buffer, then?  E.g. if I have an .ogg
> recorded at 96k that I decode in a 44.1kHz audioContext, the resulting
> AudioBuffer's .sampleRate will be 44100 (and the data will be
> correspondingly converted)?
> I presume buffers still play back appropriately if I create them with a
> sampleRate other than the audioContext's, though possibly with lower
> quality conversion?
>
> My response:
> Yes, this is correct.  The reason is performance: when you want to play a
> bunch of overlapping short sounds, it's cheaper to have decoded PCM data
> that is *exactly* at the sample-rate of the context, so no run-time
> sample-rate conversion is necessary.

True.

> You can still play back the
> AudioBuffers at a different rate, but the run-time sample-rate conversion
> is generally going to be of lesser quality (think linear or cubic
> interpolation, chosen for performance).  So, it's really good to get the
> audio data decoded and sample-rate converted using the best and
> highest-quality resampling up-front, at asset loading time.

Yes. This seems to be a good strategy. The only two drawbacks that I  
can think of are:

1) Inconsistent decoding between machines. This should not be much of
a bother, but it could be confusing in some corner cases (web developers
trying to do funny stuff and relying on the exact result of the decoding).

2) Resource waste. If you have many low-quality samples, the re-sampled
versions would occupy more RAM than the originals (and possibly even
give worse cache performance). E.g. an 8 ksamples/s clip re-sampled to
48 ksamples/s would occupy 6x the RAM. Again, this is probably not a
big issue, but it might be bothersome in some special cases.

Both of these issues would be much less problematic if we added an
optional resample/no-resample argument to the decoding routines, as
sketched below.
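
Something along these lines, purely as a sketch; the extra options
argument and its 'resample' flag are made up here and not part of the
current spec:

// Hypothetical extension; the fourth argument does not exist today:
ctx.decodeAudioData(rawData, function (buffer) {
  // With resample: false, buffer.sampleRate would be the file's
  // native rate instead of ctx.sampleRate.
}, errorCallback, { resample: false });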

Note: This is a bit similar to how mip-mapping works in OpenGL. It  
usually gives better quality and better performance at the expense of  
initial overhead and added GPU RAM usage, but you need to opt in to  
get it.
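
(In WebGL terms, assuming 'gl' is an existing WebGL context, the
opt-in looks roughly like this:)

// Mip-maps cost extra GPU RAM and build time up-front, and you opt in:
gl.generateMipmap(gl.TEXTURE_2D);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER,
                 gl.LINEAR_MIPMAP_LINEAR);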

>> I think it's kind of counter-intuitive that an audio resource gets decoded
>> differently on different machines, since the decoded data is exposed to the
>> script (this could lead to false assumptions about decoded data lengths
>> etc).
>
> It actually works out quite naturally, and we have lots of experience out
> in the field, where most Macs run at 44.1 kHz and most Windows machines at
> 48 kHz.  In addition to the performance and sound quality issues I raised
> above, it's important that the AudioBuffer objects representing
> impulse responses be at the proper sample-rate.

...this brings me to an obvious follow-up question: how is the
ConvolverNode supposed to behave if its buffer has a different
sampleRate than the audio context? On-the-fly interpolation? It makes
me wonder whether a read/write AudioBuffer attribute is the right
interface in this case.
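
That is, what should happen in a case like this (where
'impulseResponse' is assumed to come from a decode that did not
resample, so its rate differs from the context's)?

var convolver = ctx.createConvolver();
// impulseResponse.sampleRate == 96000, ctx.sampleRate == 44100...
convolver.buffer = impulseResponse;  // then what?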

>> If it's for performance reasons (typically making sense for Doppler-free,
>> single-pitch sound fx in games), it's kind of a guess, since you can't know
>> beforehand if sounds will be played at playbackRate = 1 or if they will be
>> used as musical instruments, for instance.
>>
>
> But they will very often be played back at rate == 1, and when they are
> played back at different rates, the differences are often small and
> centered around the unity rate.  This is the key principle of basic
> sample playback synthesis (SoundFonts, etc.) - see "multisampling":
> http://en.wikipedia.org/wiki/Sample-based_synthesis.

Yes, you're probably right. If you re-sample to a lower rate (which
shouldn't happen too often), you might lose some information, though.
For example, if you re-sample a 96 kHz clip to 48 kHz (discarding the
upper half of its spectrum) and then play it back at playbackRate =
0.5, you'd actually get worse quality than without the initial
re-sampling step...
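
(Concretely: at playbackRate = 0.5 a component at f Hz in the buffer
comes out at f/2 Hz, so the 24-48 kHz band of the 96 kHz original
would land in the audible 12-24 kHz range; the up-front re-sampling
to 48 kHz has already thrown that band away.)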

I guess what I'm getting at here is that re-sampling is a good  
strategy, but unless you're aware of the consequences (as a web  
developer), you can run into problems in some situations.

Currently, I don't think the spec says anything about automatic  
re-sampling. This behavior needs to be very clearly defined.

/Marcus

Received on Friday, 17 August 2012 21:58:04 UTC