
Re: Decoding audio w/o an AudioContext object

From: Chris Rogers <crogers@google.com>
Date: Fri, 17 Aug 2012 10:43:30 -0700
Message-ID: <CA+EzO0kOWyr2KT13hJmKUZCZsqRKfVqES5H02iz3siMeVCdG_g@mail.gmail.com>
To: Marcus Geelnard <mage@opera.com>
Cc: public-audio@w3.org
On Thu, Aug 16, 2012 at 11:29 PM, Marcus Geelnard <mage@opera.com> wrote:

> Citerar Chris Rogers <crogers@google.com>:
>
>  On Wed, Aug 15, 2012 at 11:22 PM, Marcus Geelnard <mage@opera.com> wrote:
>>
>>  Hi!
>>>
>>> AudioContext provides two methods for decoding audio (both the
>>> synchronous
>>> createBuffer and the asynchronous decodeAudioData), and people will quite
>>> likely want to use these methods for decoding audio files without
>>> actually
>>> wanting to play them using an AudioContext.
>>>
>>> Is there anything preventing us from allowing users to do things like:
>>>
>>> function decoded(data) {
>>>   // Do stuff
>>> }
>>>
>>> AudioContext.decodeAudioData(rawData, decoded);
>>>
>>>
>>>
>>> Also, both versions of the createBuffer method could be treated
>>> similarly.
>>>
>>> Any opinions?
>>>
>>>
>> Hi Marcus, one reason that the methods are based on the AudioContext is
>> because the audio data needs to be decoded *and* sample-rate converted to
>> the correct sample-rate for the AudioContext.
>>
>
> Why do you need to do that? You can just as well compensate for it in the
> AudioBufferSourceNode: playbackRate' = playbackRate * buffer.sampleRate /
> ctx.sampleRate.
>
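Marcus's compensation formula can be sketched as a small helper (the function and parameter names here are illustrative, not part of the spec):

```javascript
// Sketch of the compensation Marcus describes: if a buffer kept its
// native sample-rate, scale the source node's playbackRate so it still
// plays at the intended pitch in a context running at a different rate.
// Hypothetical helper, not a Web Audio API function.
function compensatedPlaybackRate(playbackRate, bufferSampleRate, contextSampleRate) {
  return playbackRate * bufferSampleRate / contextSampleRate;
}

// e.g. a 96 kHz buffer in a 48 kHz context needs rate 2.0 to sound right
```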

Hi Marcus, sorry: Chris Wilson asked me some more questions that I answered,
but I see that exchange was off-list:

Chris Wilson:
So does decodeAudioData() always do that sample rate conversion to the
sample data before placing into the buffer, then?  E.g. if I have an .ogg
recorded at 96k that I decode in a 44.1kHz audioContext, the resulting
AudioBuffer's .sampleRate will be 44100 (and the data will be
correspondingly converted)?
I presume buffers still play back appropriately if I create them with a
sampleRate other than the audioContext's, though possibly with lower
quality conversion?

My response:
Yes, this is correct. The reason is performance: when you want to play a
bunch of overlapping short sounds, it's cheaper to have decoded PCM data
that is *exactly* at the sample-rate of the context, so no run-time
sample-rate conversion is necessary. You can still play back the
AudioBuffers at a different rate, but the run-time sample-rate conversion
is generally of lesser quality (think linear or cubic interpolation,
chosen for performance). So it's really good to get the audio data decoded
and sample-rate converted with the best, highest-quality resampling
up-front at asset loading time.
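The kind of cheap run-time conversion being contrasted here can be illustrated with a linear-interpolation resampler. This is a hedged sketch of the general technique, not any browser's actual implementation:

```javascript
// Minimal linear-interpolation resampler: fast, but audibly lower
// quality than the high-quality offline resampling done at decode time.
// Illustrative only; real engines use better filters.
function linearResample(samples, srcRate, dstRate) {
  const outLen = Math.round(samples.length * dstRate / srcRate);
  const out = new Float32Array(outLen);
  const step = srcRate / dstRate; // source samples advanced per output sample
  for (let i = 0; i < outLen; i++) {
    const pos = i * step;
    const j = Math.floor(pos);
    const frac = pos - j;
    const a = samples[j] ?? 0;
    const b = samples[j + 1] ?? samples[j] ?? 0; // hold last sample at the end
    out[i] = a + (b - a) * frac;
  }
  return out;
}
```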




> I think it's kind of counter-intuitive that an audio resource gets decoded
> differently on different machines, since the decoded data is exposed to the
> script (this could lead to false assumptions about decoded data lengths,
> etc.).
>

It actually works out quite naturally, and we have lots of experience out
in the field, where most Macs run at 44.1 kHz and most Windows machines at
48 kHz. In addition to the performance and sound-quality issues I raised
above, it's important that the AudioBuffer objects representing
impulse responses be at the proper sample-rate.


>
> If it's for performance reasons (typically making sense for doppler-free,
> single pitch sound fx in games), it's kind of a guess, since you can't know
> beforehand if sounds will be played at playbackRate = 1 or if they will be
> used as musical instruments, for instance.
>

But they will very often be played back at rate==1, and when they are
played back at different rates, it's often small differences centered
around unity rate.  This is the key principle when doing basic sample
playback synthesis (SoundFonts, etc.) - see "multisampling":
http://en.wikipedia.org/wiki/Sample-based_synthesis.
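The multisampling principle can be sketched as follows: record a handful of pitches, then play the nearest one at a small rate offset around 1.0. This is a hypothetical helper for illustration, not part of the Web Audio API:

```javascript
// Multisampling sketch: given the MIDI notes actually recorded, pick the
// one closest to the requested note and derive the small playbackRate
// adjustment around unity (each semitone is a factor of 2^(1/12)).
// Hypothetical helper; names are illustrative.
function pickMultisample(recordedNotes, targetNote) {
  let best = recordedNotes[0];
  for (const n of recordedNotes) {
    if (Math.abs(n - targetNote) < Math.abs(best - targetNote)) best = n;
  }
  return { note: best, playbackRate: Math.pow(2, (targetNote - best) / 12) };
}
```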


>
> An alternative could be to pass an optional resampleTo argument to
> decodeAudioData() and createBuffer(), just as with mixToMono, to let the
> developer decide which sounds to optimize for 1:1 playback.
>

Yes, this could be possible as an optional argument.


>
> By the way, what is the use case for mixToMono, and why is it not
> available as an argument to decodeAudioData()?


Yes, I know; the synchronous method is older and not consistent with the
asynchronous one. We might even consider removing it from the spec, since
async is better.
Received on Friday, 17 August 2012 17:43:57 GMT
