Selective Audio Decoding

Hello,

Could a means of decoding only portions of an audio file, or at least
of providing decoded data in chunks rather than all at once, be added
to the Web Audio API?

I'm working on an HTML5 caption / transcription editing application
that can display audio waveforms synchronized with a text track
timeline, to help users line up cues with the audio (source code at
https://github.com/BYU-ARCLITE/subtitle-timeline-editor; please excuse
the slightly out-of-date README and the woefully out-of-date sample
index.html file...). I've tried using decodeAudioData for this, but it
fails in two ways:

1. For mid-sized files, decoding simply takes too long before I can
start displaying any data. This doesn't break functionality, but it's
a serious UX annoyance.

2. For even larger files (which are not at all uncommon when working
with recordings an hour or two in length), it runs out of memory and
dies.
There's a workaround for that, described at
http://howisit4amnow.com/decodeaudiodata-wav-chunks/, in which one
chops the file into small pieces and feeds them to decodeAudioData one
at a time. But it seems rather silly to have to write JavaScript code
that manipulates binary audio files just to pass the data into a
native method that already knows how to manipulate binary audio files.
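
For uncompressed WAV files, the chunking workaround comes down to two
steps. First, slice the PCM payload on frame boundaries. Second, give
each slice its own header. The sketch below is a minimal TypeScript
illustration and not the code from the linked article. It assumes
integer PCM (WAVE format tag 1), and it assumes the caller has already
located the source file's "data" payload and read its channel count,
sample rate, and bit depth. `splitPcm`, `wrapPcmChunk`, and
`PcmFormat` are hypothetical names.

```typescript
// Hypothetical helpers sketching the "chop the WAV into pieces" workaround.
// Assumes uncompressed integer PCM (WAVE format tag 1) and that the caller has
// already located the source file's "data" payload and read its format fields.

interface PcmFormat {
  channels: number;
  sampleRate: number;
  bitsPerSample: number; // e.g. 16
}

// Split the PCM payload into slices of whole frames, so no frame is cut in half.
// framesPerChunk must be positive.
function splitPcm(pcm: Uint8Array, fmt: PcmFormat, framesPerChunk: number): Uint8Array[] {
  const blockAlign = fmt.channels * (fmt.bitsPerSample / 8); // bytes per frame
  const step = framesPerChunk * blockAlign;
  const chunks: Uint8Array[] = [];
  for (let start = 0; start < pcm.byteLength; start += step) {
    chunks.push(pcm.subarray(start, Math.min(start + step, pcm.byteLength)));
  }
  return chunks;
}

// Prepend a canonical 44-byte RIFF/WAVE header so the slice is a standalone file.
function wrapPcmChunk(pcm: Uint8Array, fmt: PcmFormat): ArrayBuffer {
  const blockAlign = fmt.channels * (fmt.bitsPerSample / 8);
  const buffer = new ArrayBuffer(44 + pcm.byteLength);
  const view = new DataView(buffer);
  const writeTag = (offset: number, tag: string): void => {
    for (let i = 0; i < 4; i++) view.setUint8(offset + i, tag.charCodeAt(i));
  };
  writeTag(0, "RIFF");
  view.setUint32(4, 36 + pcm.byteLength, true);          // size of everything after this field
  writeTag(8, "WAVE");
  writeTag(12, "fmt ");
  view.setUint32(16, 16, true);                          // fmt chunk size for plain PCM
  view.setUint16(20, 1, true);                           // format tag 1 = integer PCM
  view.setUint16(22, fmt.channels, true);
  view.setUint32(24, fmt.sampleRate, true);
  view.setUint32(28, fmt.sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, fmt.bitsPerSample, true);
  writeTag(36, "data");
  view.setUint32(40, pcm.byteLength, true);
  new Uint8Array(buffer, 44).set(pcm);                   // copy the samples after the header
  return buffer;
}
```

In a browser, each wrapped buffer could then be handed to an
AudioContext's decodeAudioData one at a time, so only one chunk's
decoded samples need to be held at once.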

In searching the archives, I found a related thread from January:
"Progressive audio data decoding"
http://lists.w3.org/Archives/Public/public-audio/2013JanMar/0111.html
It was suggested then to look into the MediaSource API, but that has
its own problems for my application: I never want to actually play
the audio, only to get access to the raw samples for visual display.

Until now, I've been using a modified version of the pure-JS Aurora
audio decoder framework (https://github.com/audiocogs/aurora.js). It
lets me request chunks of any size from a decoder until the input runs
out. That means I can start displaying an audio waveform right away,
progressively render more of it as more chunks arrive, and keep memory
consumption down, because the entire decoded file never has to be in
memory at once. Currently most files decode at about 4x playback speed
in Chrome and Firefox. That's pretty good, and it makes the
application feel fairly responsive with 5-to-10-minute files, but it
uses a lot of processor time. It's also still a long wait if you want
to pick up work at the end of a two-hour soundtrack and the
visualization takes 20 minutes to load as far as your current
position. I expect a native implementation could do a lot better.
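
A back-of-envelope calculation shows why holding a whole decoded file
in memory fails for long recordings, while chunked decoding does not.
The sketch assumes decoded samples are stored as 32-bit floats, one
array per channel, as in a Web Audio AudioBuffer. Real
implementations may add overhead on top of this figure.
`decodedBytes` is an illustrative name.

```typescript
// Rough size of fully decoded audio, assuming one Float32 per sample per channel.
function decodedBytes(durationSec: number, sampleRate: number, channels: number): number {
  const BYTES_PER_FLOAT32 = 4;
  return durationSec * sampleRate * channels * BYTES_PER_FLOAT32;
}

const twoHours = 2 * 60 * 60;                        // 7200 s
const fullRate = decodedBytes(twoHours, 44100, 2);   // 2,540,160,000 bytes (~2.5 GB)
const displayRate = decodedBytes(twoHours, 1001, 2); // 57,657,600 bytes (~58 MB)
const oneChunk = decodedBytes(1, 44100, 2);          // 352,800 bytes for a 1 s chunk
```

A two-hour stereo file at 44.1 kHz needs on the order of gigabytes
when fully decoded. Decoding it in one-second chunks and keeping only
a 1001 samples/second copy needs well under a tenth of a gigabyte.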

It would also be nice to be able to request samples from any
arbitrary point in a file, rather than strictly beginning-to-end.
Then a lot of time could be saved by starting the decode at whatever
time the user is currently viewing in the application, and filling in
the beginning later. However, I expect the feasibility of that
depends on the individual file type, and I'd be happy just to get
faster decoding times.
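
For uncompressed PCM, decoding from an arbitrary point is
straightforward, because the byte offset of any time position can be
computed directly. Compressed formats such as MP3 or Ogg Vorbis
generally need a seek table or a scan instead, which is one reason
feasibility varies by file type. The sketch below assumes a WAV file
whose "data" payload offset and frame size (blockAlign) are already
known. `PcmLayout` and `byteOffsetAt` are illustrative names.

```typescript
// Byte offset of the frame at a given time in an uncompressed PCM WAV file.
// Assumes the layout fields were read from the file's header beforehand.
interface PcmLayout {
  dataOffset: number; // byte offset where the "data" payload begins
  sampleRate: number; // frames per second
  blockAlign: number; // bytes per frame = channels * bytesPerSample
}

function byteOffsetAt(timeSec: number, layout: PcmLayout): number {
  const frame = Math.floor(timeSec * layout.sampleRate); // round down to a whole frame
  return layout.dataOffset + frame * layout.blockAlign;
}
```

For a canonical 16-bit stereo 44.1 kHz file (data at byte 44, 4 bytes
per frame), the 90-second mark is at byte 44 + 3,969,000 * 4 =
15,876,044. Decoding can start there and fill in earlier regions
later.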

Additionally, I incur a lot of overhead from resampling. Visual
displays only need a resolution of about 1000 samples per second
(because nobody cares about aligning text with any finer precision
than that). So to make rendering run in a reasonable amount of time
and to free up memory, I downsample every chunk as I receive it (with
a custom JavaScript resampler) to 1001 samples/second. On 44.1 kHz
CD-quality audio, this ends up taking almost as much time as the
initial decoding! Could a native resampler be added? decodeAudioData
is already supposed to resample, so presumably implementations will
have the necessary resampling code available internally anyway. This
would also be useful for speeding up or slowing down audio playback.
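
To make the downsampling step concrete, here is a minimal one-shot
resampler in TypeScript. It uses linear interpolation, which is an
assumption chosen for brevity and not the author's algorithm. It
applies no anti-aliasing low-pass filter, which is tolerable for
drawing a waveform but not for audio meant to be heard. Going from
44.1 kHz to 1001 samples/second, each output sample advances about
44.06 input samples. `resampleLinear` is a hypothetical name.

```typescript
// One-shot linear-interpolation resampler (no anti-aliasing filter).
function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  const step = fromRate / toRate; // input samples advanced per output sample
  // Emit every output position that falls within [0, input.length - 1].
  const outLength = Math.max(Math.floor((input.length - 1) / step) + 1, 0);
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const idx = Math.floor(pos);
    const frac = pos - idx;
    // At the last input sample there is no right-hand neighbour, so hold the value.
    const next = idx + 1 < input.length ? input[idx + 1] : input[idx];
    output[i] = input[idx] + (next - input[idx]) * frac;
  }
  return output;
}
```

Calling a function like this independently on each decoded chunk
restarts the interpolation phase at every chunk boundary. It also
skips the interval between one chunk's last sample and the next
chunk's first. Unless some state is carried from one chunk to the
next, the output drifts slightly out of step with a whole-file
resample.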

The most obvious (to me) way to add resampling would be an
AudioBuffer.resample(target_rate) method that produces a new
AudioBuffer containing the resampled data. However, that could run
into problems at the boundaries when resampling data that stretches
across multiple buffers (which is what you get with chunked rather
than all-at-once decoding). My current implementation instead creates
a Resampler object configured with source and target rates and a
channel count, which retains the resampling algorithm's state from the
end of one buffer to the start of the next. One then passes the
resampler one chunk at a time and gets back one resampled chunk at a
time via an asynchronous callback.
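
The stateful approach described above can be sketched as follows.
This is a minimal single-channel illustration in the spirit of the
Resampler object, not its actual code. The class name
`StreamingResampler`, its `process` method, and the use of linear
interpolation are all assumptions. It is also synchronous for
simplicity, whereas the design described above returns results via
an asynchronous callback. It carries two pieces of state across
chunks: the last input sample, and running counts of samples emitted
and consumed. With that state, resampling a signal in several chunks
gives the same output as resampling it in one piece.

```typescript
// Hypothetical single-channel streaming resampler using linear interpolation.
// State kept between chunks lets interpolation span chunk boundaries.
class StreamingResampler {
  private readonly step: number; // input samples advanced per output sample
  private emitted = 0;           // output samples produced so far
  private consumed = 0;          // input samples in all previous chunks
  private prev = 0;              // last input sample of the previous chunk

  constructor(fromRate: number, toRate: number) {
    this.step = fromRate / toRate; // both rates assumed positive
  }

  // Returns every output sample computable from the input seen so far.
  process(chunk: Float32Array): Float32Array {
    const out: number[] = [];
    const sampleAt = (i: number): number => (i < 0 ? this.prev : chunk[i]);
    for (;;) {
      // Next output position relative to this chunk's start; a value in [-1, 0)
      // lies between the previous chunk's last sample and chunk[0].
      const local = this.emitted * this.step - this.consumed;
      const idx = Math.floor(local);
      if (idx + 1 >= chunk.length) break; // right-hand neighbour not received yet
      const a = sampleAt(idx);
      out.push(a + (chunk[idx + 1] - a) * (local - idx));
      this.emitted++;
    }
    if (chunk.length > 0) this.prev = chunk[chunk.length - 1];
    this.consumed += chunk.length;
    return Float32Array.from(out);
  }
}
```

Each output position is computed from the counters
(`emitted * step - consumed`) rather than by adding `step` repeatedly,
which avoids floating-point drift over long recordings. A
multi-channel version would keep one `prev` value per channel. At the
end of a stream, an output sample landing exactly on the final input
sample is not emitted; a real implementation would add a flush step
for that.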

-Logan Kearsley

Received on Thursday, 11 July 2013 09:47:31 UTC