- From: Logan Kearsley <chronosurfer@gmail.com>
- Date: Wed, 10 Jul 2013 15:05:03 -0600
- To: public-audio@w3.org
Hello,

Could a means of decoding only portions of an audio file, or at least of providing decoded data in chunks rather than all at once, be added to the Web Audio API?

I'm working on an HTML5 caption/transcription editing application that includes the option to display audio waveforms synchronized with a text track timeline, to assist in lining up cues with the audio (source code at https://github.com/BYU-ARCLITE/subtitle-timeline-editor; excuse the README for being slightly out of date, and the sample index.html file for being woefully out of date...).

I've tried using decodeAudioData for this, but it fails in two ways:

1. For mid-sized files, it simply takes too long to decode before I can start displaying data. This doesn't inhibit functionality, but it's a serious UX annoyance.

2. For even larger files (which are not at all uncommon when working with recordings an hour or two in length), it just runs out of memory and dies. There's a workaround for that, described at http://howisit4amnow.com/decodeaudiodata-wav-chunks/, in which one simply chops the file into little pieces and feeds them to decodeAudioData one at a time, but it seems rather silly to have to write JavaScript code to manipulate binary audio files just to pass the data into a native method which already knows how to manipulate binary audio files.

In searching the archives, I found a related thread from January, "Progressive audio data decoding":
http://lists.w3.org/Archives/Public/public-audio/2013JanMar/0111.html
It was suggested there to look into the MediaSource API, but that has its own problems for my application, as I never want to actually play the audio, just get access to the raw samples for visual display.

Up till now, I've been using a modified version of the pure-JS Aurora audio decoder framework (https://github.com/audiocogs/aurora.js). It allows requesting chunks of any size from a decoder until it runs out, which lets me start displaying an audio waveform right away and then progressively render more of it as further chunks are requested, and it keeps memory consumption down because I never need the entire decoded file in memory at once. Currently I can decode most files at about 4x playback speed in Chrome and Firefox. That's pretty good, and it makes the application feel fairly responsive when working with 5-to-10 minute files, but it uses a lot of processor time, and it's still a long wait if you want to pick up working at the end of a two-hour soundtrack and it takes 20 minutes for the visualization to load up to your current position. I expect a native implementation could do a lot better.

It would also be nice to be able to request samples from any arbitrary point in a file, rather than strictly beginning-to-end; that would save a lot of time by starting the decode from whatever point the user happens to be viewing in the application and filling in the beginning later. However, I expect the feasibility of that depends on the individual file type, and I'd be happy just to get faster decoding times.

Additionally, I get a lot of overhead from resampling: visual displays only require a resolution of about 1000 samples per second (because nobody cares about aligning text with any finer precision than that), so to make rendering run in a reasonable amount of time and to free up memory, I downsample everything (with a custom JavaScript resampler) to 1001 samples/second every time I get a chunk.
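To make the decoding half of this request concrete, here is a purely hypothetical sketch of the kind of chunk-at-a-time interface I have in mind. Neither createChunkedDecoder nor decodeNextChunk exists anywhere; context stands for an AudioContext, encodedArrayBuffer for the raw file bytes, and renderWaveformChunk for my own rendering code, and I'm not attached to this particular shape:

    // Strawman only: an incremental decoder that hands back a bounded
    // amount of decoded PCM per call instead of one giant AudioBuffer.
    var decoder = context.createChunkedDecoder(encodedArrayBuffer);

    function pullNext() {
        // Ask for roughly one second of decoded audio at a time.
        decoder.decodeNextChunk(44100, function (chunkBuffer) {
            if (chunkBuffer === null) { return; } // end of file
            renderWaveformChunk(chunkBuffer);     // application code, not shown
            pullNext();
        }, function (err) {
            console.error('decode failed:', err);
        });
    }
    pullNext();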
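For scale, the per-chunk downsampling I described above is roughly equivalent to the following crude box-average reduction. This is only a stand-in for illustration (reduceChunk is not my actual code, which also has to carry filter state across chunk boundaries), but it shows the amount of per-chunk work involved:

    // Collapse one decoded chunk of Float32 samples down to the display
    // rate by averaging fixed-size windows of input samples.
    function reduceChunk(samples, sourceRate, displayRate) {
        var ratio = sourceRate / displayRate, // e.g. 44100 / 1001, about 44
            outLength = Math.floor(samples.length / ratio),
            out = new Float32Array(outLength);
        for (var i = 0; i < outLength; i++) {
            var start = Math.floor(i * ratio),
                end = Math.min(Math.floor((i + 1) * ratio), samples.length),
                sum = 0;
            for (var j = start; j < end; j++) { sum += samples[j]; }
            out[i] = sum / (end - start);
        }
        return out;
    }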
That downsampling actually ends up taking almost as much time as the initial decoding for 44.1kHz CD-quality audio! Could a native resampler be added? The decodeAudioData method is already supposed to do resampling, so presumably implementations have the necessary internal resampling code available anyway. It would also be useful for processing that speeds up or slows down audio playback.

The most obvious (to me) way of exposing resampling would be an AudioBuffer.resample(target_rate) method that produces a new AudioBuffer containing the resampled data. However, that could run into problems at the boundaries when resampling data that stretches across multiple buffers (which is what you get with chunked rather than all-at-once decoding). My current implementation instead requires creating a Resampler object, specified with from and to rates and a number of channels, which retains the resampling algorithm's state from the end of one buffer to the start of the next. One then passes the resampler one chunk at a time and gets back, via asynchronous callback, one resampled chunk at a time.

-Logan Kearsley
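P.S. For concreteness, the outside of my current JavaScript resampler looks roughly like this (the names are simplified for illustration; the important parts are the fixed from/to rates and channel count, the state carried between chunks, and the asynchronous chunk-at-a-time output):

    // One stateful resampler per stream, with fixed conversion parameters.
    var resampler = new Resampler({
        from: 44100,   // decoded sample rate
        to: 1001,      // display sample rate
        channels: 1
    });

    // Chunks must be appended in order; internal filter state carries
    // over from the end of one chunk to the beginning of the next.
    resampler.append(decodedChunk, function (resampledChunk) {
        drawWaveformSegment(resampledChunk); // application code, not shown
    });

A native resampler with roughly this shape would cover my use case.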
Received on Thursday, 11 July 2013 09:47:31 UTC