API for streaming encoded audio

Hello,

I would like to propose an additional API for the Web Audio API.  I 
believe that all of the functional elements that would be required have 
already been addressed by this specification, so the only additional work 
would be in providing access to them in a new way.

*Summary*
An API for streaming encoded audio (e.g. Ogg/Vorbis, MP3, FLAC) from a 
URL in real time.  The user will provide a callback method which gets 
called each time a new buffer's worth of data has been fetched and decoded.

*Existing functionality in HTML5 and Web Audio*
HTMLAudioElement is capable of streaming encoded audio from a URL. 
However, the implicit buffering (preload) makes this element unsuitable 
for audio streams that are dynamic in nature (please see the example 
application below).
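For illustration, the kind of minimal client the audio element allows 
today looks something like this (the stream URL is a placeholder):

    // Streaming via HTMLAudioElement: trivial on the client side, but the
    // element's internal buffering adds a large, uncontrollable delay.
    const audio = new Audio("http://example.com/synth/stream.ogg");  // hypothetical URL
    audio.preload = "none";  // only a hint; browsers may still buffer aggressively
    audio.play();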

AudioContext.decodeAudioData() will decode encoded audio data (typically 
fetched from a URL as an ArrayBuffer), then pass the resulting 
AudioBuffer to a user-provided callback method.  This is very useful for 
short and/or static content but is not appropriate for streams of 
indeterminate length.
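For reference, the usual one-shot pattern looks roughly like this (the 
URL is a placeholder); the entire resource is fetched and decoded before 
the callback ever runs:

    // Typical use of decodeAudioData(): fetch the whole file, decode in one shot.
    const context = new AudioContext();
    const request = new XMLHttpRequest();
    request.open("GET", "http://example.com/clip.ogg", true);  // hypothetical URL
    request.responseType = "arraybuffer";
    request.onload = () => {
      context.decodeAudioData(request.response, (buffer) => {
        // The clip is fully decoded before we see it -- fine for short,
        // static content, unusable for a stream of indeterminate length.
        const source = context.createBufferSource();
        source.buffer = buffer;
        source.connect(context.destination);
        source.start(0);
      });
    };
    request.send();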

*Proposed addition*
The API I am proposing is not very different from 
AudioContext.decodeAudioData().  The major departure is that the 
callback method will be called repeatedly until the stream is stopped or 
reaches EOF.  Ideally the size of the buffer could be specified, for 
example 8192 bytes; in that case, as each 8 KB of audio is decoded it 
would be passed to the callback method.
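To make the shape concrete, here is a rough, purely hypothetical sketch 
in TypeScript.  The method name decodeAudioStream, its parameters, and 
the chunk-size argument are my invention, not part of any spec:

    // Hypothetical extension -- nothing below exists in the current spec.
    interface StreamingAudioContext extends AudioContext {
      decodeAudioStream(
        url: string,
        chunkSize: number,                      // e.g. 8192 bytes of encoded data
        onChunk: (chunk: AudioBuffer) => void,  // called for each decoded chunk
        onEnded?: () => void                    // stream stopped or reached EOF
      ): void;
    }

    // Intended usage (illustrative only):
    const context = new AudioContext() as StreamingAudioContext;
    context.decodeAudioStream("http://example.com/synth/stream.ogg", 8192,
      (chunk) => { /* schedule playback of this chunk */ },
      () => { /* stream ended or was stopped */ });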

*Example application*
I have a client/server audio synthesizer project.  The server generates 
a stream of Ogg/Vorbis audio that is rendered by the client (browser) 
using an HTML audio tag.  The client can change various parameters (e.g. 
pitch, gain, pan) and the server will immediately alter the output 
stream accordingly.

This arrangement is very nice because it requires nothing special on the 
client side.  However, in the browsers that I've tried, the audio element 
buffers very aggressively, literally several minutes' worth of data.  So 
when the user tweaks a setting it will not be heard in the browser for a 
long time, even though the server reflected the change immediately.

*Current workaround*
After trying many experiments to prevent the audio element from 
buffering, I implemented an alternative approach.

- The server exposes an additional API that serves the stream as raw 
PCM (32-bit floats).
- The client opens a WebSocket connection to the server.
- The client creates a ScriptProcessorNode whose onaudioprocess callback 
reads data received from the server and copies it into the output buffer 
(see the sketch after this list).
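Roughly, the client side of this looks like the following sketch.  It 
assumes a mono stream of 32-bit float samples at the context's sample 
rate; the WebSocket URL is a placeholder:

    // Workaround: raw PCM over a WebSocket, played via a ScriptProcessorNode.
    // Assumes mono, 32-bit floats at context.sampleRate; the URL is hypothetical.
    const context = new AudioContext();
    const socket = new WebSocket("ws://example.com/synth/pcm");
    socket.binaryType = "arraybuffer";

    const queue: Float32Array[] = [];
    socket.onmessage = (event) => {
      queue.push(new Float32Array(event.data as ArrayBuffer));
    };

    const node = context.createScriptProcessor(4096, 1, 1);
    node.onaudioprocess = (event) => {
      const output = event.outputBuffer.getChannelData(0);
      let filled = 0;
      // Copy queued samples into the output buffer; anything unfilled is silence.
      while (filled < output.length && queue.length > 0) {
        const pending = queue[0];
        const n = Math.min(pending.length, output.length - filled);
        output.set(pending.subarray(0, n), filled);
        filled += n;
        if (n === pending.length) queue.shift();
        else queue[0] = pending.subarray(n);
      }
    };
    node.connect(context.destination);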

This works well and the latency is low.  The main drawback is losing out 
on the audio compression, which increases the required bandwidth.  Also, 
the code on both ends has become considerably more complex.

*Conclusion*
I could get by with the above workaround, though it's not ideal.  I 
think that the API I'm proposing would find many uses beyond the example 
I've given.  Also, I believe that the functionality already exists and 
this is mostly a matter of wiring up the new API.

I'd love to hear anyone's thoughts on this as well as any ideas for 
solutions that I might have missed using the existing APIs.

Best regards,
Joe Meadows

Received on Monday, 10 March 2014 10:23:43 UTC