Re: How to play back synthesized 22kHz audio in a glitch-free manner?

Two examples of the use case were demonstrated in the first post I made
(one of them is even a fully running application, not just a toy demo). Other
use cases include implementing streamed audio playback, music players,
software music synthesizers, VOIP calls, audio synchronized to video and
games, and other applications that load, receive or synthesize buffers of audio
at different frequencies directly in JavaScript and need them to be played
back as one continuous high-quality stream.

Here is an example proposal of an addition to the spec:

In
https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html#AudioBufferSourceNode,
add to the AudioBufferSourceNode interface

double startImmediatelyAfter(AudioBufferSourceNode predecessor);

"The startImmediatelyAfter method

Schedules a sound to be played back with a seamless join to the given
predecessor sound.

Use this function to guarantee a continuous, glitch-free join between the
preceding source node and this one. This sound buffer will be timed to
start playing immediately after its predecessor finishes. Both sound
buffers must contain an identical number of channels, and their sampling
rates and playback rates must be identical. Neither of the sound source
nodes may be looping.

An exception MUST be thrown if the predecessor node was not queued for
playback, that is, if neither start nor startImmediatelyAfter has been
called on the predecessor node, or if stop() has been called on the
predecessor node. If the predecessor node has already finished its
playback, this source node will start its playback immediately.

An exception MUST be thrown if this source node and the predecessor source
node are not connected to the same destination, or if the predecessor
source node is not connected to any destination.

Any given source node may be used only once as a predecessor for another
source node. If a node is specified as a predecessor twice, an exception
MUST be thrown.

Either start or startImmediatelyAfter may be called at most once on a
source node, and only one of the two may be called on any given node.

This function returns the time (in seconds) this sound is scheduled to
start playing. It is in the same time coordinate system as
AudioContext.currentTime. "
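
To illustrate, here is how a push-model feeder could look with the proposed
method. This is a sketch only: startImmediatelyAfter does not exist in any
implementation, and enqueue is just an illustrative helper name.

var context = new (window.AudioContext || window.webkitAudioContext)();
var previousNode = null;

function enqueue(audioBuffer) {
  var node = context.createBufferSource();
  node.buffer = audioBuffer;
  node.connect(context.destination);
  var scheduledTime;
  if (previousNode) {
    // Seamless join: guaranteed to start exactly when the predecessor ends.
    scheduledTime = node.startImmediatelyAfter(previousNode);
  } else {
    // First buffer in the stream: start it right away.
    scheduledTime = context.currentTime;
    node.start(scheduledTime);
  }
  previousNode = node;
  return scheduledTime;
}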

That would allow a push model for feeding continuous audio buffers to the
device, and the return value of the function enables measuring over- and
under-buffering. This functionality is more or less identical to what
XAudio2, OpenAL, the Mozilla Audio Data API, DirectShow, DirectSound, and
most likely every other native audio library I haven't used implement.
Also, this lets the JS application control both buffer sizing (fixed or
variable, as needed) and the scheduling of when a new fill is needed, and it
enables a flexible, non-millisecond-critical way to push new audio.
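
For example (again hypothetical, building on the enqueue sketch above;
newBuffer stands for a freshly produced AudioBuffer and synthesizeNextBlock
for whatever produces the audio), the return value could be used to measure
how much audio is queued ahead of the playhead:

// Hypothetical: measure how far ahead of the playhead the queue reaches.
var startTime = enqueue(newBuffer);
var bufferedAhead = (startTime + newBuffer.duration) - context.currentTime;
if (bufferedAhead < 0.1) {
  // Less than 100 ms queued: produce the next block now to avoid an underrun.
  synthesizeNextBlock();
}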

ScriptProcessorNode offers a pull model, but it is constrained compared to
a push model. If data is not yet available when the callback fires, there
is no way (to my knowledge) to feed the data immediately once it becomes
available; one must wait until the next callback period, which is always a
multiple of the block size. In a push model, data can be fed as soon as it
is available. For example, for an application that uses buffers of 2048
samples at 22kHz, missing data in an audio callback with ScriptProcessorNode
causes a 2048/22050 ≈ 93 msec delay until the next callback fires, whereas
in a push model, if the data became available within that period, it could
be played immediately. 93 msecs is a long pause.
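
For reference, here is a minimal sketch of that pull-model situation with
ScriptProcessorNode (pendingBlocks is an illustrative application-side queue,
not part of any API):

var pendingBlocks = [];  // the app pushes Float32Array blocks of 2048 samples
var processor = context.createScriptProcessor(2048, 1, 1);
processor.onaudioprocess = function (event) {
  var output = event.outputBuffer.getChannelData(0);
  var block = pendingBlocks.shift();
  if (block) {
    output.set(block);
  } else {
    // The data arrived a moment too late: nothing to do but output silence
    // and wait a full callback period (93 ms at 22050 Hz) for the next chance.
    for (var i = 0; i < output.length; ++i) output[i] = 0;
  }
};
processor.connect(context.destination);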

Also, ScriptProcessorNode requires a constant buffer size, but some
applications may fetch, synthesize or decode data in variable block sizes.
Additionally, it does not currently allow specifying the sample playback
frequency but requires that one is able to synthesize at the native device
frequency, which the spec doesn't even specify, so in practice,
implementors would be required either to synthesize at whatever arbitrary
rate from 22kHz to 96kHz the browser reports supporting, or to implement
resampling on their own. It is already worrying that the Web Audio API
supports only the Float32 format, and that JS code needs to implement
per-sample U8/S8/U16/S16/U24/S24/etc. -> Float32 format conversions, which
should definitely be the task of the C/C++ SSE-optimized signal processing
code. (I would hasten to add support for these formats to the spec as well,
but that's another story.) Forcing users to write signal resamplers in JS
would be even more catastrophic.
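
As an illustration of the kind of per-sample work this pushes into JS, here
is a minimal sketch converting interleaved signed 16-bit stereo into an
AudioBuffer (the helper name is mine, not from any spec):

// Interleaved S16 stereo -> two Float32 channel buffers, one sample at a time.
function s16StereoToAudioBuffer(context, int16Data, sourceRate) {
  var frames = int16Data.length / 2;
  var buffer = context.createBuffer(2, frames, sourceRate);
  var left = buffer.getChannelData(0);
  var right = buffer.getChannelData(1);
  for (var i = 0; i < frames; ++i) {
    left[i]  = int16Data[2 * i]     / 32768;
    right[i] = int16Data[2 * i + 1] / 32768;
  }
  return buffer;
}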

Ideally, one would like to see both a pull and a push model supported with a
strong capability set, but if only one is chosen, the push model is superior.
If anyone argues that start(double when) is designed to be sample-precise:
the very fact that we are discussing floating-point precision at all here is
a bad smell, and a better API with an explicit contract is needed (a sketch
of the start(when) bookkeeping this implies follows the list below), because
- it is, well, explicit,
- it is easier to program against (there have now been at least three
attempts in the Emscripten community to do buffer queueing, none of which
got it right the first or even the second time),
- it is easier to implement (neither Chrome nor the Firefox nightly
currently produces glitch-free buffer joins), and
- it does not require a mathematical proof based on
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html to convince
oneself that floating-point computation won't drift over time and miss a sample.
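
For contrast, here is a minimal sketch of the start(when) bookkeeping that
applications attempt today (scheduleBuffer and nextTime are illustrative
names; the context variable is assumed from the sketches above):

// What applications attempt today: accumulate a double start time and hope
// that floating-point rounding never drifts across a sample boundary.
var nextTime = 0;

function scheduleBuffer(buffer) {
  var node = context.createBufferSource();
  node.buffer = buffer;
  node.connect(context.destination);
  if (nextTime < context.currentTime) {
    nextTime = context.currentTime;  // we underran; restart from "now"
  }
  node.start(nextTime);
  // buffer.duration is itself a rounded double, so this sum can drift
  // relative to the exact sample count over a long stream.
  nextTime += buffer.duration;
}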

I do not see why the web should be a special case compared to the native
world, or why the web audio spec could not just stick to the tried-and-true
solutions that native audio APIs have offered for well over 15 years.
The current spec gives the impression that 'the web is only float 32-bit,
likely-48-but-can-vary-by-browser-kHz', which to me is not good enough.
Solutions like Emscripten try to blur the native-web boundary, but that is
difficult to do if even the most modern web specs settle for an
'almost-there' level.

Alternative solutions to the above spec proposal are of course welcome
(whether that is allowing the device playback rate to be set, a push-variant
queue node alongside ScriptProcessorNode, or something similar), but whatever
is finally decided, I hope that when the final spec is released, an example
application ships with it that demonstrates how e.g. 16-bit, 22kHz stereo
audio is synthesized and streamed continuously, guaranteed to be glitch-free.


2013/6/18 Jer Noble <jer.noble@apple.com>

>
> On Jun 18, 2013, at 10:24 AM, Chris Rogers <crogers@google.com> wrote:
>
> On Tue, Jun 18, 2013 at 8:55 AM, Jer Noble <jer.noble@apple.com> wrote:
>
>>
>> On Jun 18, 2013, at 6:55 AM, Joe Berkovitz <joe@noteflight.com> wrote:
>>
>> Actually, as co-editor of the use case document I am very interested in
>> understanding why the arbitrary concatenation of buffers is important. When
>> would this technique be used by a game? Is this for stitching together
>> prerecorded backgrounds?
>>
>>
>> Here's a good example of such a use case:
>> http://labs.echonest.com/Uploader/index.html
>>
>> The WebAudio app slices an uploaded piece of music into discrete chunks,
>> calculates paths between similar chunks, and "stitches" together an
>> infinitely long rendition of the song by jumping in the timeline between
>> similar chunks.
>>
>> This app currently implements its queueing model by calling
>> setTimeout(n), where n is 10ms before the anticipated end time of the
>> current sample. However, this causes stuttering and gaps whenever the timer
>> is late by more than 10ms. WebKit Nightlies implement JavaScript timer
>> coalescing when pages are not visible, which has led the Infinite Jukebox
>> page to pause playback when it gets a 'visibilitychange'/'hidden' event.
>>
>
> A lookahead scheduling of 10ms is a bit optimistic.  Chris Wilson has
> written an excellent article about this topic:
> http://www.html5rocks.com/en/tutorials/audio/scheduling/
>
>
>
> Even so, timer coalescing can delay timers by very large amounts (perhaps
> even 1000ms!) so even some of the techniques Chris mentions in that article
> will fail unless very large lookahead queues are built up.
>
> For stitching together separate AudioBuffers seamlessly, having a buffer
> queue node available would be much more preferable to having web authors
> implement their own queuing model.
>
> -Jer
>
>

Received on Tuesday, 18 June 2013 19:01:41 UTC