Re: Starting

Sorry for the delay in my response here!

On Tue, Apr 23, 2013 at 5:42 PM, Joseph Berkovitz <joe@noteflight.com> wrote:

> Hi Ehsan,
>
> Please take a look at my response and pseudocode below regarding this
> point...
>
> "The time when the audio will be played in the same time coordinate system
>> as AudioContext.currentTime. playbackTime allows for very
>> tight synchronization between processing directly in JavaScript with the
>> other events in the context's rendering graph."
>>
>> I believe that this leaves no room for playbackTime to be inaccurate. The
>> value of playbackTime in an AudioProcessEvent must exactly equal the time T
>> at which a sound scheduled with node.start(T) would be played
>> simultaneously with the first frame of the AudioProcessEvent's sample block.
>>
>> I have not yet experimented with playbackTime in Gecko, but I originally
>> proposed the feature for inclusion in the spec, and the above definition
>> is how it needs to work if it's to be useful for synchronization.
>>
>
> You're right about the current text in the spec, but we should probably
> change it since what you're asking for is pretty much impossible to
> implement.  Imagine this scenario: let's say that the ScriptProcessorNode
> wants to dispatch an event with a properly calculated playbackTime.  Let's
> say that the event handler looks like this:
>
> function handleEvent(event) {
>   // assume that AudioContext.currentTime can change its value without
>   // hitting the event loop
>   while (event.playbackTime < event.target.context.currentTime);
> }
>
> Such an event handler would just wait until playbackTime is passed, and
> then return, and therefore it would make it impossible for the
> ScriptProcessorNode to operate without latency.
>
>
> That is not the way that one would make use of event.playbackTime in a
> ScriptProcessorNode. As you say, looping inside an event handler like this
> makes no sense and will wreck the operation of the system.
>
> The sole purpose of event.playbackTime is to let the code inside the event
> handler know at what time the samples that it generates will be played.
>  Not only is this not impossible to implement, it's quite practical, since
> it's what any "schedulable" source like Oscillators and
> AudioBufferSourceNodes must do under the hood.
>
> Here's how it's intended to be used: Going back to pseudocode, let's say
> you want to start both an Oscillator and some noise starting at some time
> T… in mono...
>
> var oscillator = context.createOscillator();
> // ...also configure the oscillator...
> oscillator.connect(context.destination);
> oscillator.start(T);
>
> var processor = context.createScriptProcessor();
> processor.connect(context.destination);
> processor.onaudioprocess = function(event) {
>   for (var i = 0; i < processor.bufferSize; i++) {
>     // Time at which this output frame will be played, in the same time
>     // coordinate system as AudioContext.currentTime.
>     var sampleTime = event.playbackTime + (i / event.outputBuffer.sampleRate);
>     if (sampleTime >= T)
>         event.outputBuffer.getChannelData(0)[i] = Math.random();
>     else
>         event.outputBuffer.getChannelData(0)[i] = 0;
>   }
> }
>
> There is in fact no other reliable mechanism in the API for script nodes
> to synchronize their output with "schedulable" sources, which is why this
> got into the spec in the first place.
>

I hope that there is now less confusion about this after last week's
teleconf, but allow me to clarify things a bit.

ScriptProcessorNode buffers its input and only dispatches the audioprocess
event when a buffer of bufferSize samples has been filled up, so in the
best case, each ScriptProcessorNode in the graph adds bufferSize/sampleRate
seconds of delay.  Now, when the implementation wants to dispatch the
audioprocess event, it needs to calculate the playbackTime value.  Note
that at this point, the implementation doesn't know how long it's going to
take for the event to be handled, so roughly speaking it calculates
playbackTime to be equal to currentTime + bufferSize/sampleRate.  This is
in practice a guess on the part of the implementation that the event
handling will be finished very soon, with a negligible delay.

Now, for the sake of this example, let's say that the web page takes 100ms
to handle the event.  Once the event dispatch is complete, we're now 100ms
late to play back the outputBuffer, which means that the buffer will be
played back at currentTime + bufferSize/sampleRate + 0.1 *at best*.  A good
implementation can remember this delay and the next time calculate
playbackTime to be currentTime + bufferSize/sampleRate + 0.1; in other
words, it can accumulate all of the delays seen in dispatching the previous
events and adjust its estimate of playbackTime every time it fires an
audioprocess event.  But unless the implementation knows how long the event
handling phase will take, it can never calculate an accurate playbackTime,
simply because it cannot foresee the future!
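
To make that concrete, here is a rough sketch of the "remember the delay"
strategy described above.  This is not code from any real engine; the names
(accumulatedDelay, dispatchAudioProcess, fireAudioProcessEvent) are invented
purely for illustration:

var accumulatedDelay = 0; // total lateness observed in earlier dispatches

function dispatchAudioProcess(context, bufferSize, sampleRate) {
  // Best guess available before the handler runs: the block can start
  // playing one buffer from now, plus whatever lateness we've already seen.
  var playbackTime =
      context.currentTime + bufferSize / sampleRate + accumulatedDelay;

  var before = context.currentTime;
  fireAudioProcessEvent(playbackTime); // runs the page's audioprocess handler
  var handlerDuration = context.currentTime - before;

  // We only learn how long the handler took after the fact, so future
  // estimates get pushed back accordingly; the estimate can catch up over
  // time, but it can never be exact for the event that already fired.
  accumulatedDelay += handlerDuration;
  return playbackTime;
}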

Now, let's talk about what this means in practice.  Take this test case,
which simply generates a sine wave using ScriptProcessorNode: <
https://bugzilla.mozilla.org/attachment.cgi?id=738313>.  Currently
WebKit/Blink use a double buffering approach, and Gecko uses a buffer
queue, which means that the WebKit/Blink implementation will suffer more
from delays incurred when handling the audioprocess event.  If you try this
test case in Chrome, you'll see that the playback consistently glitches.
The glitching should be a lot less severe in Firefox, since we simply
buffer more input data so that we can recover from delays sooner, but there
are limits to how well we can do, and I believe that the current Firefox
implementation is quite close to the best that ScriptProcessorNode can
be.  Given this fundamental problem, I'm worried that
ScriptProcessorNode as currently specified is not really usable for audio
generation (it can of course be used to inspect incoming frames, but that's
a different use case), so in a way, the whole problem of how to implement
playbackTime is the least of my worries.
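
For context, a sine-generating ScriptProcessorNode test looks roughly like
the following.  This is my own minimal approximation, not the exact contents
of the attachment above; the buffer size and frequency are arbitrary choices:

var context = new AudioContext();
var processor = context.createScriptProcessor(4096, 1, 1);
var phase = 0;
var frequency = 440; // arbitrary pitch, for illustration only

processor.onaudioprocess = function(event) {
  var output = event.outputBuffer.getChannelData(0);
  for (var i = 0; i < output.length; i++) {
    output[i] = Math.sin(phase);
    phase += 2 * Math.PI * frequency / context.sampleRate;
  }
  // If this handler is delayed (GC, layout, other main-thread work), the
  // engine has nothing to play once the previous buffer runs out, which is
  // the glitching described above.
};
processor.connect(context.destination);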

Cheers,
--
Ehsan
<http://ehsanakhgari.org/>

Received on Tuesday, 7 May 2013 00:53:47 UTC