W3C home > Mailing lists > Public > public-audio@w3.org > April to June 2013

Re: Starting

From: Srikumar Karaikudi Subramanian <srikumarks@gmail.com>
Date: Tue, 7 May 2013 10:24:26 +0530
Cc: Joseph Berkovitz <joe@noteflight.com>, Stuart Memo <stuartmemo@gmail.com>, "public-audio@w3.org" <public-audio@w3.org>
Message-Id: <832EBA55-51EF-43FF-ABAD-EC5B71DD8DD8@gmail.com>
To: Ehsan Akhgari <ehsan.akhgari@gmail.com>
> Now, let's talk about what this means in practice.  Take this test case, which simply generates a sine wave using ScriptProcessorNode: <https://bugzilla.mozilla.org/attachment.cgi?id=738313>.  Currently WebKit/Blink use a double buffering approach, and Gecko uses a buffer queue, which means that the WebKit/Blink implementation will suffer more from delays incurred when handling the audioprocess event.  If you try this test case in Chrome, you'll see that the playback consistently glitches.  The glitching behavior should be a lot better in Firefox since we simply buffer more input data to be able to recover from delays sooner, but there are limitations on how good we can be, and I believe that the current Firefox implementation is quite close to how good ScriptProcessorNode can be implemented. 

Actually, there is a bug in the test code; with the bug fixed, neither Chrome nor Firefox 
glitches at all.

Here is the fixed code - https://gist.github.com/srikumarks/5530312

The bug is that the sample counter being incremented is shared across all the channels
that the script node is generating. In my fixed code, I used a different counter for
each channel (which I lazily hard coded as 2 :P).
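
For anyone who doesn't want to load the gist, the essence of the fix can be sketched like this (a minimal stand-alone version, not the gist verbatim; the function names and the pure-function shape are mine):

```javascript
// Buggy shape: one sample counter shared by both channels. It gets
// incremented once per channel per frame, so it advances twice per
// frame -- each channel samples the sine at every other point, which
// doubles the effective frequency and is not the waveform you asked for.
function fillShared(left, right, freq, sampleRate) {
  var n = 0; // shared counter -- the bug
  for (var i = 0; i < left.length; i++) {
    left[i] = Math.sin(2 * Math.PI * freq * n++ / sampleRate);
    right[i] = Math.sin(2 * Math.PI * freq * n++ / sampleRate);
  }
}

// Fixed shape: an independent counter per channel, so each channel
// visits every sample index exactly once.
function fillFixed(left, right, freq, sampleRate) {
  var nL = 0, nR = 0;
  for (var i = 0; i < left.length; i++) {
    left[i] = Math.sin(2 * Math.PI * freq * nL++ / sampleRate);
    right[i] = Math.sin(2 * Math.PI * freq * nR++ / sampleRate);
  }
}
```

In the real script node the counters of course have to live outside the audioprocess handler so they persist from one buffer to the next.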

Best,
-Kumar

On 7 May, 2013, at 6:22 AM, Ehsan Akhgari <ehsan.akhgari@gmail.com> wrote:

> Sorry for the delay in my response here!
> 
> On Tue, Apr 23, 2013 at 5:42 PM, Joseph Berkovitz <joe@noteflight.com> wrote:
> Hi Ehsan,
> 
> Please take a look at my response and pseudocode below regarding this point...
> 
>> "The time when the audio will be played in the same time coordinate system as AudioContext.currentTime. playbackTime allows for very tight synchronization between processing directly in JavaScript with the other events in the context's rendering graph."
>> 
>> I believe that this leaves no room for playbackTime to be inaccurate. The value of playbackTime in an AudioProcessEvent must exactly equal the time T at which a sound scheduled with node.start(T) would be played simultaneously with the first frame of the AudioProcessEvent's sample block.
>> 
>> I have not experimented with playbackTime in Gecko yet, but I originally proposed the feature for inclusion in the spec, and the above definition is how it needs to work if it's to be useful for synchronization.
>> 
>> You're right about the current text in the spec, but we should probably change it since what you're asking for is pretty much impossible to implement.  Imagine this scenario: let's say that the ScriptProcessorNode wants to dispatch an event with a properly calculated playbackTime.  Let's say that the event handler looks like this:
>> 
>> function handleEvent(event) {
>>   // assume that AudioContext.currentTime can change its value without hitting the event loop
>>   while (event.playbackTime < event.target.context.currentTime);
>> }
>> 
>> Such an event handler would just wait until playbackTime is passed, and then return, and therefore it would make it impossible for the ScriptProcessorNode to operate without latency.
> 
> That is not the way that one would make use of event.playbackTime in a ScriptProcessorNode. As you say, looping inside an event handler like this makes no sense and will wreck the operation of the system.
> 
> The sole purpose of event.playbackTime is to let the code inside the event handler know at what time the samples that it generates will be played.  Not only is this not impossible to implement, it's quite practical, since it's what any "schedulable" source like Oscillators and AudioBufferSourceNodes must do under the hood.
> 
> Here's how it's intended to be used. Going back to pseudocode, let's say you want to start both an Oscillator and some noise at some time T, in mono...
> 
> var oscillator = context.createOscillator();
> // ...also configure the oscillator...
> oscillator.connect(context.destination);
> oscillator.start(T);
> 
> var processor = context.createScriptProcessor();
> processor.connect(context.destination);
> processor.onaudioprocess = function(event) {
>   var output = event.outputBuffer.getChannelData(0);
>   for (var i = 0; i < processor.bufferSize; i++) {
>     // The time at which this particular sample will actually be heard.
>     var sampleTime = event.playbackTime + (i / event.outputBuffer.sampleRate);
>     output[i] = (sampleTime >= T) ? Math.random() : 0;
>   }
> }
> 
> There is in fact no other reliable mechanism in the API for script nodes to synchronize their output with "schedulable" sources, which is why this got into the spec in the first place.
> 
> I hope that there is now less confusion about this after last week's teleconf, but allow me to clarify things a bit.
> 
> ScriptProcessorNode buffers its input and only dispatches the audioprocess event when a buffer of bufferSize samples has been filled up, so in the best case, each ScriptProcessorNode in the graph adds bufferSize/sampleRate seconds of delay.  Now, when the implementation wants to dispatch the audioprocess event, it needs to calculate the playbackTime value.  Note that at this point, the implementation doesn't know how long it's going to take for the event to be handled, so roughly speaking it calculates playbackTime to be equal to currentTime + bufferSize/sampleRate.  This is in practice a guess on the part of the implementation that the event handling will finish very soon, with negligible delay.
> 
> Now, let's for the sake of this example say that the web page takes 100ms to handle the event.  Once the event dispatch is complete, we're now 100ms late to play back the outputBuffer, which means that the buffer will be played back at currentTime + bufferSize/sampleRate + 0.1 *at best*.  A good implementation can remember this delay and the next time calculate playbackTime to be currentTime + bufferSize/sampleRate + 0.1, accumulating all of the delays seen in dispatching the previous events and adjusting its estimate of playbackTime every time it fires an audioprocess event.  But unless the implementation can know how long the event handling phase will take, it can never calculate an accurate playbackTime, simply because it cannot foresee the future!
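
[The delay-accumulation scheme described above can be sketched roughly like this. This is my illustration of the idea only, not any engine's actual code; the PlaybackClock name and its methods are made up.]

```javascript
// Sketch of the playbackTime estimate: start from
// currentTime + bufferSize/sampleRate and fold every observed
// dispatch delay back into the next estimate.
function PlaybackClock(sampleRate, bufferSize) {
  this.blockDuration = bufferSize / sampleRate; // seconds per buffer
  this.accumulatedDelay = 0;                    // total lateness seen so far
}

// Estimate when the buffer produced by the next audioprocess
// event will actually be heard.
PlaybackClock.prototype.estimate = function (currentTime) {
  return currentTime + this.blockDuration + this.accumulatedDelay;
};

// Called after each dispatch with how late the handler finished;
// the lateness is folded into all future estimates.
PlaybackClock.prototype.recordDelay = function (delaySeconds) {
  this.accumulatedDelay += delaySeconds;
};
```

This makes the "cannot foresee the future" point concrete: recordDelay can only run after the handler returns, so the estimate for the buffer currently being dispatched is always a guess.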
> 
> Now, let's talk about what this means in practice.  Take this test case, which simply generates a sine wave using ScriptProcessorNode: <https://bugzilla.mozilla.org/attachment.cgi?id=738313>.  Currently WebKit/Blink use a double buffering approach, and Gecko uses a buffer queue, which means that the WebKit/Blink implementation will suffer more from delays incurred when handling the audioprocess event.  If you try this test case in Chrome, you'll see that the playback consistently glitches.  The glitching behavior should be a lot better in Firefox since we simply buffer more input data to be able to recover from delays sooner, but there are limitations on how good we can be, and I believe that the current Firefox implementation is quite close to how good ScriptProcessorNode can be implemented.  With this fundamental problem, I'm worried that ScriptProcessorNode as currently specified is not really usable for audio generation (it can of course be used to inspect incoming frames, but that's a different use case), so in a way, the whole problem of how to implement playbackTime is the least of my worries.
> 
> Cheers,
> --
> Ehsan
> <http://ehsanakhgari.org/>


Received on Tuesday, 7 May 2013 04:55:15 UTC
