
Refining AudioProcessEvent definition of timing and sequencing

From: Joseph Berkovitz <joe@noteflight.com>
Date: Mon, 22 Sep 2014 15:43:07 -0400
Cc: Olivier Thereaux <olivier.thereaux@bbc.co.uk>, Audio WG <public-audio@w3.org>
Message-Id: <4A25393E-1EF8-47C0-86F1-CAE95201E257@noteflight.com>
To: Chris Wilson <cwilso@google.com>
Chris,

I’d like to fork the “better definition of scripted node timing/sequencing” topic into its own thread, namely this one.

I’m not ready with a full proposal for how this should work. Maybe it’s better if we try to map out the territory before investing effort in proposals, so please read on...

On Sep 19, 2014, at 5:26 PM, Chris Wilson <cwilso@google.com> wrote:
> I think we should say enough that people understand what is guaranteed in terms of timing/sequencing of audio callbacks, and know what the browser is attempting to optimize overall (e.g. trade off between preventing underruns due to end-to-end graph processing time and minimizing latency). But it sounds as though you want a more exact description of how WebAudio works today, which could be much more specific than that.
> 
> Not really.  Obviously, I think we should be more exact in, say, how a DynamicsProcessorNode works.  Your goal of making guarantees of timing and sequencing of audio callbacks, and what the browser is trying to optimize for, is actually going to be far more limiting than that, though.

I don’t agree that it’s all that limiting, because my goals for the normative part of such a section are pretty modest. The part about optimization could be limiting if it were normative, but I think instead it should be mostly presented as non-normative, informative description.

Here is a rough outline of what I think the spec could provide:

Normative behavior (note: some of these are NON-guarantees to discourage assumptions by devs):

  - any postMessage() invocations on an AWN made on the main thread in the same execution trace as the construction of the AWN are guaranteed to be delivered to its Worker before any AudioProcessEvents. This ensures that a node can be initialized properly before it does anything.

  - all postMessage() invocations on an AWN in the same execution trace will be delivered to its Worker without any interleaved AudioProcessEvents.

  - the UA guarantees monotonically increasing values of playbackTime in successive AudioProcessEvents seen by any given Worker

  - no “exact splicing” guarantee: the playbackTime of an AudioProcessEvent *may* fail to immediately follow the last frame of the preceding event, if the audio engine becomes starved for cycles. Moral: always look at playbackTime.

  - no “real time synchronization” guarantee: you can’t count on an AudioProcessEvent being delivered in any particular temporal relationship to the actual playback time of the referenced sample frames. Moral: receipt of a message from the main thread may occur some arbitrary time interval before the audible playback of the frames in a following AudioProcessEvent.

  - no “consistent buffer size” guarantee: you can’t count on a consistent buffer size in successive AudioProcessEvents.

Non-normative goals for implementations:

  - prevent output buffer underruns by building enough latency (i.e. buffer queuing) into the system to prevent glitching from inter-thread handoffs or from a “reasonable” CPU processing cost in the graph. This latency will of course be platform dependent, but we all agree it has to be minimal; native DAWs on desktop platforms with reasonable audio drivers seem to achieve 3-5 ms today, which I am told leaves room for a couple of thread handoffs.

  - minimize the time interval between AudioProcessEvent delivery and the actual time the event’s emitted samples are heard (which of course implies minimizing the above latency).

  - minimize CPU power consumption by potentially varying buffer sizes dynamically.

  - Obviously the two goals above conflict, so there is a tradeoff between them; the UA’s job of performing that tradeoff should also be described non-normatively.
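As a non-normative illustration of the underrun-vs-latency tradeoff, the queued latency can be estimated from buffer size, sample rate, and queue depth. The figures below are my own illustrative assumptions, not spec values:

```javascript
// Each buffer queued between the rendering engine and the output
// device adds bufferSize / sampleRate seconds of latency, so the
// underrun-vs-latency tradeoff is essentially a choice of queue depth.
function queuedLatencyMs(bufferSize, sampleRate, queueDepth) {
  return (bufferSize / sampleRate) * queueDepth * 1000;
}

// 128-frame buffers at 44.1 kHz, queued two deep: about 5.8 ms,
// the same ballpark as the 3-5 ms figure cited for native DAWs.
var latency = queuedLatencyMs(128, 44100, 2);
```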


> This is why I was saying auto-parallelizing is not something I think we should do, because it takes that one tradeoff - latency vs CPU overrun - and turns it into a complex relationship between CPU, number of threads, thread communication cost and latency.  The decision to move to multiple cores would knowingly jack up latency (even if it's consistent latency for the whole graph), in order to optimize for lower CPU in the (single) audio thread.

As Russell McClellan pointed out in the issue comments, multicore processing should actually permit latency to be _lower_, not higher (and should certainly not “jack it up”). In the common topology of multiple input channels with linear effect chains mixed down into a master bus, the number of threads goes up, but the thread communication cost increases only by a single thread handoff, and thus latency may increase by a few milliseconds at most. (In a multicore-capable implementation, the latency would probably be pegged at a reasonable value that allowed up to some maximum number of cores to be involved in audio processing.)
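The topology argument can be sketched numerically; the millisecond figures here are hypothetical, purely to show that the handoff count, not the chain count, is what scales:

```javascript
// With N parallel chains on separate cores mixed into a master bus,
// per-callback cost is one chain's processing time plus ONE mixdown
// handoff, independent of N. (Hypothetical millisecond figures.)
function parallelCostMs(numChains, chainMs, handoffMs) {
  // numChains does not appear: the chains run concurrently.
  return chainMs + handoffMs;
}

// The same graph processed serially on one thread: cost grows with N.
function serialCostMs(numChains, chainMs) {
  return numChains * chainMs;
}
```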


>  
> To the extent that such a description bakes in a serialized approach to audio processing (or even rules out flexibility in the order of serial processing) I think that would be a bad outcome and I don’t yet see how the extra specificity helps anyone since exact synchronization between the interior impls of scripted nodes is forbidden. As long as we are clearly stating the UA’s performance goals and visible guarantees, is that not enough?
> 
> You must be thinking I'm suggesting more than I am.  I don't think the order of processing can be observed (though I think it would be useful to non-normatively describe how connections work), and I don't think we can guarantee timing of any sort - the vagaries of underlying hardware and APIs make that challenging.  I'd like to say we can describe the tradeoff between latency and underruns, but I don't think we can in any normative way if you want to keep the door open for automatic parallelism.

As I said above, I think we should solve that problem by going non-normative on the description of timing :-)

.            .       .    .  . ...Joe

Joe Berkovitz
President

Noteflight LLC
Boston, Mass.
phone: +1 978 314 6271
www.noteflight.com
"Your music, everywhere"
Received on Monday, 22 September 2014 19:43:37 UTC
