Re: Some comments on the Web Audio API Spec

Steven, thanks for the email, my answers are inline.

On Sat, Jan 31, 2015 at 1:57 AM, Steven Yi <stevenyi@gmail.com> wrote:
> #1 - The specification is not clear to me about when a node becomes
> live. I assume it is when a node is connected to the active part of
> the audio graph that is "live" and processing. Since node creation and
> graph assembly are done in the JS main thread, it seems, from the
> following code in "3.3 Example: Mixer with Send Busses", that nodes
> might get attached across buffers in the audio thread:
>
>     compressor = context.createDynamicsCompressor();
>
>     // Send1 effect
>     reverb = context.createConvolver();
>     // Convolver impulse response may be set here or later
>
>     // Send2 effect
>     delay = context.createDelay();
>
>     // Connect final compressor to final destination
>     compressor.connect(context.destination);
>
>     // Connect sends 1 & 2 through effects to main mixer
>     s1 = context.createGain();
>     reverb.connect(s1);
>     s1.connect(compressor);
>
>     s2 = context.createGain();
>     delay.connect(s2);
>     s2.connect(compressor);
>
>   For example, could it be the case that "s1.connect(compressor)" above
> happens just before buffer n starts to generate, and
> "s2.connect(compressor)" happens such that it only takes effect when
> buffer n + 1 is generating?
>
> If this is the case, would connecting the compressor to
> context.destination at the end of the example, rather than at the
> beginning, guarantee that the graph of nodes connected to the
> compressor is started at the same time?  If so, then maybe this
> aspect of node graph creation could be clarified and the example in
> 3.3 updated so that the sub-graph of nodes is clearly formed before
> attaching it to the active audio graph.

I don't think this is specified. What happens in Firefox is that we
queue all operations on the graph during a stable state, and then run
them as events in the audio thread at the beginning of the callback,
so the scenario you mention cannot happen: events produced in the same
JS event loop run are reflected on the audio thread in the same
callback.

We should specify this, because it's certainly observable from
content. Also, I don't know what other implementations are doing: this
is a trade-off between consistency and latency for long-running JS
functions.
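
In the meantime, building the whole sub-graph first and connecting to
context.destination last, as you suggest, does make the intent
explicit. A rough reordering of the 3.3 example (just a sketch, and
not a guarantee of atomicity in every implementation):

    var compressor = context.createDynamicsCompressor();
    var reverb = context.createConvolver();
    var delay = context.createDelay();

    // Build the send busses while the sub-graph is still detached
    // from the destination.
    var s1 = context.createGain();
    reverb.connect(s1);
    s1.connect(compressor);

    var s2 = context.createGain();
    delay.connect(s2);
    s2.connect(compressor);

    // Only now attach the finished sub-graph to the live graph.
    compressor.connect(context.destination);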

>
> #2 - Following from #1, what would happen if one is dynamically
> altering a graph to remove an intermediary node?  For example, let's
> say one has a graph like:
>
>    gain = context.createGain();
>    compressor = context.createDynamicsCompressor();
>    reverb = context.createConvolver();
>    gain.connect(reverb);
>    reverb.connect(compressor);
>    compressor.connect(context.destination);
>
> and later the user decides to remove the reverb with something like:
>
>    reverb.disconnect();
>    // gain.disconnect();
>    gain.connect(compressor);
>
> (Assuming the above uses a gain node as a stable node for other nodes
> to attach to.) My question is: when do connect and disconnect
> happen?  Do they happen at block boundaries?  I assume they must, or
> a graph can get into a bad state if the graph changes while a block
> is being processed.

In Firefox, this happens at audio callback boundaries, before
processing the audio, and therefore at block boundaries (because we
align audio callbacks on block boundaries). I don't know what
Blink/WebKit do, but this is something we need to spec.

>
> Also, without the gain.disconnect(), will there be a hidden reference
> to the reverb from gain? (I guess a "connection" reference according
> to 2.3.3). If so, this seems like it could be a source of a memory
> leak (assuming that the above object references to reverb are all
> cleared from the JS main thread side).
>
> #3 -  In "2.3.2 Methods", for an AudioNode to connect to another audio
> node, it is not clear whether fan-out/fan-in is supported.  The
> documentation for connecting to AudioParams explicitly states that
> this is supported.  Should the first connect() method documentation be
> clarified for this when connecting to nodes?

In 2.3:

>>  An output may connect to one or more AudioNode inputs, thus fan-out is supported. An input initially
>> has no connections, but may be connected from one or more AudioNode outputs, thus fan-in is
>> supported. When the connect() method is called to connect an output of an AudioNode to an input of an
>> AudioNode, we call that a connection to the input.

so I believe this is specced.
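
To make this concrete, both of the following are valid today (an
informal illustration of 2.3, not new normative text):

    // Fan-out: one oscillator output feeding two gain nodes.
    var osc = context.createOscillator();
    var dry = context.createGain();
    var wet = context.createGain();
    osc.connect(dry);
    osc.connect(wet);

    // Fan-in: both gain nodes summed into a single input.
    var master = context.createGain();
    dry.connect(master);
    wet.connect(master);
    master.connect(context.destination);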

> #4 - Also in regard to 2.3.2, the API of disconnect() seems odd as it
> does not mirror connect(). connect() is given an argument of what node
> or AudioParam to connect to.  disconnect() however does not have a
> target argument. It's not clear then what this disconnects from. For
> example, if I connect a node to two different nodes and also to
> another node's parameter, then call disconnect, what happens?  As it
> is now, it doesn't seem possible to create a GUI editor where one
> could connect the output of a node to multiple other nodes/params,
> then click and disconnect a single connection.

This is being worked on in https://github.com/WebAudio/web-audio-api/issues/6
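
The rough shape being discussed there is a disconnect() that mirrors
connect() by taking a target, something like this (purely
illustrative; the node names are made up and the final API may well
differ):

    // Hypothetical selective disconnect, per the issue above.
    source.connect(filterA);
    source.connect(filterB);
    source.connect(filterB.frequency);

    // Tear down only one of the three connections.
    source.disconnect(filterB);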

>
> #5 - In the music systems I've seen, event processing is done within
> the audio-thread.  This generally happens for each buffer, something
> like:
>
> 1. Process incoming messages
> 2. Process a priority queue of pending events
> 3. Handle audio input
> 4. Run processing graph for one block
> 5. Handle audio output
>
> I'm familiar with this from Csound and SuperCollider's engines, as
> well as the design in my own software synthesizer Pink. (Chuck's
> design follows the same basic pattern above, but on a
> sample-by-sample basis.)

For reference, this is roughly what Firefox does, but we process
messages at the beginning of the system audio callback (and not at
block boundaries).
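
Roughly, the shape of one system audio callback in this model is the
following (a pseudo-JS sketch of the implementation, not author-facing
API; both function names are made up):

    function audioCallback(outputBuffer) {
      // 1. Apply graph mutations queued from the main thread during
      //    the last stable state (connect, disconnect, AudioParam
      //    changes, ...).
      processIncomingMessages();

      // 2. Render the callback's worth of audio in 128-frame blocks.
      for (var offset = 0; offset < outputBuffer.length; offset += 128) {
        renderOneBlock(outputBuffer, offset);
      }
    }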

> As it is today, the Web Audio API does not have any kind of reified
> event object.  One can schedule some things like automations via
> param's setXXXatTime() methods and have that run within the time of
> the audio engine, but there is nothing built-in for events in the Web
> Audio API.

Can you clarify what "built-in [...] events in the Web Audio API"
would be? Things like "connect", "disconnect", "node.buffer =
somebuffer"?

> Now, I have no issues with the Web Audio API not having a concrete
> event system, and think it should not have one, as people have
> different notions and needs out of what is encoded in an event.
> However, I think that there should be a way to create one's own event
> system, one that is clocked to the same audio system clock (i.e. run
> within the audio thread).
>
> I was a bit concerned when at the conference there was mention of "A
> Tale of Two Clocks".  The design of trying to reference two clocks
> cannot, by definition, allow for a queue of events to be processed
> synchronously with audio. If one formalizes event-processing
> functions and audio-processing functions as functions of time, by
> having two clocks you get two different time variables, ta and tb,
> which are not equivalent unless the clocks are proven to advance at
> exactly the same rate (i.e. ta0 == tb0, ta1 == tb1, ..., tan == tbn).
> However, the JS main thread and the audio thread do not run at the
> same rate, so we can at best implement some kind of approximation,
> but it cannot be a formally correct solution.

A solution exists, which is to have a new kind of worker that can
mutate the graph on the audio thread. I'm not sure we have explored
it. I recall talking about making AudioContext available from a
WebWorker, but that's it.

> Event processing in a thread other than the audio thread has problems.
> One mentioned at the conference was what to do with offline rendering,
> where the clock of an audio engine runs faster than realtime, and may
> advance faster or slower in terms of wall-clock time while rendering,
> depending on how heavy the processing needs of the graph are. Second,
> I seem to remember hearing a problem during one of the concerts: when
> I turned off my phone's screen I continued to hear audio but all
> events stopped, then a number of events fired all at once when I
> turned the screen back on. The piece used an event scheduling system
> that ran in the JS main thread. I assume this situation is similar to
> what could happen with backgrounded tabs, but I'm not quite sure about
> all this. Either way, I think there are real problems here that need
> to be addressed.

For sure. There is some discussion starting to happen about a "Media
Focus" API, which would help. Having suspend/resume/close also helps,
as authors can listen for the events that fire when a page is
backgrounded or the screen turns off, and implement something directly
in the app. There has also been discussion of being able to schedule
"events" on the OfflineAudioContext. I think both contexts should be
unified by the same mechanism.

Whether we want to allow some applications to keep running and firing
events at a high rate in a background tab is another question.
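
For example, with suspend() and resume() (assuming they ship roughly
as currently proposed), an author could do something like:

    // Pause the graph when the page is hidden, resume when it becomes
    // visible again, instead of letting scheduled events pile up.
    document.addEventListener("visibilitychange", function () {
      if (document.hidden) {
        context.suspend();
      } else {
        context.resume();
      }
    });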

> This also leads to a bigger question: with Web Audio, if I run the
> same project twice that uses an event system to reify graph
> modifications in time (as events in audio engines are mostly used
> for, i.e. alloc this graph of nodes and add it to the live audio
> graph), will I get the same result?  Assuming one uses only
> referentially transparent nodes (i.e. no random calculations), I
> believe the only way to guarantee this is if the event system is
> processed as part of the audio thread.

That's correct, because the main thread can be delayed by an arbitrary
amount of time by things authors can't predict (garbage collection,
for example), and because the audio thread will not be affected by
those "things" (i.e. runs in some sort of soft real-time).

> Now, what can a user do with Web Audio to create their own Event
> system that is in sync with the audio thread?  Currently, there is the
> ScriptProcessorNode.  Of course, the design of ScriptProcessorNode is
> deeply flawed for all the reasons discussed at the conference
> (Security, Inefficient due to context switching, potential for
> breakups, etc.).  However, what it does do is allow one to process
> events in sync with the audio thread, allowing one to build formally
> correct audio systems where one processes event time according to the
> same time as is used by the audio nodes. Additionally, according to
> those events, one can dynamically modify the graph (i.e. add new
> instances of a sub-graph of nodes to the live graph, representing a
> "note"), via reference to other nodes and the audio context. So while
> flawed in terms of performance and security, it does allow one to
> build correct systems that generate consistent output.

This is a false solution, and is in fact no better than using
setInterval + AudioContext.currentTime. ScriptProcessorNode is
implemented either as a double buffer or as a queue (depending on the
browser; both techniques have pros and cons). The fact that it keeps
running even in a tab that is in the background (does it really? I
haven't tested) is a bug, and authors should not rely on that.

I'm curious why you think having ScriptProcessorNode is better than
set{Timeout,Interval} or requestAnimationFrame; maybe something is
badly worded in the spec?
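
To be concrete, the setInterval + currentTime pattern I have in mind
is the usual look-ahead scheduler, roughly like this (a sketch; the
interval, look-ahead and note values are arbitrary):

    var LOOKAHEAD = 0.1;   // seconds of audio scheduled ahead
    var nextNoteTime = context.currentTime;

    function scheduler() {
      // Schedule, on the audio clock, everything that falls within
      // the look-ahead window.
      while (nextNoteTime < context.currentTime + LOOKAHEAD) {
        var osc = context.createOscillator();
        osc.connect(context.destination);
        osc.start(nextNoteTime);
        osc.stop(nextNoteTime + 0.05);
        nextNoteTime += 0.25; // one "note" every 250 ms
      }
    }

    // The main-thread timer only decides *when* to schedule; it can
    // be delayed by GC, backgrounding, etc.
    setInterval(scheduler, 25);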

> My concern is that there was discussion of not only deprecating
> ScriptProcessorNode, but removing it altogether.  I would have no
> problem with this, except that from reading the current specification
> for AudioWorker, I do not see how it would be possible to create an
> event system with it.  While one can pass messages to and from an
> AudioWorker, one has no access to the AudioContext. In that regard,
> one cannot, for example, create new nodes within an AudioWorker and
> attach them to context.destination. I am not very familiar with
> transferables and what can be passed between the AudioWorker and the
> JS main thread via postMessage, but I assume AudioNodes cannot be
> made transferable.
>
> At this point, I'm questioning what can be done. It seems
> AudioWorker's design is not meant for event processing (fair enough),
> and ScriptProcessorNode can only do this by accident and not by
> design. Is there any solution to this problem with the Web Audio API
> moving forward?  For example, would this group be willing to consider
> extending the API for non-audio nodes?  (Processing nodes?) If
> processing nodes could be added that have a larger context than what
> is proposed for AudioWorkerGlobalScope--say, access to the
> AudioContext, and the ability to modify the audio node graph
> dynamically--I could see that as a solution for building higher-level
> constructs like an event system.

I think you are right on this. This could be a great addition to the
spec, as Joe has noted.

> #6 - For the AudioWorker specification, I think it would be useful to
> have clarification on when postMessage is processed.  In 2.11.1.2, it
> has a link to "the algorithm defined by the Worker Specification".
> That in turn mentions:
>
> "The postMessage() method on DedicatedWorkerGlobalScope objects must
> act as if, when invoked, it immediately invoked the method of the same
> name on the port, with the same arguments, and returned the same
> return value."
>
> If it is meant to be processed immediately, then this can cause
> problems if the AudioWorker is already part of a live graph and
> values mutate while the audio worker is processing a block. I think
> it would be good to have clarification on this, perhaps with a
> recommendation that, in onaudioprocess functions, one should make a
> local copy of any mutable value and use that for the duration of
> onaudioprocess to get a consistent result for the block.

Workers have their own event loops, and I assume the AudioWorkerNode's
event loop is clocked off the system audio callback, so this would
happen either at block boundaries or at system audio callback
boundaries. In any case, this needs to be specced.
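
Your recommendation about copying mutable values would look something
like this inside the worker script (a sketch; I'm eliding the exact
shape of the audio processing event and its buffers):

    var bits = 8; // mutated by postMessage from the main thread

    onmessage = function (e) {
      bits = e.data.bits;
    };

    onaudioprocess = function (e) {
      // Copy the mutable value once, so the whole block is rendered
      // with a consistent setting even if a message arrives mid-block.
      var localBits = bits;
      // ... process e's input/output buffers using localBits ...
    };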

> #7 - Related to #6, I noticed in "2.11.3.1 A Bitcrusher Node", the
> example uses a phaser variable that is scoped to the AudioWorker.  I
> assume this would then be on the heap. This is perhaps more of general
> JS question, but I normally see in block-based audio programming that
> for a process() function, one generally copies any state variables of
> a node/ugen/etc. to local variables, runs the audio for-loop with
> local variable, then saves the state for the next run.  This is done
> for performance (better locality, stack vs. heap access, better
> compiler optimizations, etc.). I don't know much about JavaScript
> implementations; can anyone comment if these kinds of optimizations
> are effective in JS?  If so, the example might benefit from rewriting
> and give some guidance. (i.e. phase and lastDataValue are copied to a
> local var before the for-loop, and saved again after the for-loop, in
> onaudioprocess).

If you're concerned about cache locality for different properties,
reimplementing your own heap in an expando on the global scope of the
worker is trivial (it's just a matter of allocating a big Uint8Array
on worker creation, and using a DataView to access the data). This is
an approach that has been used successfully in a lot of applications
(3D/physics engines/codecs written in JS come to mind, as well as
everything that emscripten outputs).
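
A minimal sketch of that, assuming the only state is the bitcrusher's
phaser and lastDataValue, and eliding the actual processing loop:

    // One flat backing store for all mutable state, allocated once
    // when the worker starts.
    var heap = new Uint8Array(1024);
    var view = new DataView(heap.buffer);

    onaudioprocess = function (e) {
      // Pull state into locals for the duration of the block...
      var phaser = view.getFloat32(0);
      var lastDataValue = view.getFloat32(4);
      // ...run the for-loop over the block using the locals...
      // ...then write the state back for the next block.
      view.setFloat32(0, phaser);
      view.setFloat32(4, lastDataValue);
    };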

Copying between global state and locals is an approach that should
work as well, but I haven't tried it (and I don't work on JITs, and
those things are somewhat magical).

In any case, that's not really related to the Web Audio API, and I
think Chris' goal with those examples was not to be as fast as
possible, but to illustrate the AudioWorker's normative text.

Thank you again for outlining a lot of points the working group needs
to work on.

Cheers,
Paul.
