Some comments on the Web Audio API Spec from Steven Yi on 2015-01-31 (public-audio@w3.org from January to March 2015)

From: Steven Yi <stevenyi@gmail.com>
Date: Sat, 31 Jan 2015 01:57:07 +0100
To: public-audio@w3.org
Message-ID: <CANtcCs6J1yXHHJ2iuDHX9yDskKjrWJ+ZHUUjq1yU0A4M456+JA@mail.gmail.com>
Hello All,

First, it was a great pleasure to be at the Web Audio conference. I
enjoyed the sessions and gigs, and getting to meet other members of
the community.  Cheers to IRCAM and Mozilla for the lovely
conference!

That said, I have some comments and questions about the Web Audio API
and specification. (Note: these comments are in reference to the 06
January 2015 draft, found at
http://webaudio.github.io/web-audio-api/.)

#1 - The specification is not clear to me about when a node becomes
live. I assume it is when a node is connected to the active part of
the audio graph that is "live" and processing. Since node creation and
graph assembly are done in the JS Main thread, it seems possible that,
in the following code from "3.3 Example: Mixer with Send Busses",
nodes might get attached across buffers in the audio thread:

    compressor = context.createDynamicsCompressor();

    // Send1 effect
    reverb = context.createConvolver();
    // Convolver impulse response may be set here or later

    // Send2 effect
    delay = context.createDelay();

    // Connect final compressor to final destination
    compressor.connect(context.destination);

    // Connect sends 1 & 2 through effects to main mixer
    s1 = context.createGain();
    reverb.connect(s1);
    s1.connect(compressor);

    s2 = context.createGain();
    delay.connect(s2);
    s2.connect(compressor);

For example, could it be the case that "s1.connect(compressor)" above
happens just before buffer n starts to generate, and
"s2.connect(compressor)" only takes effect when buffer n + 1 is
generating?

If this is the case, would connecting the compressor to the
context.destination at the end of the example, rather than the
beginning, guarantee that the nodes connected to the compressor all
start at the same time?  If so, then maybe this
aspect of node graph creation could be clarified and the example in
3.3 updated so that the sub-graph of nodes is clearly formed before
attaching to the active audio-graph.
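
The ordering suggested above could look something like the following
sketch, which assembles the whole sub-graph first and connects to
context.destination only as the last step. The helper name
buildSendMixer is hypothetical, and whether this ordering really
guarantees that all nodes start in the same buffer is exactly the open
question; the draft does not say.

```javascript
// Hypothetical helper: assemble the send-bus sub-graph from 3.3 before
// attaching it to the live graph. `context` is assumed to be an AudioContext.
function buildSendMixer(context) {
  var compressor = context.createDynamicsCompressor();
  var reverb = context.createConvolver(); // Send1 effect
  var delay = context.createDelay();      // Send2 effect

  // Connect sends 1 & 2 through effects to the (still detached) compressor
  var s1 = context.createGain();
  reverb.connect(s1);
  s1.connect(compressor);

  var s2 = context.createGain();
  delay.connect(s2);
  s2.connect(compressor);

  // Only now attach the finished sub-graph to the destination.
  compressor.connect(context.destination);

  return { compressor: compressor, reverb: reverb, delay: delay, s1: s1, s2: s2 };
}
```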

#2 - Following from #1, what would happen if one is dynamically
altering a graph to remove an intermediary node?  For example, let's
say one has a graph like:

   gain = context.createGain();
   compressor = context.createDynamicsCompressor();
   reverb = context.createConvolver();
   gain.connect(reverb);
   reverb.connect(compressor);
   compressor.connect(context.destination);

and later the user decides to remove the reverb with something like:

   reverb.disconnect();
   // gain.disconnect();
   gain.connect(compressor);

(Assuming the above uses a gain node as a stable node for other nodes
to attach to.) My question is: when do connect and disconnect take
effect?  Do they take effect at block boundaries?  I assume they must,
or a graph can get into a bad state if it changes while a block is
being processed.

Also, without the gain.disconnect(), will there be a hidden reference
to the reverb from gain? (I guess a "connection" reference according
to 2.3.3). If so, this seems like it could be a source of a memory
leak (assuming that the above object references to reverb are all
cleared from the JS main thread side).
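
As a sketch of the removal with the explicit gain.disconnect()
included, the three calls can be grouped in one function. The name
bypassReverb is hypothetical, and nothing here settles whether the
three calls take effect within a single block; that remains the
question asked above.

```javascript
// Hypothetical helper: take `reverb` out of the chain
// gain -> reverb -> compressor. All three arguments are assumed to be
// AudioNodes created from the same context.
function bypassReverb(gain, reverb, compressor) {
  reverb.disconnect();      // drop reverb -> compressor
  gain.disconnect();        // drop gain -> reverb, so gain keeps no connection to reverb
  gain.connect(compressor); // route gain directly into the compressor
}
```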

#3 -  In "2.3.2 Methods", for an AudioNode to connect to another audio
node, it is not clear whether fan-out/fan-in is supported.  The
documentation for connecting to AudioParams explicitly states that
this is supported.  Should the first connect() method documentation be
clarified for this when connecting to nodes?

#4 - Also in regard to 2.3.2, the API of disconnect() seems odd as it
does not mirror connect(). connect() is given an argument of what node
or AudioParam to connect to.  disconnect(), however, does not have a
target argument. It's not clear, then, what this disconnects from. For
example, if I connect a node to two different nodes and also to
another node's parameter, then call disconnect, what happens?  As it
is now, it doesn't seem possible then to create a GUI editor where one
could connect the output of a node to multiple other nodes/params,
then click and disconnect a single connection.
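
One possible workaround, sketched below, is to keep the bookkeeping on
the JS side: remember every target, and remove a single connection by
dropping all of them and restoring the rest. ConnectionTracker is a
hypothetical name, and the sketch assumes that an argument-less
disconnect() removes every outgoing connection of the node, which the
draft does not spell out. The momentary break in the remaining
connections may also be audible.

```javascript
// Hypothetical bookkeeping wrapper around one source node. It relies only
// on connect(target) and an argument-less disconnect().
function ConnectionTracker(source) {
  this.source = source;
  this.targets = []; // AudioNodes or AudioParams currently connected
}

ConnectionTracker.prototype.connect = function (target) {
  if (this.targets.indexOf(target) === -1) {
    this.targets.push(target);
    this.source.connect(target);
  }
};

// Remove a single connection by dropping all of them and restoring the rest.
ConnectionTracker.prototype.disconnect = function (target) {
  var index = this.targets.indexOf(target);
  if (index === -1) { return; }
  this.targets.splice(index, 1);
  this.source.disconnect(); // ASSUMPTION: removes all outgoing connections
  for (var i = 0; i < this.targets.length; i++) {
    this.source.connect(this.targets[i]);
  }
};
```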

#5 - In the music systems I've seen, event processing is done within
the audio-thread.  This generally happens for each buffer, something
like:

1. Process incoming messages
2. Process a priority queue of pending events
3. Handle audio input
4. Run processing graph for one block
5. Handle audio output

I'm familiar with this from Csound and SuperCollider's engines, as
well as the design in my own software synthesizer Pink. (ChucK's
design follows the same basic pattern above, but on a sample-by-sample
basis.)
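
The five steps above can be sketched as a small single-threaded
engine. Everything here (Engine, inbox, pending, voices) is a
hypothetical illustration, not code from Csound, SuperCollider, or
Pink; events fire at block granularity, at the start of the block that
contains their frame.

```javascript
// Hypothetical block-based engine; every name here is illustrative.
function Engine(sampleRate, blockSize) {
  this.sampleRate = sampleRate;
  this.blockSize = blockSize;
  this.frame = 0;    // audio clock, in sample frames
  this.inbox = [];   // messages posted from other threads
  this.pending = []; // events kept sorted by start frame
  this.voices = [];  // active processing functions
}

Engine.prototype.schedule = function (event) {
  // Insert after any event with the same or an earlier frame (stable order).
  var i = this.pending.length;
  while (i > 0 && this.pending[i - 1].frame > event.frame) { i--; }
  this.pending.splice(i, 0, event);
};

Engine.prototype.processBlock = function (input, output) {
  var blockEnd = this.frame + this.blockSize;
  // 1. Process incoming messages
  while (this.inbox.length > 0) { this.schedule(this.inbox.shift()); }
  // 2. Process the priority queue of pending events due in this block
  while (this.pending.length > 0 && this.pending[0].frame < blockEnd) {
    this.pending.shift().fire(this);
  }
  // 3. Handle audio input: start from the input block, or silence
  for (var i = 0; i < this.blockSize; i++) { output[i] = input ? input[i] : 0; }
  // 4. Run the processing graph for one block
  for (var v = 0; v < this.voices.length; v++) { this.voices[v](output, this.frame); }
  // 5. Handle audio output: the caller consumes `output`; advance the clock
  this.frame = blockEnd;
};
```

Because events and audio share the single `frame` counter, running the
same event list twice produces the same sequence of graph changes.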

As it is today, the Web Audio API does not have any kind of reified
event object.  One can schedule some things like automations via an
AudioParam's setXXXAtTime() methods (e.g. setValueAtTime()) and have
that run within the time of
the audio engine, but there is nothing built-in for events in the Web
Audio API.

Now, I have no issues with the Web Audio API not having a concrete
event system, and think it should not have one, as people have
different notions and needs out of what is encoded in an event.
However, I think that there should be a way to create one's own event
system, one that is clocked to the same audio system clock (i.e. run
within the audio thread).

I was a bit concerned when at the conference there was mention of "A
Tale of Two Clocks".  A design that references two clocks cannot, by
definition, allow for a queue of events to be processed synchronously
with audio. If one formalizes event-processing functions and
audio-processing functions as functions of time, then with two clocks
you get two different variables, ta and tb, which are not equivalent
unless the clocks are proven to advance at exactly the same rate
(i.e. ta0 == tb0, ta1 == tb1, ... tan == tbn).  However, the JS Main
thread and audio thread are not run at the same rate, so we can at
best implement some kind of approximation, but it cannot be a
formally correct solution.

Event processing in a thread other than the audio thread has problems.
One mentioned at the conference was what to do with offline rendering,
where the clock of an audio engine runs faster than realtime, and may
advance faster or slower in terms of wall-clock time while rendering,
depending on how heavy the processing needs of the graph are.  Second,
I seem to remember hearing a problem during one of the concerts when
I turned off my phone's screen and I continued to hear audio but all
events stopped, then a number of events fired all at once when I
turned my screen back on. The piece used an event scheduling system
that ran in the JS Main thread. I assume this situation is similar to
what could happen with backgrounded tabs, but I'm not quite sure about
all this. Either way, I think there are real problems here that need
to be addressed.

This also leads to a bigger question: with Web Audio, if I twice run
the same project, one that uses an event system to reify graph
modifications in time (which is what events in audio engines are
mostly used for, i.e. alloc this graph of nodes and add it to the live
audio graph), will I get the same result?  Assuming one uses only
referentially transparent nodes (i.e. no random calculations), I
believe the only way to guarantee this is if the event system is
processed as part of the audio thread.

Now, what can a user do with Web Audio to create their own Event
system that is in sync with the audio thread?  Currently, there is the
ScriptProcessorNode.  Of course, the design of ScriptProcessorNode is
deeply flawed for all the reasons discussed at the conference
(security, inefficiency due to context switching, potential for
breakups, etc.).  However, what it does do is allow one to process
events in sync with the audio thread, which makes it possible to build
formally correct audio systems where event time is processed
according to the same clock as is used by the audio nodes.
Additionally, according to
those events, one can dynamically modify the graph (i.e. add new
instances of a sub-graph of nodes to the live graph, representing a
"note"), via reference to other nodes and the audio context. So while
flawed in terms of performance and security, it does allow one to
build correct systems that generate consistent output.
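
A minimal sketch of this use of ScriptProcessorNode follows. The
helper attachEventClock and the shape of the event objects are
hypothetical; createScriptProcessor(), onaudioprocess, and the event's
playbackTime are taken from the draft. The node is connected to the
destination on the assumption that it only receives onaudioprocess
callbacks while it is part of the live graph.

```javascript
// Hypothetical helper. `context` is assumed to be an AudioContext and
// `events` an array of { time: seconds, fire: function (context, time) }.
function attachEventClock(context, events, bufferSize) {
  var queue = events.slice().sort(function (a, b) { return a.time - b.time; });
  var clock = context.createScriptProcessor(bufferSize, 0, 1);

  clock.onaudioprocess = function (e) {
    // playbackTime is the audio-clock time at which this block will be heard.
    var blockEnd = e.playbackTime + bufferSize / context.sampleRate;
    while (queue.length > 0 && queue[0].time < blockEnd) {
      var event = queue.shift();
      // e.g. build a "note" sub-graph and start its sources at `event.time`
      event.fire(context, event.time);
    }
    // Nothing is written to e.outputBuffer; the node serves only as a clock.
  };

  clock.connect(context.destination); // ASSUMPTION: needed for callbacks to run
  return clock;
}
```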

My concern is that there was discussion of not only deprecating
ScriptProcessorNode, but removing it altogether.  I would have no
problems with this, except that from reading the current specification
for AudioWorker, I do not see how it would be possible to create an
event system with it.  While one can pass messages to and from an
AudioWorker, one has no access to the AudioContext. In that regard,
one cannot, say, create new nodes within an AudioWorker and attach
them to the context.destination. I am not very familiar with
transferables and what can be passed between the AudioWorker and the
JS Main thread via postMessage, but I assume AudioNodes cannot be made
transferable.

At this point, I'm questioning what can be done. It seems
AudioWorker's design is not meant for event processing (fair enough),
and ScriptProcessorNode can do this only by accident, not by design. Is
there any solution to this problem with the Web Audio API moving
forward?  For example, would this group be willing to consider
extending the API for non-audio nodes?  (Processing nodes?) If
processing nodes could be added that have a larger context than what
is proposed for AudioWorkerGlobalScope--say, one that has access to
the AudioContext, and can modify the audio node graph dynamically--I
could see it as a solution to allow building higher level constructs
like an event system.

#6 - For the AudioWorker specification, I think it would be useful to
have clarification on when postMessage is processed.  In 2.11.1.2, it
has a link to "the algorithm defined by the Worker Specification".
That in turn mentions:

"The postMessage() method on DedicatedWorkerGlobalScope objects must
act as if, when invoked, it immediately invoked the method of the same
name on the port, with the same arguments, and returned the same
return value."

If it is meant to be processed immediately, then this can cause
problems if the AudioWorker is already part of a live graph and values
mutate while an audio worker is processing a block. I think it would
be good to have clarification on this, perhaps with a recommendation
that in onaudioprocess functions, one should make a local copy of any
mutable value and use that for the duration of onaudioprocess to get a
consistent result for the block.
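
The recommended pattern might look like the sketch below. The names
handleMessage and processBlock are hypothetical stand-ins for a
worker's message handler and the body of onaudioprocess, reduced to
one mono channel. If messages can only be delivered between calls, the
snapshot is redundant but harmless.

```javascript
// Hypothetical worker-side state; `gain` stands in for any value that a
// message handler may overwrite.
var gain = 1.0;

// Stand-in for the worker's message handler.
function handleMessage(e) {
  gain = e.data.gain;
}

// Stand-in for the body of an onaudioprocess handler.
function processBlock(input, output) {
  var localGain = gain; // snapshot once; the whole block uses this value
  for (var i = 0; i < input.length; i++) {
    output[i] = input[i] * localGain;
  }
}
```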

#7 - Related to #6, I noticed in "2.11.3.1 A Bitcrusher Node", the
example uses a phaser variable that is scoped to the AudioWorker.  I
assume this would then be on the heap. This is perhaps more of a
general JS question, but I normally see in block-based audio
programming that for a process() function, one generally copies any
state variables of a node/ugen/etc. to local variables, runs the audio
for-loop with the local variables, then saves the state for the next
run.  This is done for performance (better locality, stack vs. heap
access, better compiler optimizations, etc.). I don't know much about
JavaScript implementations; can anyone comment on whether these kinds
of optimizations are effective in JS?  If so, the example might
benefit from rewriting to give some guidance (i.e. phaser and
lastDataValue are copied to local vars before the for-loop, and saved
again after the for-loop, in onaudioprocess).
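
For illustration, the copy-in/copy-out pattern applied to a simplified
bitcrusher could look like this. It is not the spec's example:
crushBlock and the state object are hypothetical, only one mono block
is handled, and bits and frequencyReduction are plain numbers rather
than per-sample parameter arrays. Whether this is actually faster in a
given JS engine is the open question above.

```javascript
// Hypothetical per-node state, kept between blocks.
var state = { phaser: 0, lastDataValue: 0 };

// Simplified bitcrusher for one mono block.
function crushBlock(state, input, output, bits, frequencyReduction) {
  // Copy state into locals before the loop
  var phaser = state.phaser;
  var lastDataValue = state.lastDataValue;
  var step = Math.pow(0.5, bits);

  for (var i = 0; i < input.length; i++) {
    phaser += frequencyReduction;
    if (phaser >= 1.0) {
      phaser -= 1.0;
      // Quantize the current sample to the nearest multiple of `step`
      lastDataValue = step * Math.floor(input[i] / step + 0.5);
    }
    output[i] = lastDataValue;
  }

  // Save state for the next block
  state.phaser = phaser;
  state.lastDataValue = lastDataValue;
}
```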

Thanks!
steven
Received on Monday, 2 February 2015 15:30:04 UTC