Re: Audio Workers - please review

On Thu, Sep 11, 2014 at 10:30 PM, Ehsan Akhgari <ehsan@mozilla.com> wrote:

> Interesting proposal, Jussi!
>
> I haven't spent enough time to digest it yet, but let's take a moment to
> talk about the goals that we're trying to achieve, and the use cases we
> want to address before discussing solutions.
>

Given that I'm not up to date with the group's current use cases for the
AudioWorkerNode, I can only answer in terms of my own motivation behind
the proposal:

* Enable custom processing.
* Enable feature parity with native nodes. I think the only way to achieve
this is by making sure it's possible to self-host the native nodes with
behavior that is equal in every observable way. Why:
  * Consistency with the native nodes. I feel it's important that custom
  nodes can be made to feel like they belong in the API. This is for
  developer intuitiveness and also for the:
  * Ability to polyfill future features.
* Performance parity with native nodes in all but language execution speed.
For custom nodes to actually be used in the wild, we need to make sure
there are no observable delays at any point of the AudioWorker life cycle,
at least none that wouldn't be there with the native nodes. Otherwise
people are just going to feel like they need to do whatever it takes to
avoid the worker nodes. They do now; I know I do.

And achieving all of these goals helps with the last one:
* Make sure people aren't faced with a binary choice between all-custom
and all-native processing. Currently, for example, it's completely
infeasible to do both without scoring 0/10 in user experience after
glitching the users' heads off.

Speaking of glitching, the glitching that would come with reinitializing
new workers all the time on the audio thread would be even worse than what
you get now from ScriptProcessorNodes, since it would block the whole
audio thread. And once the glitching ends, you'll likely get a horrible
barrage of noise, because the audio clock has advanced while no processing
has been done (although I hope that all the current implementations have
safety measures against playing back sounds that are surely bound to cause
ear damage).


> For example, the proposal below seems to suggest that all of these events
> are fired on the same global, which means that the nodes can share state
> with each other through setting properties on the global, which means that
> the UA would be unable to parallelize without breaking content which relies
> on such shared state.
>

That is true of service workers as well. You can introduce global state,
but all the developer evangelism warns you not to, because it might break.

Evangelism isn't the only consolation, though. The workers are not aware
of each other, and the only means of communication available is
postMessage/onmessage via the AudioNodeHandle on the worker side and the
AudioWorkerNode on the main thread. You can deliberately count how many
worker instances are in use, for example by sending a message containing a
random instance-specific identifier through every AudioNodeHandle and then
seeing how many unique values you get on the main thread, but that is the
only means by which the workers can even become aware of each other, and
even that is far from breaking an application that didn't try every trick
in the book to violate the advice given in the evangelism.
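
To make that concrete, here's a rough sketch of the kind of trick I mean,
using the interfaces from the proposal quoted below (all the details are
made up):

// In the worker script: one identifier per worker global.
var workerId = Math.random().toString(36).slice(2);

onaudionodecreated = function (event) {
    // Report this global's identifier through the node's handle.
    event.node.postMessage({ workerId: workerId });
};

// On the main thread: count the unique identifiers that come back.
var seenWorkers = {};
function watch(workerNode) {
    workerNode.onmessage = function (event) {
        seenWorkers[event.data.workerId] = true;
        // Object.keys(seenWorkers).length is now the number of distinct
        // worker globals observed.
    };
}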


> Is there a list of goals and use cases for the worker nodes somewhere?
> Sorry if I have missed it, I have not been following the discussions at the
> working group very closely.
>

Actually, I've been inactive for quite a while, and if we have some kind
of document about this, I'd be happy to see it as well.


> Thanks!
>
> On Thu, Sep 11, 2014 at 1:50 PM, Jussi Kalliokoski <
> jussi.kalliokoski@gmail.com> wrote:
>
>> On Thu, Sep 11, 2014 at 7:11 PM, Chris Wilson <cwilso@google.com> wrote:
>>
>>> Actually, I believe I completely misspoke.  I believe postMessages are
>>> only dispatched from a thread when the originating thread "has time" - e.g.
>>> "The window.postMessage method, when called, causes a MessageEvent to be
>>> dispatched at the target window when any pending script that must be
>>> executed completes (e.g., remaining event handlers if window.postMessage is
>>> called from an event handler, previously-set pending timeouts, etc.)" (from
>>> https://developer.mozilla.org/en-US/docs/Web/API/Window.postMessage).
>>>  So these would process in order, and be dispatched at the same time.
>>>
>>
>> The important question is whether they fire before or after the
>> onaudioprocess event. Currently that's undefined behavior, and because of
>> that it will likely be nondeterministic.
>>
>> Actually, thinking about the mention of importScripts on this thread made
>> me wonder about the usability of the currently specced model. Let's say
>> there's a JS audio library that contains a comprehensive set of DSP tools:
>> oscillators, FFT, window functions, filters, time stretching, resampling. A
>> library like this could easily weigh about the same as jQuery. Now, you
>> make different kinds of custom nodes using this library, and use them in
>> the same fire-and-forget way as you generally do with the native nodes.
>> Every time you create a new instance of a node like this, you fetch this
>> library (cached or not), parse it and execute it. This amounts to a huge
>> amount of wasted resources as well as creation delays (I'm not sure how
>> importScripts could even work in the WorkerNode). The effect is amplified
>> further when these nodes are compiled from another language to asm.js,
>> which at the moment tends to have a rather heavy footprint. And on top of
>> that, you have to create a new VM context, which can be both memory and CPU
>> intensive.
>>
>> This brings me back to my earlier suggestion of allowing one worker to
>> manage multiple nodes - this doesn't actually require very radical changes,
>> although it does steer us further away from compliance with normal
>> Workers. Here's one proposal; it's a bit more radical, but I think it
>> provides the necessary features as well as some small nitpicky
>> comprehensibility fixes to the API design.
>>
>> interface AudioNodeHandle {
>>     attribute EventHandler onaudioprocess;
>>     attribute EventHandler onmessage;
>>     void postMessage (any message, optional sequence<Transferable> transfer);
>>     void terminate();
>> };
>>
>> interface AudioWorkerGlobalScope {
>>     attribute EventHandler onaudionodecreated;
>>     attribute EventHandler onmessage;
>> };
>>
>> interface AudioProcessEvent : Event {
>>     readonly attribute double playbackTime;
>>     readonly attribute Float32Array[] inputBuffers;
>>     readonly attribute Float32Array[] outputBuffers;
>>     readonly attribute object parameters;
>>     readonly attribute float sampleRate;
>> };
>>
>> interface AudioNodeCreatedEvent : Event {
>>     readonly attribute AudioNodeHandle node;
>>     readonly attribute object data;
>> };
>>
>> partial interface AudioContext {
>>     AudioWorker createAudioWorker(DOMString scriptURL);
>>     AudioWorkerNode createAudioWorkerNode(AudioWorker audioWorker, optional object options);
>> };
>>
>> interface AudioWorker {
>>     attribute EventHandler onmessage;
>>     void postMessage (any message, optional sequence<Transferable> transfer);
>> };
>>
>> interface AudioWorkerNode {
>>     attribute EventHandler onmessage;
>>     // `parameters` is a mapping of names to AudioParam instances.
>>     // Ideally frozen. Could be a Map-like with readonly semantics as well.
>>     readonly attribute object parameters;
>>     void postMessage (any message, optional sequence<Transferable> transfer);
>> };
>>
>> (I also moved sampleRate to the AudioProcessEvent, as I think this will
>> be more future-proof if we later figure out a way to allow different
>> parts of the graph to run at different sample rates.)
>>
>> Now with this model, you could do the setup once and then just spawn
>> instances of nodes with a massively smaller startup cost.
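>>
>> As a rough sketch of how the worker side might look under this proposal
>> (assuming importScripts is available at worker setup time; the library
>> URL and the pass-through processing are made up):
>>
>> // audio-worker.js -- loaded once per AudioWorker.
>> importScripts("dsp-library.js"); // the heavy setup happens only here
>>
>> onaudionodecreated = function (event) {
>>     // Every node spawned from this worker reuses the library parsed above.
>>     event.node.onaudioprocess = function (e) {
>>         var input = e.inputBuffers[0];
>>         var output = e.outputBuffers[0];
>>         for (var i = 0; i < output.length; i++) {
>>             output[i] = input[i]; // pass-through; real DSP would go here
>>         }
>>     };
>> };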
>>
>> If UAs decide to implement parallelization, they can store the scriptURL
>> of the AudioWorker and fork a new worker when necessary. This makes the
>> parallelization observable, but I don't see any new issues with that.
>>
>> The nit-picky API "improvement" I made with createAudioWorkerNode is that
>> it takes an options object, which contains optional values for
>> numberOfInputChannels and numberOfOutputChannels (named parameters are
>> easier to understand at a glance than bare numbers), a `parameters` object
>> with a name -> initialValue mapping, and an arbitrary `data` object for
>> sending additional initialization information to the worker, such as what
>> kind of a node it is (one worker could host multiple types of nodes). This
>> would also prevent manipulating the list of AudioParams after creation,
>> just like native nodes don't add or remove parameters on themselves after
>> creation. A code example to clarify the usage:
>>
>> var customNode = context.createAudioWorkerNode(audioWorker, {
>>     numberOfInputChannels: 1,
>>     numberOfOutputChannels: 1,
>>     parameters: {
>>         angle: 1,
>>         density: 5.2,
>>     },
>>     data: {
>>         type: "BlackHoleGenerator",
>>     },
>> });
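>>
>> On the worker side, the `data` object could then be used to pick the
>> right processing code for each node, along these lines (the type names
>> and processing functions are of course made up):
>>
>> var nodeTypes = {
>>     BlackHoleGenerator: function (event) {
>>         // fill event.outputBuffers, using event.parameters.angle etc.
>>     },
>>     SupernovaGenerator: function (event) {
>>         // ...
>>     },
>> };
>>
>> onaudionodecreated = function (event) {
>>     event.node.onaudioprocess = nodeTypes[event.data.type];
>> };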
>>
>> I think that since the whole point of this worker thing is performance, we
>> shouldn't ignore startup performance; otherwise, in a lot of cases it will
>> probably be more efficient to have just one AudioWorker do all the
>> processing and not take advantage of the graph at all, due to the high cost
>> of making new nodes. We probably all agree that leading developers to that
>> conclusion would be counterproductive.
>>
>>
>>> On Thu, Sep 11, 2014 at 5:55 PM, Jussi Kalliokoski <
>>> jussi.kalliokoski@gmail.com> wrote:
>>>
>>>> On Thu, Sep 11, 2014 at 5:51 PM, Chris Wilson <cwilso@google.com>
>>>> wrote:
>>>>
>>>>> I don't know how it is possible to do this, unless all WA changes are
>>>>> batched up into a single postMessage.
>>>>>
>>>>
>>>> I think that would be beneficial, yes. The same applies to native nodes
>>>> - in most web platform features (in fact, I can't think of a single
>>>> exception) the things you do in a single "job" get *observably* applied
>>>> at the same time; e.g. with WebGL you don't get half the scene rendered
>>>> in one frame and the rest in the next one. This is the point argued in
>>>> earlier discussions as well: the state of things shouldn't change on its
>>>> own during a job.
>>>>
>>>> As for the creation of the audio context, I think the easiest solution
>>>> is that we specify that the context starts playback only after the job that
>>>> created it has yielded, batching up all the creation-time instructions
>>>> before starting playback.
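>>>>
>>>> In other words, something along the lines of this sketch of the
>>>> intended semantics:
>>>>
>>>> var ctx = new AudioContext();
>>>> var osc = ctx.createOscillator();
>>>> osc.connect(ctx.destination);
>>>> osc.start(0);
>>>> // Playback would begin only once this job yields, so everything above
>>>> // is applied before the context's clock starts running.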
>>>>
>>>>
>>>>>
>>>>> On Thu, Sep 11, 2014 at 4:41 PM, Norbert Schnell <
>>>>> Norbert.Schnell@ircam.fr> wrote:
>>>>>
>>>>>> On 11 sept. 2014, at 15:41, Chris Wilson <cwilso@google.com> wrote:
>>>>>> > I think this is actually indefinite in the spec today - and needs
>>>>>> to be.  "start(0)" (in fact, any "start(n)" where n is <
>>>>>> audiocontext.currentTime) is catch as catch can; thread context switch may
>>>>>> happen, and that needs to be okay.  Do we guarantee that:
>>>>>> >
>>>>>> > node1.start(0);
>>>>>> > ...some really time-expensive processing steps...
>>>>>> > node2.start(0);
>>>>>> > will have synchronized start times?
>>>>>>
>>>>>> IMHO, it would be rather important that these two really go off at
>>>>>> the same time:
>>>>>>
>>>>>> var now = audioContext.currentTime;
>>>>>> node1.start(now);
>>>>>> ...some really time-expensive processing steps...
>>>>>> node2.start(now);
>>>>>>
>>>>>> ... unless we can define well what "really time-expensive" means and
>>>>>> have the ability to avoid it.
>>>>>> Is that actually the case? I was never sure about this...
>>>>>>
>>>>>> Evidently it would be nice if everything < audioContext.currentTime
>>>>>> could just be clipped and behave accordingly. That would make things
>>>>>> pretty clear, and 0 synonymous with "now", which feels right.
>>>>>>
>>>>>> Norbert
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
>
> --
> Ehsan
>

Received on Thursday, 11 September 2014 21:27:34 UTC