Re: Audio Workers - please review from Ehsan Akhgari on 2014-09-11 (public-audio@w3.org from July to September 2014)

From: Ehsan Akhgari <ehsan@mozilla.com>
Date: Thu, 11 Sep 2014 15:30:32 -0400
To: Jussi Kalliokoski <jussi.kalliokoski@gmail.com>
Cc: Chris Wilson <cwilso@google.com>, Norbert Schnell <Norbert.Schnell@ircam.fr>, "public-audio@w3.org" <public-audio@w3.org>
Message-ID: <CANTur_4MkJH7_0XOBRWM3h9HgC073VzgA7gTzUARTeMOzjYxQQ@mail.gmail.com>
Interesting proposal, Jussi!

I haven't spent enough time to digest it yet, but let's take a moment to
talk about the goals we're trying to achieve and the use cases we want to
address before discussing solutions.

For example, the proposal below seems to suggest that all of these events
are fired on the same global. That means the nodes can share state with
each other by setting properties on the global, and the UA would then be
unable to parallelize without breaking content that relies on such shared
state.
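To make the shared-state concern concrete, here is a minimal, hypothetical
sketch in plain JavaScript. No real Worker or Web Audio API is used: an
ordinary object stands in for a worker global scope, and the names
`makeNodeProcessor` and `sharedFrameCount`, as well as the 128-frame block
size, are illustrative assumptions. Two nodes that keep a counter on their
global see each other's writes while they share one global, and stop seeing
them once each node gets its own global (as it would if the UA ran them in
parallel).

```javascript
// Hypothetical sketch: a plain object stands in for a worker global scope.
// Nothing here is a real Web Audio or Worker API.
function makeNodeProcessor(scope) {
  // Each "onaudioprocess" call adds to a counter stored on the global,
  // i.e. state implicitly shared by every node running on that global.
  return function onaudioprocess(frames) {
    scope.sharedFrameCount = (scope.sharedFrameCount || 0) + frames;
    return scope.sharedFrameCount;
  };
}

// Case 1: both nodes' handlers run on the same global.
const sharedScope = {};
const a = makeNodeProcessor(sharedScope);
const b = makeNodeProcessor(sharedScope);
a(128);
const seenByBShared = b(128); // 256: b observes a's writes

// Case 2: a parallelizing UA gives each node its own global.
const c = makeNodeProcessor({});
const d = makeNodeProcessor({});
c(128);
const seenByDSeparate = d(128); // 128: d no longer sees c's writes
```

Content written against the first behavior would silently compute different
results under the second, which is why switching to parallel execution
later would break it.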

Is there a list of goals and use cases for the worker nodes somewhere?
Sorry if I have missed it; I have not been following the discussions in the
working group very closely.

Thanks!

On Thu, Sep 11, 2014 at 1:50 PM, Jussi Kalliokoski <
jussi.kalliokoski@gmail.com> wrote:

> On Thu, Sep 11, 2014 at 7:11 PM, Chris Wilson <cwilso@google.com> wrote:
>
>> Actually, I believe I completely misspoke.  I believe postMessages are
>> only dispatched from a thread when the originating thread "has time" - e.g.
>> "The window.postMessage method, when called, causes a MessageEvent to be
>> dispatched at the target window when any pending script that must be
>> executed completes (e.g., remaining event handlers if window.postMessage is
>> called from an event handler, previously-set pending timeouts, etc.)" (from
>> https://developer.mozilla.org/en-US/docs/Web/API/Window.postMessage).
>> So these would be processed in order and dispatched at the same time.
>>
>
> The important question is whether they fire before or after
> onaudioprocess. Currently that's undefined behavior, and because of that
> it will likely be nondeterministic.
>
> Actually, the mention of importScripts on this thread made me wonder
> about the usability of the currently specced model. Let's say there's a JS
> audio library that contains a comprehensive set of DSP tools: oscillators,
> FFT, window functions, filters, time stretching, resampling. A library like
> this could easily be about the same size as jQuery. Now, you make different
> kinds of custom nodes using this library and use them in the same
> fire-and-forget way you generally use the native nodes. Every time you
> create a new instance of a node like this, you fetch this library (cached
> or not), parse it and execute it. This adds up to a huge amount of wasted
> resources as well as creation delays (I'm not sure how importScripts could
> even work in the WorkerNode). The effect is amplified further when these
> nodes are compiled from another language to asm.js, which at the moment
> tends to have a rather heavy footprint. And on top of that, you have to
> create a new VM context for each node, which can be both memory and CPU
> intensive.
>
> This brings me back to my earlier suggestion of allowing one worker to
> manage multiple nodes. This doesn't actually require very radical changes,
> though it does steer us further away from compatibility with normal
> Workers. Here's one proposal that is a bit more radical, but I think it
> provides the necessary features, along with a few nitpicky readability
> fixes to the API design.
>
> interface AudioNodeHandle {
>     attribute EventHandler onaudioprocess;
>     attribute EventHandler onmessage;
>     void postMessage(any message, optional sequence<Transferable> transfer);
>     void terminate();
> };
>
> interface AudioWorkerGlobalScope {
>     attribute EventHandler onaudionodecreated;
>     attribute EventHandler onmessage;
> };
>
> interface AudioProcessEvent : Event {
>     readonly attribute double playbackTime;
>     readonly attribute Float32Array[] inputBuffers;
>     readonly attribute Float32Array[] outputBuffers;
>     readonly attribute object parameters;
>     readonly attribute float sampleRate;
> };
>
> interface AudioNodeCreatedEvent : Event {
>     readonly attribute AudioNodeHandle node;
>     readonly attribute object data;
> };
>
> partial interface AudioContext {
>     AudioWorker createAudioWorker(DOMString scriptURL);
>     AudioWorkerNode createAudioWorkerNode(AudioWorker audioWorker,
>                                           optional object options);
> };
>
> interface AudioWorker {
>     attribute EventHandler onmessage;
>     void postMessage(any message, optional sequence<Transferable> transfer);
> };
>
> interface AudioWorkerNode {
>     attribute EventHandler onmessage;
>     // A mapping of names to AudioParam instances. Ideally frozen. Could
>     // also be Map-like, with read-only semantics.
>     readonly attribute object parameters;
>     void postMessage(any message, optional sequence<Transferable> transfer);
> };
>
> (I also moved sampleRate to AudioProcessEvent, as I think this will be
> more future-proof if we ever figure out a way to allow different parts of
> the graph to run at different sample rates.)
>
> With this model, you could do the setup once and then just spawn node
> instances at a massively smaller startup cost.
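Under the proposal above, the script passed to createAudioWorker could load
its DSP library once and then handle `audionodecreated` for every new node.
The sketch below is hypothetical throughout: the proposed interfaces never
shipped, `installAudioWorker`, `loadDspLibrary`, `dsp.scale` and the
"Gain"/"Silence" node types are illustrative names, and a plain object
stands in for AudioWorkerGlobalScope so the code runs outside a browser.
Only the event fields (`node`, `data`, `inputBuffers`, `outputBuffers`,
`parameters`) and `terminate()` follow the proposed IDL; treating each
parameter as one number per block is an assumption.

```javascript
// Hypothetical sketch of a worker script for the *proposed* API above.
// `scope` stands in for AudioWorkerGlobalScope; `loadDspLibrary` stands in
// for fetching, parsing and executing a large DSP library.
function installAudioWorker(scope, loadDspLibrary) {
  // Paid once per AudioWorker, not once per node.
  const dsp = loadDspLibrary();

  // One worker hosts several node types, selected via options.data.type.
  const processors = new Map([
    ["Gain", (event) => {
      // Assumption: parameters maps each name to one number per block.
      const gain = event.parameters.gain;
      event.inputBuffers.forEach((input, channel) => {
        dsp.scale(input, gain, event.outputBuffers[channel]);
      });
    }],
    ["Silence", (event) => {
      event.outputBuffers.forEach((output) => output.fill(0));
    }],
  ]);

  scope.onaudionodecreated = (event) => {
    const type = event.data ? event.data.type : undefined;
    const process = processors.get(type);
    if (process === undefined) {
      event.node.terminate(); // unknown node type: shut this node down
      return;
    }
    event.node.onaudioprocess = process;
  };
}
```

Each additional node then costs one handler assignment rather than a fresh
fetch, parse and VM context.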
>
> If UAs decide to implement parallelization, they can store the scriptURL
> of the AudioWorker and fork a new worker when necessary. This makes the
> parallelization observable, but I don't see any new issues with that.
>
> The nitpicky API "improvement" I made to createAudioWorkerNode is that it
> takes an options object. It contains optional values for
> numberOfInputChannels and numberOfOutputChannels (named parameters are
> easier to understand at a glance than bare numbers), a `parameters` object
> mapping each name to its initialValue, and an arbitrary `data` object for
> sending additional initialization information to the worker, such as what
> kind of node it is (one worker could host multiple types of nodes). This
> would also prevent manipulating the list of AudioParams after creation,
> just as native nodes don't add or remove parameters on themselves after
> creation. A code example to clarify the usage:
>
> var customNode = context.createAudioWorkerNode(audioWorker, {
>     numberOfInputChannels: 1,
>     numberOfOutputChannels: 1,
>     parameters: {
>         angle: 1,
>         density: 5.2,
>     },
>     data: {
>         type: "BlackHoleGenerator",
>     },
> });
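On the receiving side, the options object could be normalized once at
creation time. This is a hypothetical sketch: `normalizeNodeOptions` is an
illustrative name, and the defaults of one input and one output channel are
assumptions, not part of the proposal. It shows the two properties described
above: channel counts given by name, and a parameter set that is frozen
after creation so nothing can be added or removed later. In an actual
implementation each entry would presumably back an AudioParam on the node.

```javascript
// Hypothetical helper: normalize the options passed to createAudioWorkerNode.
// The default channel counts are assumptions for illustration only.
function normalizeNodeOptions(options = {}) {
  const numberOfInputChannels = options.numberOfInputChannels ?? 1;
  const numberOfOutputChannels = options.numberOfOutputChannels ?? 1;

  // Copy name -> initialValue pairs, then freeze the result so that no
  // parameter can be added or removed later (mirroring native nodes).
  const parameters = {};
  for (const [name, initialValue] of Object.entries(options.parameters ?? {})) {
    if (typeof initialValue !== "number") {
      throw new TypeError(`Initial value for "${name}" must be a number`);
    }
    parameters[name] = initialValue;
  }

  return Object.freeze({
    numberOfInputChannels,
    numberOfOutputChannels,
    parameters: Object.freeze(parameters),
    data: options.data, // passed through to the worker untouched
  });
}
```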
>
> Since the whole point of this worker thing is performance, I think we
> shouldn't ignore startup performance. Otherwise, in a lot of cases it will
> probably be more efficient to have just one AudioWorker do all the
> processing and not take advantage of the graph at all, due to the high
> cost of creating new nodes. We probably all agree that leading developers
> to that conclusion would be counterproductive.
>
>
>> On Thu, Sep 11, 2014 at 5:55 PM, Jussi Kalliokoski <
>> jussi.kalliokoski@gmail.com> wrote:
>>
>>> On Thu, Sep 11, 2014 at 5:51 PM, Chris Wilson <cwilso@google.com> wrote:
>>>
>>>> I don't know how it is possible to do this, unless all WA changes are
>>>> batched up into a single postMessage.
>>>>
>>>
>>> I think that would be beneficial, yes. The same applies to native nodes:
>>> in most web platform features (in fact I can't think of a single
>>> exception) the things you do in a single "job" get *observably* applied at
>>> the same time; e.g. with WebGL you don't get half the scene rendered in
>>> one frame and the rest in the next one. This point was also argued in
>>> earlier discussions some time ago: the state of things shouldn't change on
>>> its own during a job.
>>>
>>> As for the creation of the audio context, I think the easiest solution
>>> is to specify that the context starts playback only after the job that
>>> created it has yielded, batching up all the creation-time instructions
>>> before starting playback.
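One way to get the "observably applied at the same time" behavior is to
queue every graph change made during a job and hand the whole batch to the
audio thread in a single message once the job yields. The sketch below is
hypothetical: `GraphCommandBatcher`, the command objects and the injected
`schedule` and `send` functions are illustrative. In a browser, `schedule`
might be `queueMicrotask` and `send` a single postMessage, but that wiring is
an assumption; a manual scheduler keeps the behavior easy to follow.

```javascript
// Hypothetical sketch: collect all graph changes made during one job and
// deliver them to the audio thread as a single message afterwards.
class GraphCommandBatcher {
  constructor(schedule, send) {
    this.schedule = schedule; // runs a callback after the current job (assumption)
    this.send = send;         // delivers one batch, e.g. via one postMessage
    this.pending = [];
  }

  enqueue(command) {
    // Schedule exactly one flush per batch, on the first command of the job.
    if (this.pending.length === 0) {
      this.schedule(() => this.flush());
    }
    this.pending.push(command);
  }

  flush() {
    const batch = this.pending;
    this.pending = [];
    this.send(batch); // receiver is expected to apply the whole batch at once
  }
}
```

With this, two start() calls made in the same job reach the audio thread in
the same message, however long the code between them takes to run.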
>>>
>>>
>>>>
>>>> On Thu, Sep 11, 2014 at 4:41 PM, Norbert Schnell <
>>>> Norbert.Schnell@ircam.fr> wrote:
>>>>
>>>>> On 11 sept. 2014, at 15:41, Chris Wilson <cwilso@google.com> wrote:
>>>>> > I think this is actually indefinite in the spec today - and needs to
>>>>> > be.  "start(0)" (in fact, any "start(n)" where n is <
>>>>> > audioContext.currentTime) is catch as catch can; a thread context
>>>>> > switch may happen, and that needs to be okay.  Do we guarantee that:
>>>>> >
>>>>> > node1.start(0);
>>>>> > ...some really time-expensive processing steps...
>>>>> > node2.start(0);
>>>>> > will have synchronized start times?
>>>>>
>>>>> IMHO, it would be rather important that these two really go off at the
>>>>> same time:
>>>>>
>>>>> var now = audioContext.currentTime;
>>>>> node1.start(now);
>>>>> ...some really time-expensive processing steps...
>>>>> node2.start(now);
>>>>>
>>>>> ... unless we can clearly define what "really time-expensive" means and
>>>>> how to avoid it.
>>>>> Is that actually the case? I was never sure about this...
>>>>>
>>>>> Obviously it would be nice if everything < audioContext.currentTime
>>>>> could simply be clipped to currentTime and behave accordingly. That would
>>>>> make things pretty clear and make 0 synonymous with "now", which feels
>>>>> right.
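The clipping rule suggested here comes down to a one-line function. This is
a sketch with a hypothetical name, `effectiveStartTime`; it only spells out
what "clip everything below currentTime" would mean and is not taken from
any spec or implementation.

```javascript
// Hypothetical helper: clip a requested start time to the context's current
// time, so that any time in the past (including 0) means "now".
function effectiveStartTime(when, currentTime) {
  return Math.max(when, currentTime);
}
```

Note that under this rule the two start(now) calls above coincide only if
currentTime has not advanced past `now` by the time node2.start(now) is
processed, so clipping on its own does not guarantee the synchronized start.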
>>>>>
>>>>> Norbert
>>>>
>>>>
>>>>
>>>
>>
>


-- 
Ehsan
Received on Thursday, 11 September 2014 19:31:45 UTC