Re: Web Audio API Proposal

On Jun 15, 2010, at 3:51 PM, Chris Rogers wrote:

> Hi Jer, thanks for your comments.  I'll try to address the points you bring up:
> 
>  
> Hi Chris,
> 
> I'm in the midst of reviewing your spec, and I have a few comments and suggestions:
> 
> Ownership
> 
> Building the concept of lifetime-management into the API for AudioNodes seems unnecessary.  Each language has its own lifetime-management concepts, and in at least one case (JavaScript) the language lifetime-management will directly contradict the "ownership" model defined here.  For example, in JavaScript the direction of the "owner" reference implies that the AudioNode owns its "owner", not vice versa.
> 
> I think the idea of ownership is important and I'll try to explain why.  There's a difference between the javascript object (the AudioNode) and its underlying C++ object which implements its behavior.  The ownership for the javascript object itself behaves exactly the same as other javascript objects with reference counting and garbage collection.  However, the underlying/backing C++ object may (in some cases) persist after the javascript object no longer exists.  For example, consider the simple case of triggering a sound to play with the following javascript:
> 
>             function playSound() {
>                 var source = context.createBufferSource();
>                 source.buffer = dogBarkingBuffer;
>                 source.connect(context.output);
>                 source.noteOn(0);
>             }
> The javascript object source may be garbage collected immediately after playSound() is called, but the underlying C++ object representing it may very well still be connected to the rendering graph generating the sound of the barking dog.  At some later time when the sound has finished playing, it will automatically be removed from the rendering graph in the realtime thread (which is running asynchronously from the javascript thread).  So, strictly speaking the idea of ownership comes into play more at the level of the underlying C++ objects and not the javascript objects themselves.  If you keep these ideas in mind while looking at my dynamic lifetime example in the specification, maybe things will make a bit more sense.

Even in that case, the "ownership" seems like an underlying implementation detail.  If the C++ object can live on after the JavaScript GC has "deleted" the JS object, then is the lifetime management concept of the "owner" (as exposed in JavaScript) really necessary?

Because it seems like in JavaScript, you already have the ability to create one-shot, self-destructing AudioNodes, and the concept of "ownership" is as easy to implement as adding a global "var" pointing to an AudioNode.  In fact, without the "owner" concept, your dynamic lifetime example would work exactly the same:

> function playSound() {
>     var oneShotSound = context.createBufferSource();
>     oneShotSound.buffer = dogBarkingBuffer;
> 
>     // No "owner" parameter is needed; ordinary references keep the
>     // filter, panner, and mixer input alive while the sound plays.
>     var lowpass = context.createLowPass2Filter(); // no owner
>     var panner = context.createPanner(); // no owner
>     var mixerInput2 = mixer.createInput(); // no owner
>     
>     // Make connections 
>     oneShotSound.connect(lowpass);
>     lowpass.connect(panner);
>     panner.connect(mixerInput2); // this used to read: panner.connect(mixer)
> 
>     panner.listener = listener;
> 
>     oneShotSound.noteOn(0.75);
> }


I've modified the example to remove the "owner" params to the constructor functions.  At the point where "oneShotSound.noteOn(0.75)" is called, there are local references to oneShotSound, lowpass, panner, and mixerInput2.  Once playSound() returns, those references disappear.  oneShotSound could be immediately GC'd, but it seems to make more sense for oneShotSound to hold a reference to itself as long as it's playing (or is scheduled to play).

At some time in the future, the scheduled noteOn() finishes.  It releases the reference to itself, and thus no one holds a reference to oneShotSound any longer, so it is GC'd.  oneShotSound was the only holder of a reference to lowpass, so lowpass is GC'd.  lowpass was the only holder of a reference to panner, so panner is GC'd.  And so on.

The end result is that all the filters and sources created inside playSound() are removed from the graph as soon as oneShotSound finishes playing, which is exactly the same behavior as when the filters have an explicit owner.  So I don't see that exposing an "owner" property adds any functionality.
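To make that reference-chain argument concrete, here is a minimal sketch of how an engine could pin a playing source.  Everything here is hypothetical and illustrative only; `playingNodes` and `finished()` are invented names, not part of the proposed API:

```javascript
// Hypothetical engine-internal bookkeeping, for illustration only.
var playingNodes = [];  // sources the engine keeps reachable while they play

function BufferSource() {
    this.connections = [];
}
BufferSource.prototype.connect = function (destination) {
    this.connections.push(destination);
};
BufferSource.prototype.noteOn = function (when) {
    playingNodes.push(this);  // the engine now holds a reference to this source
};
BufferSource.prototype.finished = function () {
    // Called when playback ends: drop the engine's reference, so the
    // source, and anything reachable only through it, can be GC'd.
    playingNodes.splice(playingNodes.indexOf(this), 1);
};

var oneShot = new BufferSource();
oneShot.noteOn(0.75);
// While playing, playingNodes keeps oneShot (and its graph) alive.
oneShot.finished();
// After playback, nothing references oneShot; the GC chain can proceed.
```

The point being: one internal "still playing" reference is enough, with no "owner" exposed to script at all.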

> 
> Additionally, it seems that it's currently impossible to change the "owner" of an AudioNode after that node has been created. Was the "owner" attribute left out of the AudioNode API purposefully?
> 
> owner could be added in as a read-only attribute, but I think it is not the kind of thing which should change after the object has been created.
> 
> Multiple Outputs
> 
> While there is an explicit AudioMixerNode, there's no equivalent AudioSplitterNode, and thus no explicit way to route the output of one node to multiple inputs.
> 
> It isn't necessary to have an AudioSplitterNode because it's possible to connect an output to multiple inputs directly (this is called fanout).  You may be thinking in terms of AudioUnits, which require an explicit splitter.  I remember when we made that design decision with AudioUnits, but it is not a problem here.
> 
> So fanout from an output to multiple inputs is supported without fuss or muss.

That seems reasonable.  The spec should be updated to specifically call that out, since it confused the heck out of me.  However, see the next comment:

> In the sample code attached to Section 17, a single source (e.g. source1) is connected to multiple inputs, merely by calling "connect()" multiple times with different input nodes.  This doesn't match the AudioNode API, where the "connect()" function takes three parameters: the input node, the output index, and the input index.  However, I find the sample code to be a much more natural and easier-to-use API.  Perhaps you may want to consider adopting this model for multiple inputs and outputs everywhere.
> 
> Maybe I should change the API description to be more explicit here, but the sample code does match the API because the output and input parameters are optional and default to 0.
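A quick mock of that defaulting behavior (illustrative only; this is not the real implementation, just an assumption that omitted indices default to 0):

```javascript
// Mock AudioNode illustrating connect()'s defaulted index parameters.
function MockNode() {
    this.connections = [];
}
MockNode.prototype.connect = function (destination, output, input) {
    // Omitted indices default to 0, so connect(dest) === connect(dest, 0, 0).
    this.connections.push({
        destination: destination,
        output: output || 0,
        input: input || 0
    });
};

var source1 = new MockNode();
var mixerInput = new MockNode();
source1.connect(mixerInput);        // shorthand form
source1.connect(mixerInput, 0, 0);  // explicit form, identical result
```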

Okay then, but if every output is capable of connecting to multiple inputs, why would you need multiple outputs?  Will any AudioNode ever have a "numberOfOutputs" > 1, and if so, what functionality does that provide above and beyond a single, fanout output?

>  
> Multiple Inputs
> 
> This same design could be applied to multiple inputs, as in the case with the mixers.  Instead of manually creating inputs, they could also be created dynamically, per-connection.
> 
> There is an explicit class, AudioMixerNode, which creates AudioMixerInputNodes, mixes their outputs together, and adjusts the final output gain.  It's somewhat strange that the AudioMixerNode can create AudioMixerInputNodes; that seems to be the responsibility of the AudioContext.  And it seems that this section could be greatly simplified by dynamically creating inputs.
> 
> Let me throw out another idea.  AudioMixerNode and AudioMixerInputNode would be replaced by an AudioGainNode.  Every AudioNode would be capable of becoming an audio mixer by virtue of dynamically-created summing inputs.  The API would build upon the revised AudioNode above:
> 
>> interface AudioGainNode : AudioNode {
>>     AudioGain gain;
>>     void addGainContribution(in AudioGain);
>> }
> 
> 
> The sample code in Section 17 would then go from:
> 
>>     mainMixer = context.createMixer();
>>     send1Mixer = context.createMixer();
>>     send2Mixer = context.createMixer();
> 
>>     g1_1 = mainMixer.createInput(source1);
>>     g2_1 = send1Mixer.createInput(source1);
>>     g3_1 = send2Mixer.createInput(source1);
>>     source1.connect(g1_1);
>>     source1.connect(g2_1);
>>     source1.connect(g3_1);
> 
> to:
> 
>>     mainMixer = context.createGain();
>>     send1Mixer = context.createGain();
>>     send2Mixer = context.createGain();
> 
>>     source1.connect(mainMixer);
>>     source1.connect(send1Mixer);
>>     source1.connect(send2Mixer);
> 
> Per-input gain could be achieved by adding an inline AudioGainNode between a source output and its summing input node:
> 
>>     var g1_1 = context.createGain(); 
> 
>>     source1.connect(g1_1);
> 
>>     g1_1.connect(mainMixer);
>>     g1_1.gain.value = 0.5;
> 
> If the default constructor for AudioNodes is changed from "in AudioNode owner" to "in AudioNode input", then a lot of these examples can be cleaned up and shortened.  That's just syntactic sugar, however. :)
> 
> It doesn't look like it actually shortens the code to me.  And I'm not sure we can get rid of the idea of owner due to the dynamic lifetime issues I tried to describe above.  But maybe you can explain some more.

Sure thing.

The code above isn't much shorter, granted.  However, in your code example, send1Mixer and send2Mixer could then be removed, and each of the AudioGainNodes could be connected directly to the reverb and chorus nodes, eliminating the need for those mixer nodes.  Additionally, if any one of the g#_# filters is extraneous (in that it will never have a gain != 1.0), it can be left out.  This has the potential to make the audio graph much, much simpler.

Also, in your "playNote()" example above, you have to create an AudioMixerNode and an AudioMixerInputNode just to add a gain effect to a simple one-shot note.  With the above change in API, those two nodes would be replaced by a single AudioGainNode.

Also, eliminating the AudioMixerNode interface removes one class from the IDL, and eliminates the single piece of API where an AudioNode is created by something other than the AudioContext, all without removing any functionality.  

Let me give some sample code which demonstrates how much shorter the client's code could be.  From:

> function playSound() {
>   var source = context.createBufferSource();
>   source.buffer = dogBarkingBuffer;
>   var reverb = context.createReverb();
>   source.connect(reverb);
>   var chorus = context.createChorus();
>   source.connect(chorus);
>   var mainMixer = context.createMixer();
>   var gain1 = mainMixer.createInput();
>   reverb.connect(gain1);
>   var gain2 = mainMixer.createInput();
>   chorus.connect(gain2);
>   mainMixer.connect(context.output);
> 
>   source.noteOn(0);
> }

to:

> function playSound() {
>   var source = context.createBufferSource();
>   source.buffer = dogBarkingBuffer;
>   var reverb = context.createReverb();
>   source.connect(reverb);
>   var chorus = context.createChorus();
>   source.connect(chorus);
>   reverb.connect(context.output);
>   chorus.connect(context.output);
> 
>   source.noteOn(0);
> }


Or the same code with the Constructors addition below:

> function playSound() {
>   var reverb = context.createReverb(context.output);
>   var chorus = context.createChorus(context.output);
>   var source = context.createBufferSource([reverb, chorus]);
>   source.buffer = dogBarkingBuffer;
> 
>   source.noteOn(0);
> }


Okay, so I kind of cheated and passed an Array to the "createBufferSource()" constructor.  But that seems like a simple addition which could come in very handy, especially given the "fanout" nature of outputs.  Taken together, this brings a 13-line function down to 5 lines.

Of course, not all permutations will be as amenable to simplification as the function above.  But I believe that even the worst case scenario is still an improvement.

> Constructors
> 
> Except for the AudioContext.output node, every other created AudioNode needs to be connected to a downstream AudioNode input.  For this reason, it seems that the constructor functions should be changed to take an "in AudioNode destination = 0" parameter (instead of an "owner" parameter).  This would significantly reduce the amount of code needed to write an audio graph.  In addition, anonymous AudioNodes could be created and connected without having to specify local variables:
> 
>>     compressor = context.createCompressor(context.output);
>>     mainMixer = context.createGain(compressor);
> 
> 
> or:
> 
>>     mainMixer = context.createGain(context.createCompressor(context.output));
> 
> 
> I like the idea, but it may not always be desirable to connect the AudioNode immediately upon construction.  For example, there may be cases where an AudioNode is created, then later passed to some other function where it is finally known where it needs to be connected.  I'm sure we can come up with variants on the constructors to handle the various cases.

Oh, I'm not suggesting that constructors replace the connect() function.  That's why I called this change "syntactic sugar".  The same effect could be had by returning "this" from the connect() function, allowing such constructions as:

> source2.connect(g1_2).connect(g2_2).connect(g3_2);
and:

> mainMixer = context.createGain().connect(context.createCompressor().connect(context.output));

But I think the connect-in-the-constructor alternative is less confusing.
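To illustrate the return-this alternative, here is a minimal mock (not the proposed API itself, just a sketch of the semantics) showing that chained calls fan the source out rather than building a serial chain:

```javascript
// Mock node with a chainable connect() that returns |this|.
function ChainNode(name) {
    this.name = name;
    this.connections = [];
}
ChainNode.prototype.connect = function (destination) {
    this.connections.push(destination);
    return this;  // returning the receiver enables chaining
};

var source2 = new ChainNode("source2");
var g1_2 = new ChainNode("g1_2");
var g2_2 = new ChainNode("g2_2");
var g3_2 = new ChainNode("g3_2");

// Because connect() returns its receiver, this fans source2 out to
// all three gain nodes; it does not connect them in series.
source2.connect(g1_2).connect(g2_2).connect(g3_2);
```

Note that if connect() instead returned its destination argument, the identical chained call would build a serial chain rather than a fanout, so the two possible return values read very differently.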

Thanks again!

-Jer

Received on Wednesday, 16 June 2010 00:21:25 UTC