Re: Web Audio API Proposal

On Jun 14, 2010, at 5:06 PM, Chris Rogers wrote:

> I've been working with some folks at Apple (Maciej Stachowiak, Eric Carlson, Chris Marrin, Sam Weinig, Simon Fraser) to refine the javascript API to the point where I think it makes sense to open up discussions on this group.  Here's a preliminary specification for the API:
> 
> http://chromium.googlecode.com/svn/trunk/samples/audio/specification/specification.html
> 

Hi Chris,

I'm in the midst of reviewing your spec, and I have a few comments and suggestions:

Ownership

Building the concept of lifetime management into the AudioNode API seems unnecessary.  Each language has its own lifetime-management model, and in at least one case (JavaScript) the language's model directly contradicts the "ownership" model defined here.  For example, in JavaScript the direction of the "owner" reference implies that the AudioNode owns its "owner", not vice versa.

Additionally, it seems that it's currently impossible to change the "owner" of an AudioNode after that node has been created. Was the "owner" attribute left out of the AudioNode API purposefully?
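
To illustrate with the Section 17 names (a sketch of my reading of the draft, not code from the spec):

>     var g1_1 = mainMixer.createInput(source1);   // mainMixer is g1_1's "owner"
>     // In JavaScript, g1_1 holds the reference to mainMixer, so as long as g1_1
>     // is reachable it keeps its "owner" alive, which is the reverse of the
>     // parent-owns-child relationship an ownership model normally implies.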

Multiple Outputs

While there is an explicit AudioMixerNode, there's no equivalent AudioSplitterNode, and thus no explicit way to fan out the output of one node to multiple inputs.

In the sample code attached to Section 17, a single source (e.g. source1) is connected to multiple inputs simply by calling "connect()" multiple times with different destination nodes.  This doesn't match the AudioNode API, where "connect()" takes three parameters: the destination node, the output index, and the input index.  However, I find the sample code's style to be a much more natural and easier-to-use API.  You may want to consider adopting this model for multiple inputs and outputs everywhere.
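
For comparison, a rough sketch of the two calling styles (the three-argument form is my paraphrase of the spec's signature; the index values are made up):

>     source1.connect(g1_1, 0, 0);   // spec-style: destination, output index, input index
>     source1.connect(g1_1);         // sample-code style: one call per connection, no indices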

Let me throw out some ideas for how the API would then look:

> interface AudioNode
> {
>         void connect(in AudioNode destination);
>         void disconnect(in AudioNode destination);
>         readonly attribute sequence<AudioNode> connections;
>         readonly attribute float sampleRate;
> }



Multiple outputs would be generated dynamically, per connection.

This would then match the sample code listed in Section 17, repeated here:

>     g1_1 = mainMixer.createInput(source1);
>     g2_1 = send1Mixer.createInput(source1);
>     g3_1 = send2Mixer.createInput(source1);
>     source1.connect(g1_1);
>     source1.connect(g2_1);
>     source1.connect(g3_1);

Each AudioNode would thus have the ability to fan out or split its outputs, removing the need for an explicit AudioSplitterNode class.
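
For example, with the revised interface above (purely illustrative, reusing the Section 17 names):

>     source1.connect(send1Mixer);
>     source1.connect(send2Mixer);     // source1 now fans out to two destinations
>     source1.disconnect(send2Mixer);  // and back to one
>     // source1.connections would reflect the current set of destinations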

Multiple Inputs

This same design could be applied to multiple inputs, as in the case of the mixers.  Instead of manually creating inputs, they too could be created dynamically, per connection.

There is an explicit class, AudioMixerNode, which creates AudioMixerInputNodes, sums their outputs together, and adjusts the final output gain.  It's somewhat strange that the AudioMixerNode can create AudioMixerInputNodes; that seems to be the responsibility of the AudioContext.  It also seems that this section could be greatly simplified by dynamically creating inputs.

Let me throw out another idea.  AudioMixerNode and AudioMixerInputNode would be replaced by an AudioGainNode.  Every AudioNode would be capable of acting as a mixer by virtue of dynamically-created summing inputs.  The API would build upon the revised AudioNode above:

> interface AudioGainNode : AudioNode
> {
>         attribute AudioGain gain;
>         void addGainContribution(in AudioGain gain);
> }


The sample code in Section 17 would then go from:

>     mainMixer = context.createMixer();
>     send1Mixer = context.createMixer();
>     send2Mixer = context.createMixer();

>     g1_1 = mainMixer.createInput(source1);
>     g2_1 = send1Mixer.createInput(source1);
>     g3_1 = send2Mixer.createInput(source1);
>     source1.connect(g1_1);
>     source1.connect(g2_1);
>     source1.connect(g3_1);

to:

>     mainMixer = context.createGain();
>     send1Mixer = context.createGain();
>     send2Mixer = context.createGain();

>     source1.connect(mainMixer);
>     source1.connect(send1Mixer);
>     source1.connect(send2Mixer);

Per-input gain could be achieved by inserting an inline AudioGainNode between a source output and the destination's dynamically-created input:

>     var g1_1 = context.createGain();
>     source1.connect(g1_1);
>     g1_1.connect(mainMixer);
>     g1_1.gain.value = 0.5;

If the default constructor for AudioNodes is changed from "in AudioNode owner" to "in AudioNode input", then a lot of these examples can be cleaned up and shortened.  That's just syntactic sugar, however. :)

Constructors

Except for the AudioContext.output node, every other created AudioNode needs to be connected to a downstream AudioNode input.  For this reason, it seems that the constructor functions should be changed to take an "in AudioNode destination = 0" parameter (instead of an "owner" parameter).  This would significantly reduce the amount of code needed to write an audio graph.  In addition, anonymous AudioNodes could be created and connected without having to specify local variables:

>     compressor = context.createCompressor(context.output);
>     mainMixer = context.createGain(compressor);


or:

>     mainMixer = context.createGain(context.createCompressor(context.output));
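
In IDL terms, the constructor functions might then look roughly like this (a sketch only; the return types, and whether "= 0" is the right way to express the default, are my assumptions):

> interface AudioContext
> {
>         AudioGainNode createGain(in AudioNode destination = 0);
>         AudioNode createCompressor(in AudioNode destination = 0);
>         /* ...and similarly for the other create*() functions... */
> }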




I'm including the sample code from Section 17, rewritten with these new API ideas, so that you can see the effect they may have on a client's implementation.



Thanks!

-Jer

Received on Tuesday, 15 June 2010 21:37:08 UTC