Re: Web Audio API Proposal

From: Chris Rogers <crogers@google.com>
Date: Mon, 19 Jul 2010 13:39:38 -0700
Message-ID: <AANLkTiktoVMeVnp2-5oe6oU4PEAHSOS5QJ+tifFEiydB@mail.gmail.com>
To: Yury Delendik <async.processingjs@yahoo.com>
Cc: public-xg-audio@w3.org

Hi Yury,

Thanks for the questions - I appreciate your input.

On Fri, Jul 16, 2010 at 8:52 PM, Yury Delendik <async.processingjs@yahoo.com> wrote:

> Hello Chris,
> I'm trying to read and analyze the current proposal for the Web Audio API
> at the
> moment. The ideas expressed in the specification are straightforward and
> simple.
> The directed-graph presentation of the audio processing nodes makes it
> simple to visualize the signal flow.
> My feedback/questions:
> 1) It took some time to gather all the missing pieces of information from
> the examples, the SVN change log, and the public-xg-audio list. I had
> trouble understanding why the examples use AudioMixerNode when there is no
> such node in the specification – this node type was in previous versions.
> To make the learning experience better, could a change log section be
> included in the body of the proposal/specification?

Sorry about the confusion with AudioMixerNode.  Jer Noble suggested that we
switch to AudioGainNode instead.  The proposal/specification document was
changed very quickly.  Later, I implemented AudioGainNode (while still
leaving the old API working, but deprecated).  Finally, just a few days ago,
I changed all of the JavaScript sample code to use AudioGainNode instead of
AudioMixerNode.

My goal is to keep the demos/samples working at all times (with an
up-to-date build of the WebKit audio branch).  When changes happen, here is
the order:

1)  When the API changes due to discussions on this list, I'll update the
specification as soon as possible.
2)  Sometime later, I'll manage to implement the change while striving to
keep the old implementation working (as a deprecated API).
3)  Still later, I'll change the JavaScript in the samples to match the new
API.

If possible, I'll try to execute step (3) immediately after (2), but
sometimes there will be a time lag.  I hope you'll appreciate the complexity
I'm dealing with :)

I'll try to put a change log section in the document as you suggest to keep
track of these changes a little better.
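
Since AudioGainNode is the node readers will now see in the samples, it may
help to note that its effect is just a per-sample multiply by the gain value.
A minimal pure-JavaScript sketch of that operation (illustrative only; the
real node does this natively inside the engine, and applyGain is a
hypothetical name):

```javascript
// Illustrative sketch of what an AudioGainNode does to a block of samples.
// (The real node performs this in native code; applyGain is hypothetical.)
function applyGain(samples, gain) {
  const out = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    out[i] = samples[i] * gain;
  }
  return out;
}

// Halve the amplitude of a short buffer:
const halved = applyGain(Float32Array.from([1.0, -0.5, 0.25]), 0.5);
// halved is [0.5, -0.25, 0.125]
```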

> 2) Since the primary subject of the specification is AudioNode based
> classes, it
> will be beneficial to see possible values and details of its primary
> attributes: numberOfInputs and numberOfOutputs, e.g.
>    AudioBufferSourceNode
>    ==================
>    numberOfInputs = 0
>    numberOfOutputs = 1
>        Output #0 - Audio with the same number of channels and sampleRate
> as specified in the AudioBuffer object

Good point.  I'll try to add more detail in places such as this.  I'll make
a pass through the document today. Anytime you find some details which are
missing, please let me know.
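
To give a feel for what such a section could look like, here is a hedged
JavaScript sketch of a per-node topology table and a trivial check built on
it. The counts for AudioBufferSourceNode follow your example; the other
entries and the canConnect helper are assumptions, not spec text:

```javascript
// Hypothetical summary of input/output counts per node type.
// AudioBufferSourceNode's counts come from the example above; the
// other entries are assumptions pending the updated specification.
const nodeTopology = {
  AudioBufferSourceNode: { numberOfInputs: 0, numberOfOutputs: 1 },
  AudioGainNode:         { numberOfInputs: 1, numberOfOutputs: 1 },
  AudioDestinationNode:  { numberOfInputs: 1, numberOfOutputs: 0 },
};

// A trivial validity check one could build on such a table:
// a connection needs an output on one side and an input on the other.
function canConnect(fromType, toType) {
  return nodeTopology[fromType].numberOfOutputs > 0 &&
         nodeTopology[toType].numberOfInputs > 0;
}
```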

> 3) It looks like the RealtimeAnalyzerNode has special status: it does not
> output any audio data. What does it really output: does it pass the data
> through unchanged, only change the signal gain (somebody recommended
> adding a “gain” attribute to the AudioNode), or does it have no outputs?
> Can the RealtimeAnalyzerNode be used without connecting it to the
> destination node?

This is a good question, and one which Ricard Marxer was also asking about.
I was considering that analyser nodes would operate in a "pass-through"
mode.  In other words, one input and one output, with the input being passed
unchanged to the output.  I was anticipating that these nodes could be
inserted anywhere in the signal chain to analyse, but would not otherwise
interfere with the signal flow.  But, I can see why this might be confusing
and changing the analyser node to *only* have an input with no output might
make more sense.  In my original design the AudioDestinationNode was the
only node which could be a "terminal" node in the graph, with everything
being "pulled" by this node.  But we could also allow analyser nodes to be
"terminal" nodes (no outputs).  I don't think there should be any technical
issues preventing this in the implementation.
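
To make the pass-through option concrete, here is a hedged pure-JavaScript
sketch of the behaviour: the input is copied unchanged to the output while a
measurement (RMS here) is computed on the side. The function name and the
choice of RMS are illustrative, not proposed API:

```javascript
// Sketch of a pass-through analyser: the signal flows through unchanged,
// and an analysis value (RMS here) is computed as a side effect.
function analysePassThrough(input) {
  let sumOfSquares = 0;
  const output = new Float32Array(input.length);
  for (let i = 0; i < input.length; i++) {
    output[i] = input[i];                 // signal passes unchanged
    sumOfSquares += input[i] * input[i];  // analysis happens on the side
  }
  const rms = Math.sqrt(sumOfSquares / input.length);
  return { output, rms };
}
```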

I would be interested in hearing people's preference between the two
approaches.

If an analyser has no output, then in the JavaScript processing case we
would now be faced with three types of nodes:

1) JavaScriptSourceNode      0 inputs : 1 output
2) JavaScriptProcessorNode   N inputs : M outputs (could be 1 input : 1
output if we wish to keep it simple)
3) JavaScriptAnalyserNode    1 input  : 0 outputs

Although, maybe we can just have one JavaScriptAudioNode, and through
different configuration (different constructor arguments?) it would end up
being one of the above three.  What do people think?
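
A hedged sketch of how that single-node idea could work, where the
constructor arguments (number of inputs and outputs) determine which of the
three roles the node plays; the function name and signature here are
hypothetical:

```javascript
// Hypothetical sketch: one JavaScriptAudioNode whose role is determined
// by its constructor arguments rather than by three separate classes.
function classifyJavaScriptNode(numberOfInputs, numberOfOutputs) {
  if (numberOfInputs === 0 && numberOfOutputs > 0) return 'source';
  if (numberOfInputs > 0 && numberOfOutputs === 0) return 'analyser';
  if (numberOfInputs > 0 && numberOfOutputs > 0) return 'processor';
  throw new Error('a node must have at least one input or output');
}
```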

> 4) According to section 16, it looks like the only object that can be used
> without
> context is AudioElementSourceNode that can be retrieved via audioSource
> property. Is it correct?

Yes, it is the only object which can be retrieved without the context.  But,
it must be connected to other nodes which belong to a specific context.  In
the simplest case, it would be connected to "context.destination".

> 5) If the audio element is playing streaming data, will the sound also
> be “played” in the connected audio context?

This is a question Ricard Marxer and I have been discussing.  My inclination
is to view the act of connecting the audioSource from the audio element as
implicitly disconnecting it from its "default" destination.  So, it would
never be audible both in the normal default way, and also audible from its
processing in an explicitly constructed graph.  Ricard has suggested that we
would require a disconnect() call to make it inaudible in the normal/default
playback path, but I'm not sure there would be any cases where it would be
desirable *not* to disconnect(), and simply forgetting to call it would be
confusing and not produce the desired result.
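
The rule I'm inclined toward can be shown with a toy model (all names here
are hypothetical, no real API is being proposed): connecting an element's
audioSource into a graph implicitly removes it from the default playback
path.

```javascript
// Toy model of the implicit-disconnect rule: routing a media element's
// audioSource into an explicit graph silences the default playback path.
function makeMediaElementModel() {
  return { routedToDefaultOutput: true, graphConnections: [] };
}

function connectToGraph(element, destination) {
  element.graphConnections.push(destination);
  element.routedToDefaultOutput = false;  // the implicit disconnect
}
```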

> 6) How many AudioContext instances can be run/instantiated on a single
> web page?

Ricard and I have also been discussing this.  I think in the vast majority
of cases a single AudioContext would be sufficient.  Ricard brought up a
case where there are two separate physical audio devices connected to the
computer (one AudioContext for each one).  But most sophisticated desktop
audio software does not even support this scenario (especially when the
devices are from different manufacturers or running at different
sample-rates).  Maybe we can leave it an open question for now, or suggest
that a more advanced implementation might later support multiple
AudioContexts, but a simpler one would only allow one.

> 7) JavaScript was chosen as the client-side scripting language to control
> objects that are implemented in high-performance languages (typically
> C/C++). One of the specifics of JavaScript objects is that they contain
> members that help to discover some metadata. I noticed that the
> AudioBuffer interface contains a “length” attribute that gives a different
> meaning to the standard JavaScript “length” property (which usually
> specifies the number of members in an object). It's recommended to select
> names for methods that will not conflict with or change the meaning of the
> standard identifiers of the target scripting language.

I think Eric Carlson suggested "length"; other names I could suggest are:


Anybody have any other ideas for names?

For those who are confused by the use of the term "sample-frame", one
sample-frame (or one frame) represents one sample per channel.  So if there
are N channels, then the number of samples per sample-frame is N.  In other
words, the sample-frame is the grouping of all samples across the N

> 8) Some class definitions that would really help in understanding the
> specification are missing from it: AudioSourceNode, AudioListenerNode,
> AudioSource, AudioBufferSource, AudioCallbackSource, etc.

Thanks.  I'll try to fill these in.

> 9) The Modular Routing section states that “the developer doesn't have to
> worry
> about low-level stream format details when two objects are connected
> together;
> the right thing just happens. For example, if a mono audio stream is
> connected
> to a stereo input it should just mix to left and right channels
> appropriately.”
> There are lots of ways/algorithms to change the number of channels, the
> sample rate, etc. I think the web developer should know what they will
> receive as a result: only the left channel, or a mix of all channels from
> the 5.1 source stream. Could you document how “the right thing” will
> happen?

I'll add a section about this.  It basically boils down to different ways of
up-mixing and down-mixing different channel layouts (mono -> stereo, etc.)
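
As a hedged illustration of the two simplest cases: mono -> stereo up-mixing
by duplication, and stereo -> mono down-mixing by averaging. The actual
rules and any scaling factors are still to be documented in the
specification, so treat both conventions here as assumptions:

```javascript
// Mono -> stereo up-mix: copy the mono signal to both channels.
function upMixMonoToStereo(mono) {
  return { left: Float32Array.from(mono), right: Float32Array.from(mono) };
}

// Stereo -> mono down-mix: average the two channels.
// (Averaging is one common convention; the spec may choose a
// different scaling, so this is an assumption.)
function downMixStereoToMono(left, right) {
  const mono = new Float32Array(left.length);
  for (let i = 0; i < left.length; i++) {
    mono[i] = 0.5 * (left[i] + right[i]);
  }
  return mono;
}
```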

> 10) How is the sampleRate attribute value defined/chosen for the
> non-source nodes, e.g. AudioDestinationNode or AudioGainNode? And in the
> case when multiple outputs are mixed into one input?

As I was discussing with Ricard, I would suggest that the sample-rate is
constant for all nodes in a given AudioContext, and the sample-rate
is an attribute on AudioContext.  The document hasn't yet been updated to
reflect this, but I will update it if nobody objects.  So, in this case
there's no problem because we're mixing everything at the same sample-rate.

> Thank you,
> Yury Delendik

Thanks Yury,
Received on Monday, 19 July 2010 20:40:08 UTC
