Re: Adding Web Audio API Spec to W3C Repository

On Fri, Jun 10, 2011 at 1:42 PM, Joseph Berkovitz <> wrote:

> Hi folks,
>
> I have been standing back from this dialogue for a while, waiting for a
> clearer picture of the alternatives to emerge. Because the Mozilla proposal
> is still a little short on detail I think we are still in a waiting place,
> but there are some design choices starting to come into view.
>
> Taking the MediaStreams API proposal as indicative of Mozilla's direction
> (and, conversely, setting aside Mozilla's current API which seems very
> different in flavor from Rob's proposal), both choices would seem to support
> a graph-based, declarative system, would supply some library of ready-made
> nodes or processors, and would allow injection of custom procedural code as
> needed by specific nodes. So these may not be the biggest differentiators
> any more.
>
> The largest-scale issue I see so far is this: are nodes in the audio
> processing graph identified with Streams or not? And if they are, what are
> the advantages and disadvantages that come with such an identification?  I
> am not yet familiar enough with the HTML Streams and RTC world to float such
> a list of pros and cons, and I think it might be helpful if Rob and Chris
> could provide their respective views on this, along with the other experts
> contacted by Doug.
>
> However, I want to point out that it does not seem strictly necessary to
> identify streams and audio-processing nodes in order to relate these two
> realms.  If an AudioNode could be created to represent the audio facet of a
> Stream, and a Stream could be created to represent the output of an
> AudioNode graph, would we really need every node to *be* a Stream?  After
> looking at the RTCStream API I find that I have an initial bias (which could
> change easily). The starting bias is this: streams seem to be good
> abstractions for setting up session-level graphs of cameras, microphones and
> remote participants where data needs to be routed and mixed, but not so much
> processed, triggered or synthesized. Streams may also be on the heavy side
> for slinging local buffers of audio sample frames around with some
> processing. So I wonder if it is better to take an entire audio processing
> graph and treat this as a single object participating as a Stream (source
> and/or sink) in an RTC session; inside that graph, there may be
> implementation economies that are unique to audio nodes which are not as
> easy to achieve when every node can also be an arbitrarily time-coded,
> interruptible Stream.

Hi Joe, this is exactly the idea that Ian Hickson and I have in
mind, keeping the two APIs separate, but designed to work together as you
describe.  I think there are not only implementation economies, but also
huge conceptual economies for the developer.  In my view, lumping the two
APIs together would make the result harder to understand and would jumble
concepts that each have their own nuance in API presentation and
implementation.  Well-factored designs should separate out conceptually
distinct pieces.
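To make the factoring concrete, here is a rough sketch of the shape Joe
describes: an audio graph that touches the streams world only at its edges,
with the whole graph participating as a single Stream-like source.  Every
class, method, and name below is invented purely for illustration; none of
this is part of either proposal.

```javascript
// Hypothetical sketch only -- all names are invented for illustration
// and do not belong to any proposed API.

// A node in the audio processing graph: pulls buffers from its inputs
// and transforms them with a supplied processing function.
class GraphNode {
  constructor(process) {
    this.inputs = [];
    this.process = process; // (arrayOfInputBuffers) -> outputBuffer
  }
  connect(dest) {
    dest.inputs.push(this);
    return dest;
  }
  pull() {
    const inputBuffers = this.inputs.map((n) => n.pull());
    return this.process(inputBuffers);
  }
}

// A source node wrapping the audio facet of an external Stream:
// the only place a Stream enters the graph.
function streamSourceNode(stream) {
  return new GraphNode(() => stream.readFrames());
}

// Expose the entire graph as one Stream-like participant. The outside
// (e.g. an RTC session) sees a single source/sink, not every internal node.
function graphAsStream(outputNode) {
  return { readFrames: () => outputNode.pull() };
}

// --- usage ---
const micStream = { readFrames: () => [0.1, 0.2, 0.3] }; // stand-in for a microphone Stream
const source = streamSourceNode(micStream);
const gain = new GraphNode((inputs) => inputs[0].map((s) => s * 2));
source.connect(gain);
const outStream = graphAsStream(gain); // one Stream participant, whole graph inside
console.log(outStream.readFrames()); // -> [0.2, 0.4, 0.6]
```

The point of the sketch is the boundary: internal nodes can use whatever
cheap pull-based buffer passing the implementation likes, and only the two
bridging adapters need to speak the Stream vocabulary.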


Received on Friday, 10 June 2011 21:06:35 UTC