Re: Adding Web Audio API Spec to W3C Repository

Hi folks,

I have been standing back from this dialogue for a while, waiting for a clearer picture of the alternatives to emerge. Because the Mozilla proposal is still a little short on detail I think we are still in a waiting place, but there are some design choices starting to come into view.

Taking the MediaStream API proposal as indicative of Mozilla's direction (and, conversely, setting aside Mozilla's current API, which seems very different in flavor from Rob's proposal), both choices would seem to support a graph-based, declarative system, to supply some library of ready-made nodes or processors, and to allow injection of custom procedural code as needed by specific nodes. So these may not be the biggest differentiators any more.

The largest-scale issue I see so far is this: are nodes in the audio processing graph identified with Streams or not? And if they are, what are the advantages and disadvantages that come with such an identification?  I am not yet familiar enough with the HTML Streams and RTC world to float such a list of pros and cons, and I think it might be helpful if Rob and Chris could provide their respective views on this, along with the other experts contacted by Doug.

However, I want to point out that it does not seem strictly necessary to identify streams with audio-processing nodes in order to relate the two realms. If an AudioNode could be created to represent the audio facet of a Stream, and a Stream could be created to represent the output of an AudioNode graph, would we really need every node to *be* a Stream? After looking at the RTCStream API I find I have an initial bias (which could easily change): streams seem to be good abstractions for setting up session-level graphs of cameras, microphones and remote participants, where data needs to be routed and mixed, but not so much processed, triggered or synthesized. Streams may also be on the heavy side for slinging local buffers of audio sample frames around with some processing. So I wonder whether it is better to take an entire audio processing graph and treat it as a single object participating as a Stream (source and/or sink) in an RTC session; inside that graph, there may be implementation economies unique to audio nodes that are harder to achieve when every node can also be an arbitrarily time-coded, interruptible Stream.
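To make the bridging idea concrete, here is a toy sketch in JavaScript. Everything in it is invented for illustration: Stream, GainNode, streamSourceNode and streamFromGraph are stand-ins defined inline, not part of either proposal. The point is only the shape of the two adapters: one presents a Stream's audio as a graph source, the other presents a whole graph's output as a single Stream.

```js
// Toy model only: all names here are hypothetical stand-ins,
// not APIs from either proposal.

// A minimal stand-in for a media Stream: a pull-based source of
// sample frames.
function Stream(samples) {
  this.read = function () { return samples.slice(); };
}

// A minimal processing node: pulls from an input and applies a gain.
function GainNode(input, gain) {
  this.read = function () {
    return input.read().map(function (s) { return s * gain; });
  };
}

// Adapter 1: present the audio facet of a Stream as a graph source
// node, so a session-level Stream can feed an audio graph.
function streamSourceNode(stream) {
  return { read: function () { return stream.read(); } };
}

// Adapter 2: present the output of a whole node graph as a Stream,
// so the graph participates in a session as one source/sink.
function streamFromGraph(outputNode) {
  return new Stream(outputNode.read());
}

// Usage: an entire processing graph appears as one Stream.
var incoming = new Stream([0.1, 0.2, 0.3]);
var graphIn  = streamSourceNode(incoming);
var louder   = new GainNode(graphIn, 2);
var outgoing = streamFromGraph(louder);
// outgoing.read() -> [0.2, 0.4, 0.6]
```

In this arrangement only the two adapter boundaries need to speak "Stream"; the interior nodes remain plain audio objects, which is where the implementation economies mentioned above would live.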

Apart from that I see a mixture of different takes on various issues -- for example, the way in which sounds can be scheduled (noteOn in Chris's proposal; delayed stream mixing in Rob's). These can be examined on their merits, orthogonally to the Stream question. It seems to me that most of them would work with or without nodes being streams.

By the way, do streams help with MIDI or with controller-type data? I don't see that they do. Neither proposal seems to handle this area well yet.

Finally, as others have already observed, I think the topic of abstraction libraries is a bit off-topic for a group whose job is to create strong cross-browser standards. Yes, wrapper libraries will always exist, but that does not mean we have less of a job to do, or that the job is less important. I think we need to bear down on identifying the differences between the realistic proposals and on taking the best from each.

Best,

... .  .    .       Joe

Joe Berkovitz
President
Noteflight LLC
84 Hamilton St, Cambridge, MA 02139
phone: +1 978 314 6271
www.noteflight.com

On Jun 10, 2011, at 3:49 AM, Philip Jägenstedt wrote:

> On Thu, 09 Jun 2011 23:46:33 +0200, Robert O'Callahan <robert@ocallahan.org> wrote:
> 
>> We (Mozilla) definitely plan to put forward a new spec that builds on the
>> HTML Streams proposal. I would like to make more progress on the
>> implementation before we do that, but if you think otherwise, we can go
>> forward.
>> 
>> I believe the concerns I raised about synchronization and the relationship
>> with the Streams proposal that I raised in the earlier thread are still
>> valid, but that thread was a tennis match between me and Chris and I'd like
>> to hear from W3C people and other parties (especially HTML and Streams
>> people) how they feel about those concerns.
>> 
>> As a veteran of decade-long efforts to resolve conflicts between specs that
>> never should have happened in the first place (SVG, CSS and HTML, I'm
>> looking at you), I think it's worth taking time to make sure we don't have
>> another Conway's law failure. The immediate demand for audio API will have
>> to be (and is being) satisfied by libraries that abstract over browser
>> differences, and that will remain true for quite some time no matter what
>> the WG does.
> 
> Hi roc, everyone,
> 
> I've been regrettably silent on this list so far, but an Audio API is something that we (Opera) certainly want to support, so getting it right is of course important to us.
> 
> It has been quite some time since I looked at <http://chromium.googlecode.com/svn/trunk/samples/audio/specification/specification.html>, so I hope this is still the correct version to look at. My gut reaction to this spec has always been "too big" and "you ain't gonna need it". However, it's hard to argue with something that's being implemented without having a better proposal, so I've stayed silent.
> 
> About https://wiki.mozilla.org/MediaStreamAPI then...
> 
> This looks quite promising to me, and I'm looking forward to seeing a proof-of-concept implementation. Since implementing HTML5 multitrack, video conferencing and adaptive streaming will require quite a bit of internal media framework plumbing, it would be good if the Audio API mapped more directly to the concepts used there, namely the Stream object. So roc's proposal certainly feels more native to and integrated with the rest of the HTML media stack, which was of course the main objective, so that's no surprise.
> 
> As for declarative vs scripted, I guess what this comes down to is latency. Which kinds of effects are strictly necessary to have dedicated filters for, and which are not? Not knowing the answers, starting out simple and adding filters only when needed seems a good approach to me.
> 
> -- 
> Philip Jägenstedt
> Core Developer
> Opera Software
> 

Received on Friday, 10 June 2011 20:42:49 UTC