Re: TPAC F2F and Spec Proposals (was: Attendance for the AudioWG F2F meeting on Monday, 31 October) from Joseph Berkovitz on 2011-10-17 (public-audio@w3.org from October to December 2011)

From: Joseph Berkovitz <joe@noteflight.com>
Date: Mon, 17 Oct 2011 10:52:16 -0400
To: Robert O'Callahan <robert@ocallahan.org>, Alistair MacDonald <al@signedon.com>, Doug Schepers <schepers@w3.org>, tmichel@w3.org, Philippe Le Hegaret <plh@w3.org>, public-audio@w3.org, mgregan@mozilla.com
Message-Id: <A962253F-8057-4DAB-AFAE-F4BEC21C102E@noteflight.com>

Thanks to everyone for cueing up what promises to be a very useful discussion, and one that's essential to our further progress.

Clearly it is possible to identify the abstractions of "node" and "stream" with a high degree of overlap (as well as the responsibilities of Audio and AudioSample) and there are clear positives in having fewer and simpler notions. I think what is needed is a close examination of the following questions:

1. Are MediaStreams capable of supporting, as Chris put it, "large numbers of short, overlapping sounds which have extremely stringent requirements in terms of timing, latency, mixing, and performance"?  This is not a question about whether the abstractions can be identified; it seems likely to me that they can be, in an API-completeness sense. Rather, it is a question of whether *a concrete implementation* will have a tough time supporting the above requirement, given its need to also support the stream-specific responsibilities of MediaStreams.

I'd love to have a completely built-out code sample, but I think I can make the point with something smaller.  A typical music soft-synth might call the following sort of code block (adapted from ROC's example #11) many times in quick succession on the same effectsMixer object, resulting in, say, hundreds of potentially overlapping inputs waiting for their turn to be scheduled:

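  function triggerSound(audio, offset) {
    // Capture the element's output as a stream, then start playback.
    var stream = audio.captureStream();
    audio.play();
    // Add the stream as a mixer input, to be scheduled at the given offset.
    var port = effectsMixer.addInput(stream, offset);
    // Detach the input from the mixer once the sound has ended.
    stream.onended = function() { port.remove(); };
  }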

Furthermore it is a requirement that the same Audio object can be played simultaneously through the mixer with different playbackRates, amplitudes and mixdown parameters -- this is how a typical instrumental wavetable synth works. Will the approach of piping Audio objects through a mixer stream play nice with that requirement? Does captureStream() always return the same object for a given Audio being captured? If so, that might be a problem.
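
To make the wavetable requirement concrete, here is a minimal sketch in plain JavaScript that uses no browser API at all. The mixVoice function and its parameters are hypothetical names chosen for illustration and come from neither proposal; the sketch only shows one block of sample data being read by several overlapping voices, each with its own start frame, playback rate and gain.

```javascript
// Hypothetical sketch: nothing here is part of either proposal.
// One shared sample table, many independent voices reading from it.

// Mix one voice into `out`, starting at output frame `startFrame`.
// `rate` is the playback rate (1 = original pitch), `gain` the amplitude.
// Returns the number of output frames the voice contributed to.
function mixVoice(out, samples, startFrame, rate, gain) {
  var written = 0;
  for (var i = startFrame; i < out.length; i++) {
    var pos = (i - startFrame) * rate;      // fractional read position
    var i0 = Math.floor(pos);
    if (i0 >= samples.length - 1) break;    // reached the end of the sample
    var frac = pos - i0;
    // Linear interpolation between neighbouring samples.
    var value = samples[i0] * (1 - frac) + samples[i0 + 1] * frac;
    out[i] += gain * value;                 // sum into the shared mix
    written++;
  }
  return written;
}

// A single short "recording" shared by every voice.
var sample = new Float32Array([0, 0.25, 0.5, 0.75, 1]);
var mix = new Float32Array(16);

// Three overlapping plays of the same data with different parameters.
var framesA = mixVoice(mix, sample, 0, 1.0, 1.0);  // original pitch
var framesB = mixVoice(mix, sample, 2, 2.0, 0.5);  // octave up, half amplitude
var framesC = mixVoice(mix, sample, 4, 0.5, 0.25); // octave down, quieter
```

The point of the sketch is that each voice carries only its own read position and parameters while the sample data itself is never copied or modified. Whether a captured stream can be used the same way is exactly the open question above.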

2. What concrete features of the Web Audio API, if any, support significant use cases that the MediaStream proposal does not?  (This is a question about broad concepts, not about, say, whether some particular effect is available or not.)  I'll throw out a couple of points that I think might qualify:
	- AudioParams provide for smooth automation of arbitrary time-varying parameters, rather than having a new value be set in a step function when a stream reaches a stable state. The ability to supply linear or exponential ramps for such parameters is an essential facet of any soft-synth.
	- AudioBuffers allow the same sample data to be shared between more than one AudioBufferPlaybackNode.
	- The noteOff() function allows the end of a stream to be pegged to a time offset, not just the start of it.
	- There is a notion of global time across the AudioNode graph. In the MediaStream case, currentTime gives the time since a specific stream was created, which is not as useful (I suspect there's a way to address this need that I'm just not seeing).
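
As an illustration of the first point, the difference between a step change and a ramp can be sketched in a few lines of plain JavaScript. The helper names below are hypothetical and are not API from either proposal; they use the usual linear and exponential interpolation formulas, and it is an assumption that an implementation would compute its ramps exactly this way.

```javascript
// Hypothetical helpers, not API from either proposal.
// Each returns the parameter value at time t for a change from v0 to v1.

// Linear ramp from v0 (at time t0) to v1 (at time t1).
function linearRamp(v0, v1, t0, t1, t) {
  if (t <= t0) return v0;
  if (t >= t1) return v1;
  return v0 + (v1 - v0) * ((t - t0) / (t1 - t0));
}

// Exponential ramp; v0 and v1 must be non-zero and have the same sign.
function exponentialRamp(v0, v1, t0, t1, t) {
  if (t <= t0) return v0;
  if (t >= t1) return v1;
  return v0 * Math.pow(v1 / v0, (t - t0) / (t1 - t0));
}

// Step function, for contrast: the value jumps from v0 to v1 at time t1.
function step(v0, v1, t1, t) {
  return t < t1 ? v0 : v1;
}

// Fade a gain from 1 to 0.25 between t = 2s and t = 4s,
// then look at the value halfway through, at t = 3s.
var midLinear = linearRamp(1, 0.25, 2, 4, 3);            // arithmetic midpoint
var midExponential = exponentialRamp(1, 0.25, 2, 4, 3);  // geometric midpoint
var midStep = step(1, 0.25, 4, 3);                       // still at the old value
```

Halfway through the fade the linear ramp sits at the arithmetic midpoint of the two values and the exponential ramp at their geometric midpoint, while the step function has not moved at all and will jump at the end. That jump is what a soft-synth envelope needs to avoid.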

3. Insofar as we agree on some set of aspects not found in the MediaStream proposal, can they be added to it? Do they make sense when thinking in terms of the MediaStream abstraction? In other words, do they conceptually break the identification of the two abstractions?

I'm looking forward to the discussion!

Thanks,

... .  .    .       Joe

Joe Berkovitz
President
Noteflight LLC
84 Hamilton St, Cambridge, MA 02139
phone: +1 978 314 6271
www.noteflight.com

On Oct 17, 2011, at 2:00 AM, Robert O'Callahan wrote:

> On Sun, Oct 16, 2011 at 6:15 PM, Alistair MacDonald <al@signedon.com> wrote:
> 
>  It should integrate seamlessly with other MediaStream producers and consumers, without bridging.
> 
> Could you add some detail to this explaining with/without bridging and why it is important?
> 
> For example, if you look at example 5 here:
> https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/webrtc-integration.html
>    navigator.getUserMedia('audio', gotAudio);
>    function gotAudio(stream) {
>      var microphone = context.createMediaStreamSource(stream);
>      var backgroundMusic = context.createMediaElementSource(document.getElementById("back"));
>      var analyser = context.createAnalyser();
>      var mixedOutput = context.createMediaStreamDestination();
>      microphone.connect(analyser);
>      analyser.connect(mixedOutput);
>      backgroundMusic.connect(mixedOutput);
>    }
> 
> The calls to "createMediaStreamSource" and "createMediaStreamDestination" map MediaStream objects to AudioNode objects and vice versa. They are only needed because AudioNodes and MediaStreams are separate worlds that need to be explicitly bridged. That is unnecessary complication for authors, compared to just supporting audio processing directly on MediaStreams.
> 
> Rob
> -- 
> "If we claim to be without sin, we deceive ourselves and the truth is not in us. If we confess our sins, he is faithful and just and will forgive us our sins and purify us from all unrighteousness. If we claim we have not sinned, we make him out to be a liar and his word is not in us." [1 John 1:8-10]

Received on Monday, 17 October 2011 14:52:49 UTC