Re: MediaStream integration from Chris Rogers on 2012-05-10 (public-audio@w3.org from April to June 2012)

From: Chris Rogers <crogers@google.com>
Date: Wed, 9 May 2012 18:04:36 -0700
To: robert@ocallahan.org
Cc: public-audio@w3.org
Message-ID: <CA+EzO0nm0FOj6b3taAyj7YcLgf5QspqPLhm1SNTehEM1m_CPVw@mail.gmail.com>
On Wed, May 9, 2012 at 4:59 PM, Robert O'Callahan <robert@ocallahan.org> wrote:

> On Thu, May 10, 2012 at 11:51 AM, Chris Rogers <crogers@google.com> wrote:
>
>> Hi Robert, sorry if there was any confusion about this.  I haven't
>> written up any explanation for this API yet, but hope to add it to the main
>> Web Audio API editor's draft soon:
>> https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html
>>
>> The intention is that context.createMediaStreamDestination() will create
>> a destination node that a sub-graph can connect to.  So it's not capturing
>> the output of the entire context (context.destination).  In theory, it
>> should be possible to call context.createMediaStreamDestination() multiple
>> times, each of which sends out to a different remote peer with different
>> processing.
>>
>
> OK, I see now. That sounds better. But why not just skip
> createMediaStreamDestination and provide 'stream' on every AudioNode?
>

That might be possible.  In this case, I suppose that the .stream would be
a MediaStream for the rendered output for that AudioNode.  But AudioNodes
can have more than a single output, for example AudioChannelSplitter has
multiple outputs:
https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html#AudioChannelSplitter

Also, we've recently discussed having multiple inputs and outputs on
JavaScriptAudioNode:
http://lists.w3.org/Archives/Public/public-audio/2012AprJun/0095.html

And in the future other AudioNodes may be created with multiple outputs
(such as a cross-over filter with three outputs, "low", "mid", and "high").

So for the multiple output case, your .stream attribute could be a
MediaStream with multiple MediaStreamTracks (one per output).  But I think
it would be more flexible for the API to allow finer-grained control by
tapping into individual outputs, which is possible if there is an explicit
destination node representing a MediaStream.  A connection from a specific
output to a MediaStream destination is then explicit and more in line with
the rest of the Web Audio API.
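
A minimal sketch of tapping a single output, assuming the method names of
the present-day Web Audio API (createChannelSplitter,
createMediaStreamDestination, and connect(destination, output, input)); the
draft was still in flux when this was written. The helper name
tapSplitterOutput and its parameters are hypothetical, chosen only for
illustration:

```javascript
// Hypothetical helper: taps one output of a channel splitter into its own
// MediaStream. `context` is assumed to be an AudioContext-like object and
// `source` any AudioNode-like object with a connect() method.
function tapSplitterOutput(context, source, outputIndex, outputCount = 2) {
  const splitter = context.createChannelSplitter(outputCount);
  source.connect(splitter);

  // Explicit destination node representing a MediaStream.
  const streamDestination = context.createMediaStreamDestination();

  // connect(destination, output, input): only the chosen splitter output
  // feeds input 0 of the stream destination.
  splitter.connect(streamDestination, outputIndex, 0);

  return streamDestination.stream;
}
```

With a per-node .stream attribute, the same selection would instead have to
be made by picking one MediaStreamTrack out of the resulting MediaStream.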

I talk about the design behind nodes here, where an AudioNode can be a
source, processor, or destination node:
https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html#AudioNode-section

And also in this section where there is a diagram showing the connection
between boxes (AudioNodes):
https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html#ModularRouting

We have explicit AudioSourceNode and AudioDestinationNode types.  Currently
the text of the spec says that there is only a *single*
AudioDestinationNode per context representing the final output.  But I
think we will need to change this so there are multiple
AudioDestinationNodes (one for the speakers, one sending to a remote peer,
and so on).
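
As a rough illustration of several destinations in one context, here is a
sketch assuming today's Web Audio API names (context.destination,
createGain, createMediaStreamDestination). The helper
routeToSpeakersAndPeers and the use of a gain node as the per-peer
"processing" are assumptions made for the example, not something the draft
specifies:

```javascript
// Hypothetical helper: one source feeding the speakers plus one separately
// processed MediaStream destination per entry in `peerGains`.
function routeToSpeakersAndPeers(context, source, peerGains) {
  // The context's default destination node (the speakers).
  source.connect(context.destination);

  return peerGains.map((gainValue) => {
    // Per-peer processing; a plain gain stage stands in for anything richer.
    const gain = context.createGain();
    gain.gain.value = gainValue;

    // Each call creates a distinct destination with its own stream.
    const peerDestination = context.createMediaStreamDestination();
    source.connect(gain);
    gain.connect(peerDestination);
    return peerDestination.stream;
  });
}
```

Each returned stream could then be handed to a different remote peer, while
the speakers keep receiving the unprocessed source.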

I think it's important to maintain the ideas of "nodes" and "connection" so
that in the simplest case the diagram is a "source" node and a
"destination" node with a single connection shown as two boxes:
https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html#ModularRouting

In the case of a stream sending to a remote peer, it seems to make
sense to have a specific node representing that destination.


>
>>
>>
>>> I think authors will expect createMediaStreamSource(stream).stream (or
>>> the equivalent trivial graph using context.createMediaStreamDestination())
>>> to be semantically a no-op (except for the output having only one audio
>>> track). I think we should make sure it is.
>>>
>>
>> I think so, if I understand you correctly.
>>
>
> Let me clarify what you'd be agreeing to :-). It means, for example, that
> if MediaStreams get the ability to pause without dropping samples,
> AudioNodes must be able to as well.
>

That's not quite how I think about it.  Currently the HTMLMediaElement and
the MediaElementAudioSourceNode are distinct types.  One represents a
high-level "media player" API, with networking and buffering state,
seeking, etc.  The other is an AudioNode, implementing the semantics of
being a node (being able to connect() with other nodes, and having specific
numberOfInputs, numberOfOutputs).  I believe that it's a good principle of
software design to separate these two concepts - think "loose coupling and
high cohesion".

I think we need to keep this distinction in mind with MediaStream as well.
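
To make the separation concrete, a small sketch assuming the present-day
createMediaElementSource method; the helper name connectMediaElement is
hypothetical:

```javascript
// Hypothetical helper: wraps a media element in a source node and connects
// it to the context's default destination. The helper only touches the
// graph; it never changes the element's playback state.
function connectMediaElement(context, mediaElement) {
  const sourceNode = context.createMediaElementSource(mediaElement);
  sourceNode.connect(context.destination);
  return sourceNode;
}
```

Playback concerns (play, pause, seeking, buffering) stay on the element,
while routing concerns (connect and the input/output counts) stay on the
returned node.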



>
> Rob
> --
> “You have heard that it was said, ‘Love your neighbor and hate your
> enemy.’ But I tell you, love your enemies and pray for those who persecute
> you, that you may be children of your Father in heaven. ... If you love
> those who love you, what reward will you get? Are not even the tax
> collectors doing that? And if you greet only your own people, what are you
> doing more than others?” [Matthew 5:43-47]
>
>
Received on Thursday, 10 May 2012 01:05:07 UTC