Re: Rationalizing new/start/end/mute/unmute/enabled/disabled

We've been around this drain before... see 
http://www.ietf.org/mail-archive/web/rtcweb/current/msg04624.html and 
followups.  Perhaps we can make it to the exit this time.... :-)

On 4/4/2013 7:01 AM, Stefan Håkansson LK wrote:
> On 4/3/13 5:29 PM, Randell Jesup wrote:
>> On 3/25/2013 5:55 PM, Martin Thomson wrote:
>>> I think that it's fair to say that the current state of the
>>> MediaStream[Track] states is pretty dire.  At least from a usability
>>> perspective.
>>
>> I generally agree with the approach here.  I also agree that the
>> MediaStream should be an explicit rollup of the states of the tracks
>> (just as I feel we need a stream.stop() in addition to track.stop(),
>> even though you can build one in JS yourself).
>>
>> One thing I really want to see described (doesn't have to be in the
>> spec) is how an application can provide a "Hold" operation where live
>> video is replaced with a video slate and/or pre-recorded
>> animation/audio, and do it without an offer/answer exchange. The
>> MediaStream Processing API would have made this fairly simple, but since
>> we don't have that, we need to define something.  WebAudio (if/when
>> implemented) may help (with some pain) for the audio side, but doesn't
>> help us with video.
>>
>> The real blocker here is giving a way for a MediaStreamTrack (in a
>> MediaStream that's already AddStream()ed to a PeerConnection) to get a
>> different source.  Currently, the only easy way I can see it is very
>> kludgy and probably higher overhead/delay than we'd like:
>>
>>         video_hold.src = "my_hold_animation.ogg";
>>         elevator_music = video_hold.captureStreamUntilEnded();
>
> What is this? An undocumented feature of the media element?

It's a (very useful) API originally part of the Media Processing API 
(you can look at the last draft of that).  Takes the output (decoded 
audio and video) of a media element and uses it to source a 
MediaStream.  We absolutely need it if we want any way to feed an 
encoded/saved stream into a PeerConnection.  (We can record messages, 
but we can't play them back except maybe through a canvas (ugh)).  We 
can't even have a "Sorry, I'm not here right now, please leave a 
message" without something like this.

So we need it (or the equivalent) for all sorts of reasons, not just 
"Hold"/Mute/etc.

Firefox has had this since our MediaStream code landed most of a year ago.

>
>> The only alternative I see in the current spec might be to have two
>> tracks always, and disable the live track and enable the Hold/slate
>> track - but I believe that would still fire negotiationneeded before
>> it took effect.
>
> I will not argue that being able to switch source for a 
> MediaStreamTrack is useless, because I think it could be useful. But 
> switching source could very well also lead to a renegotiation being 
> needed (I take this from what Cullen said in Boston: if the current 
> source encodes with codec A but the other with codec B you'd have to 
> renegotiate anyway).

True, but not relevant.  If you need to negotiate, you do so.  And I'll 
note that MediaStreams holding already-encoded data is, I believe, 
under-specified (or not specified at all), beyond people waving their 
hands and saying "we'd like to have a camera that encodes and not have 
to decode and re-encode it".  The spec explicitly doesn't define a 
canonical representation - but it also doesn't specify anything about 
how things hooked up to MediaStreams can deal with incoming encoded 
data, which causes confusion in the current discussion.

The closest it comes to defining this behavior is (in defining 
"source"): "A source can be a physical webcam, microphone, local video 
or audio file from the user's hard drive, network resource, or static 
image."  Also "When a ||MediaStream| 
<http://dev.w3.org/2011/webrtc/editor/getusermedia.html#idl-def-MediaStream>| 
object is being generated from a local file (as opposed to a live 
audio/video source), the user agent /SHOULD/ stream the data from the 
file in real time, not all at once. ", and "User agents /MAY/ allow 
users to use any media source, including pre-recorded media files" and 
in discussion of implementation: "in addition to the ability to 
substitute a video or audio source with local files and other media. A 
file picker may be used to provide this functionality to the user."

That's about all that's said about sources other than cameras.  We 
really need to flesh this out - and decide what actually should be 
required and what is optional. 
video_element.captureStreamUntilEnded() has the advantage of making it 
possible to turn anything that can be a video source (Media Source API, 
streaming video, etc.) into a source for a MediaStream.
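
For example (a sketch - assumes the method from the old Media 
Processing draft, plus Media Source Extensions):

    // An MSE-fed media element becomes a MediaStream source.
    var mediaSource = new MediaSource();
    var v = document.createElement("video");
    v.src = URL.createObjectURL(mediaSource);
    var stream = v.captureStreamUntilEnded();
    // ...append SourceBuffers to mediaSource as data arrives;
    // stream now carries the decoded output.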

>
> If I understand the use-case you describe here correctly, you'd like 
> to switch to Muzak being played out at the remote end when you mute 
> locally.
>
> The straightforward way of meeting this use-case with the current set 
> of APIs would be:
>
> 1. As the application loads, download the Muzak source file
>
> 2. Use one video element to render the live a/v stream from the remote 
> party, and a separate (muted) audio element that has Muzak as source - 
> set up in a "loop" fashion
>
> 3. When the user at the remote end presses "mute me" in the 
> application, have the app a) disable the audio track and b) send a 
> "play Muzak" signal to the peer
>
> 4. When "change to Muzak" is received, unmute the Muzak audio element 
> (no need to mute the video element as silence is being played)
>
> 5. Same goes for unmute -> signal "stop play Muzak" -> mute the Muzak 
> audio element.
>
> There are many other options as well.
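
In other words, roughly (a sketch of Stefan's receiver-side approach - 
the element id and the onSignal() helper are made up):

    // <audio id="muzak" src="muzak.ogg" loop muted> pre-loaded at startup
    var muzak = document.getElementById("muzak");
    onSignal("play-muzak", function () { muzak.muted = false; muzak.play(); });
    onSignal("stop-muzak", function () { muzak.muted = true; });
    // onSignal(): placeholder for the app's own signaling channel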

Ok, an application could do that, but it moves the Muzak from the 
source to the target - and that doesn't work if the target is a 
different app, or the target is behind a gateway, or if what you want 
to play is local to the sender.  I'll note I've done exactly this in 
the past to indicate Video Mute (showing a local slate to tell users 
the other side muted - that way they don't go "it's black - something's 
broken with the video!" (and they do)).  However, I had to move to a 
source-side mute once we had to deal with any kind of non-homogeneous 
network.

And some apps will do it this way.  But there are plenty of reasons to 
believe people will want to change the source of a MediaStream (or 
MediaStreamTrack(s)).  Another random/silly-but-real example: if you 
want to put Reindeer Antlers on someone's video image, you need to 
change from a direct getUserMedia()->PeerConnection path to one with a 
canvas in between (or some such), and that means re-routing the data on 
the fly.  The alternatives are worse: set up the entire pipeline from 
the start "just in case" it's needed (which, BTW, would blow any 
attempt to keep the data encoded - see above), or re-negotiate on each 
add and remove - and that gets painful, since you'd need to drop an old 
stream and add a new one on each transition (building up m-lines), or 
disable the old one and add/enable the new one.
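
The canvas-in-the-middle version would be something like this (sketch 
only - it assumes a canvas capture call, which nothing specs today, 
and a drawAntlers() overlay helper):

    // gUM -> <video> -> <canvas> (draw antlers) -> MediaStream -> PC
    var v = document.createElement("video");
    v.src = URL.createObjectURL(camStream);   // camStream from getUserMedia()
    v.play();
    var canvas = document.createElement("canvas");
    var ctx = canvas.getContext("2d");
    (function draw() {
      ctx.drawImage(v, 0, 0, canvas.width, canvas.height);
      drawAntlers(ctx);                       // hypothetical overlay
      requestAnimationFrame(draw);
    })();
    pc.addStream(canvas.captureStream());     // hypothetical capture call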

Another example: I want (in my app) to be able to smoothly and 
quickly switch between front and back cameras.  I don't want to have an 
offer/answer exchange to do this.  Or switch mics.
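
The kind of thing I'd like to be able to write (setSource() is 
entirely hypothetical - no such API exists today):

    // Swap the source feeding an in-use track, with no offer/answer.
    navigator.getUserMedia({ video: true /* select back camera */ },
      function (backStream) {
        // videoTrack: the video track already AddStream()ed to the PC
        videoTrack.setSource(backStream.getVideoTracks()[0]);  // hypothetical
      },
      function (err) { /* handle failure */ });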

>
>>
>> p.s. For Hold and Mute in a video context, I rarely if ever want "black"
>> as an application.  I may want to send information about Hold/Mute
>> states to the receiver so the receiver can do something smart with the
>> UI, but at a minimum I want to be able to provide a
>> slate/audio/music/Muzak-version-of-Devo's-Whip-It (yes, I heard that in
>> a Jack In The Box...)
>
> There is also the "poster" possibility with the video element (i.e. an 
> image to be shown in the absence of video).

Right; that's what I refer to as a 'slate'.  Sorry, video/cable business 
dialect.  :-)
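
(For reference, that route is just videoElement.poster = 
"hold_slate.png" (file name made up) - though per HTML the poster is 
shown only while no video data is available, so it doesn't cover a 
mid-call Hold by itself.)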

-- 
Randell Jesup
randell-ietf@jesup.org

Received on Sunday, 7 April 2013 03:47:52 UTC