Mute and MediaStream repointing from Randell Jesup on 2012-02-13 (public-webrtc@w3.org from February 2012)

From: Randell Jesup <randell-ietf@jesup.org>
Date: Mon, 13 Feb 2012 09:23:55 -0500
To: public-webrtc@w3.org
CC: Robert O'Callahan <roc@ocallahan.org>
Message-ID: <4F391CFB.6030806@jesup.org>

First a comment on Harald's use case, then a deeper and more important 
discussion of the sources of MediaStreams and muting.

On 2/13/2012 7:04 AM, Stefan Hakansson LK wrote:
> On 02/13/2012 05:58 AM, Harald Alvestrand wrote:

>> For instance, if I am a passive participant in a conference who has an
>> open MediaStream with an outgoing voice channel, and am then given the
>> floor to give a presentation, I might need to do:
>>
>> 1. AddStream(video, muted)
>> 2. AddStream(presentation)
>> 3. Change from low-bandwidth to high-bandwidth sound (by changing
>> hints on my audio channel?)
>> 4. Check that my local conferencing engine is willing to do what I
>> wanted (createOffer, possibly setLocal)
>> 5. Check that my conferencing server is willing to do this (send
>> offer, receive answer)
>> 6. Check microphone level and whether my hair looks good
>> 7. Hit the "OK, I have the floor" button
>
> This is great input, we've not been even close to this kind of detailed
> input on how use cases are solved. (And it is a good new use case as well).

It is a useful use case (I think I discussed this with someone during 
the Interim, but I can't remember who).  Way back when, I did a textual 
version of an Auditorium for PlayNet that let participants talk amongst 
themselves in Rooms while listening in to a speaker (or panel), with a 
question queue and an optional moderator.

>> It's
>> possible that there are offer/answer exchanges multiple places in this
>> list - there might even be one when I hit "OK, I have the floor" -
>> design here will depend strongly on whether this particular application
>> thinks that offer/answer exchanges are done in milliseconds or minutes.
>
> Yes, I agree.
>
> Thinking a bit more about this, having the mute state being changed
> through offer/answer can be a problem as it can take a while. (I think
> this has nothing to do with the API to allow JSEP, but is valid in any
> case.) Memory fades, but perhaps this was the reason why we originally
> had the text saying that a muted Video track should play out as black. The
> receiving app can then easily detect this in a canvas element and
> locally do the processing wanted for a nice UI. This way things are much
> quicker (especially compared to a minutes long o/a :-) ).

We've discussed it once or twice - muted tracks should generate 
*something* until/unless signaling allows them to be turned off.  We 
found in our videophone company that 'black' was a bad choice for video, 
as it confuses users ("is something broken?").  You can use silence for 
audio (especially if you have a video hold message), but some sort of 
canned audio on hold is by far the common case, so the person knows a) 
they're on hold, and b) that the connection is still alive.  Mute is 
slightly different from Hold, and is much more likely to apply to video 
or audio alone.  A muted video track should still show a message, but 
for a muted audio track silence is correct.

Since Hold is a slightly different concept, I believe 'muting' an 
outgoing stream should generate silence for audio, and some type of 
'muted' video message.  We used a camera icon with a circle-slash. 
Bandwidth for a static image can be dropped almost as much as it can 
for black.  I strongly dislike using 'black' to signal Mute: codecs and 
the like in the path, insertion of video 'bugs', or other manipulation 
after the MediaStream is generated (such as through the MediaStream 
Processing API proposal) might mess it up.

** This brings up an important point about repointing a MediaStream **

However, exactly what should be sent for Mute is probably an application 
issue - for video mute, the application should probably replace one 
source with a static (or not) image.  This speaks more to the 
MediaStream Processing API, since you don't want to renegotiate in order 
to change the tracks - you want to change the source of an existing track.

So, basically you want a MediaStream from getUserMedia() (call it 
MSCamera), a MediaStream from a static image or pre-recorded video (call 
it MSMute), and a MediaStream that can be either (MSMerged).  We can't 
send a track, just a stream, and again we don't want to renegotiate, so 
we need to create MSMerged from either MSCamera or MSMute.  The 
MediaStream Processing API should be able to handle this, much as it can 
handle smooth cross-fades of audio and stitching video clips together - 
see demos at:

http://robert.ocallahan.org/2012/01/mediastreams-processing-demos.html
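
As a rough illustration of the switching idea, here is a toy model in 
TypeScript.  It does not use the MediaStream Processing API: FrameSource, 
StaticSource and SwitchableStream are hypothetical names, and strings 
stand in for video frames.  The only point is that the outgoing object 
(MSMerged) keeps its identity while its input changes.

```typescript
// Toy model only: these types are hypothetical and are NOT part of
// getUserMedia() or the MediaStream Processing API proposal.
interface FrameSource {
  readonly label: string;
  nextFrame(): string; // a real source would yield video frames
}

class StaticSource implements FrameSource {
  readonly label: string;
  private readonly frame: string;

  constructor(label: string, frame: string) {
    this.label = label;
    this.frame = frame;
  }

  nextFrame(): string {
    return this.frame;
  }
}

class SwitchableStream {
  // Stands in for whatever the peer negotiated; it never changes.
  readonly id: string = "MSMerged";
  private readonly camera: FrameSource;
  private readonly muteCard: FrameSource;
  private active: FrameSource;

  constructor(camera: FrameSource, muteCard: FrameSource) {
    this.camera = camera;
    this.muteCard = muteCard;
    this.active = camera;
  }

  // Repoint the stream at the other input; no renegotiation is modelled.
  setMuted(muted: boolean): void {
    this.active = muted ? this.muteCard : this.camera;
  }

  // Called by the (imaginary) sender each time it needs a frame.
  pull(): string {
    return this.active.nextFrame();
  }
}

const msCamera = new StaticSource("MSCamera", "camera-frame");
const msMute = new StaticSource("MSMute", "circle-slash-icon");
const msMerged = new SwitchableStream(msCamera, msMute);

const sent: string[] = [];
sent.push(msMerged.pull()); // live camera
msMerged.setMuted(true);
sent.push(msMerged.pull()); // mute placeholder, same stream id
msMerged.setMuted(false);
sent.push(msMerged.pull()); // live camera again
console.log(msMerged.id, sent);
```

The receiver only ever sees MSMerged; which input feeds it is a local 
decision.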

The MSMute source could be a canned video (or audio), a fixed image, or 
something algorithmically generated by the application using Workers 
("You are second in line" "Your wait time is 3 minutes" "Please see our 
new fall line at http://bigstore.com/"), and equivalently-generated audio.

This would allow you to change the outgoing video source without 
triggering re-negotiations, which is very important.

Alternatively we'd need a way to override the source for individual 
tracks (and restore the default value), and a way to create a track as a 
discrete object, etc.
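
A sketch of what that alternative might look like, again as a toy model 
in TypeScript.  OverridableTrack and its methods are hypothetical names 
invented for this illustration, not an existing or proposed API, and 
strings stand in for media data.

```typescript
// Hypothetical sketch: a track created as a discrete object whose
// source can be overridden and later restored to its default.
type Source = () => string;

class OverridableTrack {
  readonly kind: string;
  private readonly defaultSource: Source;
  private replacement: Source | null = null;

  constructor(kind: string, defaultSource: Source) {
    this.kind = kind;
    this.defaultSource = defaultSource;
  }

  // Override the source for this track only.
  overrideSource(source: Source): void {
    this.replacement = source;
  }

  // Go back to the source the track was created with.
  restoreDefault(): void {
    this.replacement = null;
  }

  read(): string {
    const source = this.replacement ?? this.defaultSource;
    return source();
  }
}

const videoTrack = new OverridableTrack("video", () => "camera-frame");

const before = videoTrack.read();
videoTrack.overrideSource(() => "Your wait time is 3 minutes");
const during = videoTrack.read();
videoTrack.restoreDefault();
const after = videoTrack.read();
console.log(before, "|", during, "|", after);
```

Compared with a merged stream, this puts the switch on the track itself, 
so audio and video could be muted independently.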

-- 
Randell Jesup
randell-ietf@jesup.org
Received on Monday, 13 February 2012 14:25:25 UTC