Re: Mute and MediaStream repointing from Stefan Hakansson LK on 2012-02-13 (public-webrtc@w3.org from February 2012)

From: Stefan Hakansson LK <stefan.lk.hakansson@ericsson.com>
Date: Mon, 13 Feb 2012 18:07:47 +0100
To: public-webrtc@w3.org
Message-ID: <4F394363.1030705@ericsson.com>
On 02/13/2012 03:23 PM, Randell Jesup wrote:
> First a comment on Harald's use-case, then a deeper/important discussion
> on the sources of MediaStreams and muting.
>
> On 2/13/2012 7:04 AM, Stefan Hakansson LK wrote:
>> On 02/13/2012 05:58 AM, Harald Alvestrand wrote:
>
>>> For instance, if I am a passive participant in a conference who has an
>>> open MediaStream with an outgoing voice channel, and am then given the
>>> floor to give a presentation, I might need to do:
>>>
>>> 1. AddStream(video, muted)
>>> 2. AddStream(presentation)
>>> 3. Change from low-bandwidth to high-bandwidth sound (by changing
>>> hints on my audio channel?)
>>> 4. Check that my local conferencing engine is willing to do what I
>>> wanted (createOffer, possibly setLocal)
>>> 5. Check that my conferencing server is willing to do this (send
>>> offer, receive answer)
>>> 6. Check microphone level and whether my hair looks good
>>> 7. Hit the "OK, I have the floor" button
>>
>> This is great input; we have not had anything close to this level of
>> detail on how use cases are solved. (And it is a good new use case as well.)
>
> It is a useful use-case (I think I discussed this with someone during
> the Interim, but I can't remember who).  Way back when, I did a textual
> version of an Auditorium for PlayNet that allowed participants to talk
> amongst themselves in Rooms while listening in to a speaker (or panel),
> with a question queue and optional moderator.
>
>>> It's
>>> possible that there are offer/answer exchanges multiple places in this
>>> list - there might even be one when I hit "OK, I have the floor" -
>>> design here will depend strongly on whether this particular application
>>> thinks that offer/answer exchanges are done in milliseconds or minutes.
>>
>> Yes, I agree.
>>
>> Thinking a bit more about this, having the mute state changed
>> through offer/answer can be a problem, as it can take a while. (I think
>> this has nothing to do with the API to allow JSEP, but is valid in any
>> case.) Memory fades, but perhaps this was the reason why we originally
>> had text saying that a muted video track should play out as black. The
>> receiving app can then easily detect this in a canvas element and
>> locally do whatever processing it wants for a nice UI. This way things
>> are much quicker (especially compared to a minutes-long o/a :-) ).
>
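The receiver-side check described above (drawing the incoming video element onto a canvas with drawImage() and reading the pixels back with getImageData()) might look roughly like the sketch below. It is a minimal illustration, not part of any proposal in this thread: the function name isFrameBlack and the default threshold of 16 are assumptions, and the frames are synthetic 2x2 RGBA buffers so the code runs outside a browser. Some tolerance is needed because lossy codecs seldom deliver exact zeros.

```typescript
// Hypothetical helper: returns true if every pixel of an RGBA buffer (the
// layout of CanvasRenderingContext2D.getImageData().data) has its R, G and B
// values at or below `threshold`. Alpha is ignored.
function isFrameBlack(rgba: Uint8ClampedArray, threshold: number = 16): boolean {
  if (rgba.length % 4 !== 0) {
    throw new Error("expected RGBA data: length must be a multiple of 4");
  }
  for (let i = 0; i < rgba.length; i += 4) {
    if (rgba[i] > threshold || rgba[i + 1] > threshold || rgba[i + 2] > threshold) {
      return false; // at least one visibly non-black pixel
    }
  }
  return true;
}

// Synthetic 2x2 frames (4 bytes per pixel) in place of a real canvas.
const blackFrame = new Uint8ClampedArray([
  0, 0, 0, 255,   3, 2, 4, 255,
  0, 1, 0, 255,   5, 5, 5, 255,
]);
const litFrame = new Uint8ClampedArray([
  0, 0, 0, 255,   200, 180, 170, 255,
  0, 0, 0, 255,   0, 0, 0, 255,
]);

console.log(isFrameBlack(blackFrame)); // true
console.log(isFrameBlack(litFrame));   // false
```

Note that any overlay drawn onto a "black" frame after it leaves the sender would make this check fail, which is one reason pixel-based signalling is fragile.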
> We've discussed it once or twice: muted tracks should generate
> *something* until/unless signaling allows them to be turned off.  At
> our videophone company we found that 'black' was a bad choice for video,
> as it confuses users ("is something broken?").  You can use silence for
> audio (especially if you have a video hold message), but some sort of
> canned audio on hold is by far the common case, so the person knows a)
> they're on hold, and b) that the connection is still alive.  Mute is
> slightly different from Hold, and is much more likely to apply to video
> or audio alone.  Video should still give a message, but for audio,
> silence is correct.
>
> Since Hold is a slightly different concept, I believe 'muting' an
> outgoing stream should generate silence for audio, and some type of
> 'muted' video message.  We used a camera icon with a circle-slash.
> Bandwidth for a static image can be reduced almost as far as it can
> for black.  I strongly dislike using 'black' to signal Mute, because
> codecs and the like in the path, and things like the insertion of video
> 'bugs' or other manipulation after the MediaStream is generated (such as
> through the MediaStream Processing API proposal), might mess it up.
>
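A minimal sketch of the substitutes suggested above, under stated assumptions: raw audio is modeled as a Float32Array of PCM samples and raw video as an RGBA buffer, and the names mutedAudio, makeMutedFrame, changedPixels and RawFrame are hypothetical. A light square on grey stands in for the camera/circle-slash icon. The changedPixels helper is a deliberately crude stand-in for inter-frame coding; it only illustrates why repeating a static image, like repeating black, leaves a delta encoder almost nothing to send after the first frame.

```typescript
// Hypothetical raw media types, for illustration only.
type PcmChunk = Float32Array; // audio samples in [-1, 1]

interface RawFrame {
  width: number;
  height: number;
  rgba: Uint8ClampedArray; // 4 bytes (R, G, B, A) per pixel
}

// Muted audio: silence, i.e. all-zero samples.
function mutedAudio(sampleCount: number): PcmChunk {
  return new Float32Array(sampleCount); // typed arrays are zero-initialised
}

// Muted video: a fixed placeholder (light square on mid-grey, standing in
// for a camera icon with a circle-slash). Build it once and resend it.
function makeMutedFrame(width: number, height: number): RawFrame {
  const rgba = new Uint8ClampedArray(width * height * 4);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const inIcon =
        x >= width / 4 && x < (3 * width) / 4 &&
        y >= height / 4 && y < (3 * height) / 4;
      const v = inIcon ? 220 : 96;
      const i = (y * width + x) * 4;
      rgba[i] = v;
      rgba[i + 1] = v;
      rgba[i + 2] = v;
      rgba[i + 3] = 255;
    }
  }
  return { width, height, rgba };
}

// Crude stand-in for inter-frame coding: how many pixels differ from the
// previous frame? Real codecs are far more elaborate, but an unchanging
// image likewise leaves them (almost) nothing new to encode.
function changedPixels(prev: RawFrame, next: RawFrame): number {
  let changed = 0;
  for (let i = 0; i < next.rgba.length; i += 4) {
    if (
      prev.rgba[i] !== next.rgba[i] ||
      prev.rgba[i + 1] !== next.rgba[i + 1] ||
      prev.rgba[i + 2] !== next.rgba[i + 2]
    ) {
      changed++;
    }
  }
  return changed;
}

const placeholder = makeMutedFrame(8, 8);
console.log(changedPixels(placeholder, placeholder)); // 0
console.log(mutedAudio(480).every((s) => s === 0));   // true
```

Unlike black, the placeholder is plainly intentional to the person watching it, yet it costs about as little to keep sending.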
> ** This brings up an important point about repointing a MediaStream **
>
> However, exactly what should be sent for Mute is probably an application
> issue - for video mute, the application should probably replace one
> source with a static (or not) image.  This speaks more to the
> MediaStream Processing API, since you don't want to renegotiate in order
> to change the tracks - you want to change the source of an existing track.
>
> So, basically you want a MediaStream from getUserMedia() (call it
> MSCamera), a MediaStream from a static image or pre-recorded video (call
> it MSMute), and a MediaStream that can be either (MSMerged).  We can't
> send a track, just a stream, and again we don't want to renegotiate, so
> we need to create MSMerged from either MSCamera or MSMute.  The
> MediaStream Processing API should be able to handle this, much as it can
> handle smooth cross-fades of audio and stitching video clips together -
> see demos at:
>
> http://robert.ocallahan.org/2012/01/mediastreams-processing-demos.html
>
> This could be a canned video (or audio), a fixed image, or something
> algorithmically generated by the application using Workers ("You are
> second in line" "Your wait time is 3 minutes" "Please see our new fall
> line at http://bigstore.com/"), and equivalently-generated audio.
>
> This would allow you to change the outgoing video source without
> triggering re-negotiations, which is very important.

I think this sounds like an interesting idea. It is more robust than 
sending blackness (or anything else that must be detected by analyzing 
the pixels at the receiving end), and is likely to work better in 
gatewayed (GWd) scenarios.
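To make the MSCamera/MSMute/MSMerged idea concrete, here is a toy model in which the negotiated outgoing track keeps its identity while the source feeding it is swapped. It is a hypothetical sketch, not the MediaStream Processing API or any browser API: FrameSource, SwitchableTrack and setSource are invented names, and frames are plain strings. What it shows is that muting and unmuting only change what feeds the track, so there is nothing new to offer or answer.

```typescript
// Hypothetical model, not a browser API: a source produces the next frame.
type FrameSource = () => string;

// Stands in for MSMerged: the one track that was negotiated. Its id, and so
// anything the far end agreed to in offer/answer, stays fixed; only the
// source feeding it is re-pointed.
class SwitchableTrack {
  readonly id: string;
  private source: FrameSource;

  constructor(id: string, initial: FrameSource) {
    this.id = id;
    this.source = initial;
  }

  // Re-point the track. Purely local: no offer/answer is triggered.
  setSource(next: FrameSource): void {
    this.source = next;
  }

  nextFrame(): string {
    return this.source();
  }
}

let cameraFrameNo = 0;
const msCamera: FrameSource = () => `camera frame ${++cameraFrameNo}`;
const msMute: FrameSource = () => "muted placeholder";

const msMerged = new SwitchableTrack("video0", msCamera);
console.log(msMerged.nextFrame()); // camera frame 1
msMerged.setSource(msMute);        // user presses Mute
console.log(msMerged.nextFrame()); // muted placeholder
msMerged.setSource(msCamera);      // user unmutes
console.log(msMerged.nextFrame()); // camera frame 2
console.log(msMerged.id);          // video0 (unchanged throughout)
```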

>
> Alternatively, we'd need a way to override the source for individual
> tracks (and restore the default value), a way to create a track as a
> discrete object, etc.
>
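The alternative, tracks as discrete objects whose source can be overridden and then restored, could be modeled along these lines. Again this is a hypothetical sketch with invented names (OverridableTrack, overrideSource, restoreDefault), not an existing API; a "stream" is just an array of independently created tracks, which lets the video track be muted without touching the audio track.

```typescript
// Hypothetical sketch, not an existing API.
type TrackSource = () => string;
type TrackKind = "audio" | "video";

// A track created as a discrete object, with a default source that can be
// overridden and later restored.
class OverridableTrack {
  readonly kind: TrackKind;
  private readonly defaultSource: TrackSource;
  private replacement: TrackSource | null = null;

  constructor(kind: TrackKind, defaultSource: TrackSource) {
    this.kind = kind;
    this.defaultSource = defaultSource;
  }

  overrideSource(src: TrackSource): void {
    this.replacement = src;
  }

  restoreDefault(): void {
    this.replacement = null;
  }

  isOverridden(): boolean {
    return this.replacement !== null;
  }

  nextFrame(): string {
    // Use the override if one is set, otherwise the default source.
    const src = this.replacement !== null ? this.replacement : this.defaultSource;
    return src();
  }
}

// A "stream" here is just a list of independently created tracks.
const mic = new OverridableTrack("audio", () => "mic samples");
const cam = new OverridableTrack("video", () => "camera frame");
const outgoing = [mic, cam];

cam.overrideSource(() => "muted icon"); // mute the video only
console.log(outgoing.map((t) => t.nextFrame()).join(", ")); // mic samples, muted icon
cam.restoreDefault();
console.log(outgoing.map((t) => t.nextFrame()).join(", ")); // mic samples, camera frame
```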
Received on Monday, 13 February 2012 17:08:21 UTC