Re: [rtcweb] Mute implementations (Re: More Comments on draft-ietf-rtcweb-use-cases-and-requirements-07) from Randell Jesup on 2012-06-17 (public-media-capture@w3.org from June 2012)

From: Randell Jesup <randell-ietf@jesup.org>
Date: Sun, 17 Jun 2012 12:32:43 -0400
To: rtcweb@ietf.org, "public-media-capture@w3.org" <public-media-capture@w3.org>, Robert O'Callahan <roc@ocallahan.org>
Message-ID: <4FDE06AB.9030205@jesup.org>
CC'd to both lists, as this affects both Media Capture (MediaStreams) 
and rtcweb protocol issues (and PeerConnection API issues, but I'm not 
yet including the WebRTC list).

On 6/17/2012 10:33 AM, Stefan Hakansson LK wrote:
> For the fun of it I thought a little while about what options that are
> currently available with the APIs that are in the W3C docs as is. The
> list is in no way exhaustive, there are other ways to do it.
>
> Perhaps this is not sufficient for what people want to do, but at least
> there are some things you can do already:
>
> 1. Replace video with the display of an image
> =============================================
> One solution: Use the “poster” attribute of the video element. Quote
> from the html spec: “The poster attribute gives the address of an image
> file that the user agent can show while no video data is available.”. We
> can spec it such that if no video track of a MediaStream is active, that
> counts “as no data available”. This way, as soon as the user “mutes” the
> video, an image will replace it at display. The site can supply the
> image (either a default by the site or one “profile” pic selected by the
> user), or it can be provided over the data channel (as a blob).

This is a receiver-side action and thus is not very useful for Mute and 
Hold operations.  It would depend on Mute and Hold stopping transmission 
(which could cause other issues), and the receiver would have to infer 
that a lack of reception means mute or hold, when it could equally mean 
a temporary or permanent loss of connectivity, etc.

> Another solution: A signal is sent on the data channel (possibly
> accompanied by the image); the peer’s app overlays the image on top of
> the video element as a result (and switches back on another signal).

I've done this (using RTCP APP as a data channel), and it works well 
*if* both sides are the same application, and it still requires that the 
sources immediately mute (though black would be ok in this case). 
However, if you ever need to talk to something that doesn't understand 
this out-of-band "I'm muted" signal, then you really want to Mute/Hold 
with an image (and sound for Hold).  (And I've had to do this as well.)
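A minimal sketch of what such an out-of-band "I'm muted" signal could look like at the application level. The message format (`mute-state`) and the helper names `encodeMuteSignal` and `createMuteReceiver` are hypothetical and not defined by any WebRTC or rtcweb document; the transport (data channel, RTCP APP, etc.) is left out.

```javascript
// Hypothetical message format; both applications must agree on it.
function encodeMuteSignal(kind, muted) {
  return JSON.stringify({ type: "mute-state", kind: kind, muted: muted });
}

// Receiver side: track per-kind mute state and decide what to render.
function createMuteReceiver() {
  const state = { audio: false, video: false };
  return {
    // Returns true only if the message was a mute signal we understood.
    handleMessage(raw) {
      let msg;
      try {
        msg = JSON.parse(raw);
      } catch (e) {
        return false; // not JSON: ignore
      }
      if (!msg || msg.type !== "mute-state" ||
          !Object.prototype.hasOwnProperty.call(state, msg.kind)) {
        return false;
      }
      state[msg.kind] = Boolean(msg.muted);
      return true;
    },
    // What the app should show in place of the remote video.
    videoDisplay() {
      return state.video ? "overlay-image" : "remote-video";
    },
  };
}
```

A peer that never sends this message always appears as "remote-video", which is exactly the heterogeneous-environment problem: the receiver cannot tell an unmuted sender from one that does not speak this protocol.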

> A third (similar) solution: The app at the receiving end listens to
> “onunmute” of the video track in question; once it fires an image is
> overlaid.

How does "onmute" work?  This similarly presupposes agreement on the 
out-of-band message, and also has the same issue when in a heterogeneous 
environment.

> A fourth: The peer’s app analyzes the video using canvas, and overlays
> an image when there is no data in the video.

a) ugh, what a waste of CPU cycles
b) Has the same problem as the first - the receiver doesn't know why 
it's getting no video (or black video).  Detecting no incoming video and 
overlaying makes sense (though timeouts can be tricky to avoid false 
positives), but it's not a replacement for Mute and Hold from the source.
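As a rough sketch of the "detect no incoming video and overlay" case, the timeout logic might look like the following. The helper name `createVideoStallDetector` and the idea of feeding it frame-arrival timestamps are assumptions for illustration; how the app learns that a frame arrived is not covered here.

```javascript
// Hypothetical helper: reports when incoming video has stalled long enough
// that the app may want to overlay an image.
function createVideoStallDetector(timeoutMs) {
  let lastFrameAt = null; // timestamp (ms) of the most recent frame, if any
  return {
    // Call whenever a video frame is received.
    onFrame(nowMs) {
      lastFrameAt = nowMs;
    },
    // True once no frame has arrived for at least timeoutMs.
    isStalled(nowMs) {
      if (lastFrameAt === null) return false; // nothing received yet
      return nowMs - lastFrameAt >= timeoutMs;
    },
  };
}
```

A short timeout gives false positives on ordinary jitter or brief loss, and a long one delays the overlay. Either way the detector only says that video stopped, not whether the cause was mute, hold, or lost connectivity.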

> 2. Switch between front and rear camera
> ========================================
> One solution: create a MediaStream (e.g. by calling getUserMedia twice
> immediately after each other, or by combining existing tracks into a new
> MediaStream) containing two video tracks (front+back cam). The video
> shown is toggled by enabling/disabling video tracks (the video element
> plays the first active one in the case of several video tracks).

This means toggling would require a renegotiation end-to-end before the 
switch occurs.  Massively annoying (delay, perhaps long) and means more 
complexity required for receiving - I should be able to switch cameras 
and your simple receiver app shouldn't need to do a bunch of footwork.

> Another: use add/removeTrack to switch video track being sent.

Ditto.

> 3. Play music on hold
> =====================
> One solution (audio only case): The site provides an URL to a (music)
> file to play when on hold. The peer’s app registers an event listener
> that listens to “onmute” on the incoming audio track. When “onmute”
> fires, the app changes the source of the audio element to the URL of the
> music file.

Again, this depends on the "onmute" notification somehow getting across, 
and there may be a silence gap until it does, unless it's sent in-band in 
the RTP somehow (header extensions?), and then there are redundancy needs.
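To make the in-band idea concrete, here is a sketch of carrying a mute flag in an RTP header extension using the one-byte-header form from RFC 5285, repeated over several packets for redundancy. No such mute extension is defined anywhere; `MUTE_EXT_ID`, the one-byte payload, and the helper names are all hypothetical.

```javascript
// Hypothetical extension ID; real IDs (1-14) are negotiated in signaling.
const MUTE_EXT_ID = 5;

// Build an RFC 5285 one-byte-header extension block carrying one flag byte.
function buildMuteExtension(muted) {
  return Uint8Array.from([
    0xbe, 0xde,             // one-byte-header marker
    0x00, 0x01,             // length: one 32-bit word of extension data
    (MUTE_EXT_ID << 4) | 0, // ID in high nibble, (data length - 1) in low nibble
    muted ? 1 : 0,          // payload: the mute flag
    0x00, 0x00,             // padding to a 32-bit boundary
  ]);
}

// Returns true/false if the flag is present, or null if it is absent.
function parseMuteExtension(bytes) {
  if (bytes.length < 4 || bytes[0] !== 0xbe || bytes[1] !== 0xde) return null;
  const end = Math.min(bytes.length, 4 + ((bytes[2] << 8) | bytes[3]) * 4);
  let i = 4;
  while (i < end) {
    if (bytes[i] === 0) { i += 1; continue; } // padding byte
    const id = bytes[i] >> 4;
    const len = (bytes[i] & 0x0f) + 1;
    if (id === 15) return null;               // reserved ID: stop parsing
    if (id === MUTE_EXT_ID && i + 1 < end) return bytes[i + 1] !== 0;
    i += 1 + len;
  }
  return null;
}

// Redundancy: attach the flag to the first repeatCount packets after a change,
// so that losing a single packet does not lose the notification.
function createMuteTagger(repeatCount) {
  let current = false;
  let remaining = 0;
  return {
    setMuted(muted) {
      if (muted !== current) { current = muted; remaining = repeatCount; }
    },
    // Extension to attach to the next outgoing packet, or null for none.
    nextExtension() {
      if (remaining === 0) return null;
      remaining -= 1;
      return buildMuteExtension(current);
    },
  };
}
```

Repeating the flag a fixed number of times is only one possible policy; a receiver that loses all of the tagged packets still misses the change.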

If one were developing a single, non-heterogeneous app one might do 
this, but relying on this in a heterogeneous world has problems.  (For 
example, connection to a PSTN gateway - you'd have to build this 
behavior into the WebRTC<->PSTN gateway and tightly specify it at the 
rtcweb level.)  This amounts to building a specification for an 
application, not for a set of tools and capabilities others can use to 
build apps.

> One solution if the communication is audio+video, and the intent is to
> just replace the audio (but continue playing video): assume a
> MediaStream with one audio and one video track is used. Play the audio
> in an audio element, the video in a (muted) video element. At on hold,
> replace the source of the audio element with the music file.

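// Pseudocode, assuming captureStreamUntilEnded() from the MediaStream
// Processing API proposal.
stream = getUserMedia();            // local audio+video capture
audio.src = stream;
video.src = stream;
video.muted = true;                 // audio plays via the audio element only
audiostream = audio.captureStreamUntilEnded();
videostream = video.captureStreamUntilEnded();
mergedstream = new MediaStream(audio track from audiostream,
                               video track from videostream);
pc.addStream(mergedstream);

onmute: audio.src = music;          // on hold: swap the audio source for music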

Problem (other than this being a bit painful and wasteful): the audio and 
video streams are now unsynchronized.

> A solution if the communication is audio+video, and the intent is to
> replace audio and video: either “mute” the tracks (at sending side);
> receiving end gets “onmute” event and replaces the .src of the video
> element with the URL to a video file, or signal on the data channel
> something that makes the other end replace the src of the video element
> with an URL to a file

There's no transmittal of these "onmute" events you're talking about 
across the RTP layer.

See the previous arguments about receiver-side display, timing, etc.

>
> Another solution is that the sending end does “removeStream” from
> PeerConnection. This fires “onremovestream” at the receiving side; that
> event can trigger the change of src to the video element. This solution
> frees up resources.

This requires a full renegotiation (which causes onremovestream to fire 
once complete).

>
> Note that once the recording function has been specced (in such a way
> that the recorded file can be the source of a MediaStream) more options
> will become available.

I'm already assuming something like that 
(video.captureStreamUntilEnded()) - see the MediaStream Processing API 
proposal.


-- 
Randell Jesup
randell-ietf@jesup.org
Received on Sunday, 17 June 2012 16:33:40 UTC