Re: Handling simulcast from Stefan Håkansson LK on 2013-09-06 (public-webrtc@w3.org from September 2013)

From: Stefan Håkansson LK <stefan.lk.hakansson@ericsson.com>
Date: Fri, 6 Sep 2013 08:33:42 +0000
To: Martin Thomson <martin.thomson@gmail.com>
CC: "public-webrtc@w3.org" <public-webrtc@w3.org>
Message-ID: <1447FA0C20ED5147A1AA0EF02890A64B1C384E8E@ESESSMB209.ericsson.se>

On 2013-09-05 19:33, Martin Thomson wrote:
> There was a question about how to do simulcast on the call.  Here's
> how it might be possible to do simulcast without additional API
> surface.
>
> 1. Acquire original stream containing one video track.
> 2. Clone the track and rescale it.
> 3. Assemble a new stream containing the original and the rescaled track.
> 4. Send the stream.
> 5. At the receiver, play the video stream.
>
> That's the user part, now for the under-the-covers stuff:

One use of simulcast I had in mind was the usual multiparty conferencing
case with a central node, where the active speaker is shown in a large
video window (with thumbnail videos for the others).

For this case I think we already have the basic things needed:

* each participant sends high and low resolution video of the same scene
(as you outline above)
* the central node forwards either the low or the high resolution
version, based on the active speaker decision
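
A minimal sketch of the forwarding decision in the list above, as it might look at the central node. The names Layer, layerToForward and forwardingPlan are made up for illustration and are not part of any WebRTC API; the sketch also assumes that a participant does not get its own video sent back.

```typescript
// Hypothetical names throughout; none of this is part of any WebRTC API.
type Layer = "high" | "low";

// Version of `senderId`'s video that the central node forwards to the others:
// the active speaker goes out in high resolution, everyone else as a thumbnail.
function layerToForward(senderId: string, activeSpeakerId: string): Layer {
  return senderId === activeSpeakerId ? "high" : "low";
}

// Forwarding plan for one receiver: the chosen version of every other
// participant. Assumption: a participant does not receive its own video back.
function forwardingPlan(
  receiverId: string,
  participantIds: string[],
  activeSpeakerId: string
): { [senderId: string]: Layer } {
  const plan: { [senderId: string]: Layer } = {};
  for (const id of participantIds) {
    if (id !== receiverId) {
      plan[id] = layerToForward(id, activeSpeakerId);
    }
  }
  return plan;
}
```

With participants alice, bob and carol and alice as active speaker, the plan for bob holds the high version of alice and the low version of carol.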

What is missing is a way to stop sending the high (or low) resolution
video from the end-point to the central node when it is not being
forwarded to anyone. This would basically be to save transmission
bandwidth, and for that we would need pause/resume. But as others have
pointed out, a video track can be disabled (which leads to black frames
being encoded), and that also saves a lot of bits.
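
A rough sketch of the disable workaround at the end-point. It assumes the central node can somehow tell the end-point which versions it is currently forwarding; that signalling, and the names SimulcastTracks, ForwardingState and applyForwardingState, are hypothetical. The only real API relied on is the enabled attribute of MediaStreamTrack, modelled here by a small stand-in interface so that the sketch also runs outside a browser.

```typescript
// Stand-in for the one MediaStreamTrack attribute used here (`enabled`),
// so the sketch does not depend on browser types.
interface TrackLike {
  enabled: boolean;
}

// Hypothetical pairing of the two versions of the same scene.
interface SimulcastTracks {
  high: TrackLike;
  low: TrackLike;
}

// Hypothetical message from the central node: which versions it is
// currently forwarding to at least one receiver.
interface ForwardingState {
  highForwarded: boolean;
  lowForwarded: boolean;
}

// Disable whichever version nobody receives. A disabled video track
// produces black frames, which cost far fewer bits than real content.
function applyForwardingState(
  tracks: SimulcastTracks,
  state: ForwardingState
): void {
  tracks.high.enabled = state.highForwarded;
  tracks.low.enabled = state.lowForwarded;
}
```

In a browser the two TrackLike objects would be the original and the cloned, rescaled MediaStreamTrack. This is not a real pause/resume: the track is still sent, only with black content.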

To handle the case you describe below we would need to add some kind of
metadata to the track, but it does not seem that hard to do.
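
To make the metadata idea a bit more concrete, here is a sketch of how a receiver could use it, assuming each track carried a source name and some quality indication. Nothing like this exists on MediaStreamTrack today: the fields srcname, quality and hasFrame, and the function pickTracksToRender, are purely hypothetical.

```typescript
// Hypothetical track metadata; none of these fields exist on MediaStreamTrack.
interface TrackInfo {
  id: string;
  srcname: string;   // same value means same source (same scene)
  quality: number;   // higher is better; how this would be signalled is open
  hasFrame: boolean; // whether this version currently has something to show
}

// For each srcname, pick the id of the version to display: the highest
// quality one that currently has a frame, so that a lower quality version
// fills the gaps of the higher quality one.
function pickTracksToRender(tracks: TrackInfo[]): { [srcname: string]: string } {
  const best: { [srcname: string]: TrackInfo } = {};
  for (const t of tracks) {
    if (!t.hasFrame) continue;
    const current = best[t.srcname];
    if (current === undefined || t.quality > current.quality) {
      best[t.srcname] = t;
    }
  }
  const result: { [srcname: string]: string } = {};
  for (const name of Object.keys(best)) {
    const chosen = best[name];
    if (chosen !== undefined) {
      result[name] = chosen.id;
    }
  }
  return result;
}
```

A source whose versions all lack a frame simply does not appear in the result.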

>
> I know we discussed the rendering of multiple video tracks in the
> past, but it's not possible to read the following documents and reach
> any sensible conclusions:
> http://dev.w3.org/2011/webrtc/editor/getusermedia.html
> http://www.w3.org/TR/html5/embedded-content-0.html#concept-media-load-resource
>
> What needs to happen in this case is to ensure that the two video
> tracks are folded together with the higher "quality" version being
> displayed and the lower "quality" version being used to fill in any
> gaps that might appear in the higher "quality" one.
>
> That depends on the <video> element being able to identify the tracks
> as being equivalent, and possibly being able to identify which is the
> higher quality.  This is where something like the srcname proposal
> could be useful
> (http://tools.ietf.org/html/draft-westerlund-avtext-rtcp-sdes-srcname-02).
>
> The only missing piece is exposing metadata on tracks such that this
> behaviour is discoverable.  Adding an attribute on tracks (srcname
> perhaps, arbaon), could provide a hook for triggering the folding
> behaviour I'm talking about.
>
>

Received on Friday, 6 September 2013 08:34:06 UTC