
Re: Handling simulcast

From: Harald Alvestrand <harald@alvestrand.no>
Date: Fri, 06 Sep 2013 11:17:28 +0200
Message-ID: <52299DA8.3050404@alvestrand.no>
To: public-webrtc@w3.org
On 09/06/2013 10:33 AM, Stefan Håkansson LK wrote:
> On 2013-09-05 19:33, Martin Thomson wrote:
>> There was a question about how to do simulcast on the call.  Here's
>> how it might be possible to do simulcast without additional API
>> surface.
>>
>> 1. Acquire original stream containing one video track.
>> 2. Clone the track and rescale it.
>> 3. Assemble a new stream containing the original and the rescaled track.
>> 4. Send the stream.
>> 5. At the receiver, play the video stream.
>>
>> That's the user part, now for the under-the-covers stuff:
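The five steps above could be sketched roughly like this in application JavaScript. This is an illustration only: makeSimulcastTracks and the constraint values are made up, and the browser-side calls shown in comments use the modern promise-based API rather than the callback API current in 2013.

```javascript
// Illustrative sketch of steps 2 and 3 above; makeSimulcastTracks and
// the constraint values are assumptions, not part of any spec.
function makeSimulcastTracks(highTrack, lowWidth, lowHeight) {
  // Step 2: clone the original track and rescale the clone.
  const lowTrack = highTrack.clone();
  if (typeof lowTrack.applyConstraints === 'function') {
    lowTrack.applyConstraints({ width: lowWidth, height: lowHeight });
  }
  // Step 3: the new stream is assembled from both versions.
  return [highTrack, lowTrack];
}

// In a browser the remaining steps would look roughly like:
//   const stream = await navigator.mediaDevices.getUserMedia({ video: true });
//   const tracks = makeSimulcastTracks(stream.getVideoTracks()[0], 320, 180);
//   pc.addStream(new MediaStream(tracks)); // step 4: send the new stream
```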
> One use of simulcast I had in mind was the usual multiparty
> conferencing case with a central node, where the active speaker is
> shown in a large video window (with thumbnail videos for the others).
>
> For this case I think we already have the basic things needed:
>
> * each participant sends a high- and a low-resolution video of the same
> scene (as you outline above)
> * the central node forwards the low- or the high-resolution version
> based on the active speaker decision
>
> What is missing is the possibility for the end-point to stop sending
> the high- (or low-) resolution video to the central node when it is not
> being forwarded to anyone. This would basically be to save transmission
> bandwidth, and we would need pause/resume; but, as others have pointed
> out, a video track can be disabled (which leads to encoding blackness),
> and that also saves a lot of bits.
>
> To handle the case you describe below we would need to add some kind of
> metadata to the track, but that does not seem too hard to do.

If we want to allow simulcast to be implemented at the application 
level, it seems to me that signalling which tracks should be disabled at 
the relay is also an application-level issue, and does not need 
standardization.

As long as the communicating participants have the identifiers they need 
to identify the tracks and streams involved (<msid...>), they can send 
metadata outside of any standardized interfaces.
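As a minimal sketch of such application-defined metadata: the message shape, field names, and the use of a data channel below are all assumptions for illustration, not any standard.

```javascript
// Hypothetical pause/resume signal to the relay; the message shape and
// field names are made up for illustration, not part of any standard.
function makeTrackControlMessage(streamId, trackId, action) {
  if (action !== 'pause' && action !== 'resume') {
    throw new Error('action must be "pause" or "resume"');
  }
  return JSON.stringify({ msid: streamId, track: trackId, action: action });
}

// The relay would parse this and stop (or resume) forwarding the track;
// the sender side might deliver it over an RTCDataChannel:
//   channel.send(makeTrackControlMessage('stream-1', 'video-hi', 'pause'));
```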
>
>> I know we discussed the rendering of multiple video tracks in the
>> past, but it's not possible to read the following documents and reach
>> any sensible conclusions:
>> http://dev.w3.org/2011/webrtc/editor/getusermedia.html
>> http://www.w3.org/TR/html5/embedded-content-0.html#concept-media-load-resource
>>
>> What needs to happen in this case is to ensure that the two video
>> tracks are folded together with the higher "quality" version being
>> displayed and the lower "quality" version being used to fill in any
>> gaps that might appear in the higher "quality" one.
>>
>> That depends on the <video> element being able to identify the tracks
>> as being equivalent, and possibly being able to identify which is the
>> higher quality.  This is where something like the srcname proposal
>> could be useful
>> (http://tools.ietf.org/html/draft-westerlund-avtext-rtcp-sdes-srcname-02).
I'm not sure srcname helps at all. We already know that the video tracks 
are in the same stream, and that both are enabled. If simulcast is 
implemented as an application-layer function, the user agent can have 
the interpretation that these are versions of the same stream.

The completely manual, JS-driven approach is to wait for the "error" 
event described under "if the media data is corrupted", use that to 
disable the current video stream, and enable a backup stream.
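That manual fallback might be sketched as follows; the track objects here are simplified stand-ins for MediaStreamTrack, and the wiring to the "error" event is left out.

```javascript
// Rough sketch of the manual fallback: when the currently selected
// video track fails, disable it and enable the next track that is
// still live, wrapping around the list if necessary.
function failOverToBackup(tracks, failedIndex) {
  tracks[failedIndex].enabled = false;
  for (let i = 1; i < tracks.length; i++) {
    const candidate = tracks[(failedIndex + i) % tracks.length];
    if (candidate.readyState !== 'ended') {
      candidate.enabled = true;
      return candidate;
    }
  }
  return null; // no usable backup track
}
```

In a real page this would be called from the script's handler for the media error, with the array holding the simulcast versions of the video track.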

What we could do, hypothetically, to automate this is to suggest that 
the HTML5 media load algorithm be changed: add a step to video playback 
that says something like:

"If multiple video streams are present in the resource, and the 
currently selected video stream does not provide data that allows a 
picture to be rendered at that time, the user agent may switch to the 
next enabled video stream in the resource for the duration of the lack 
of data".

The unsolved problem here is to make sure the user agent picks the 
streams in the right order.
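One application-level answer, assuming the application attaches a numeric quality hint to each simulcast version (the "quality" property below is an assumption, not existing API), is that the fallback order is just a sort:

```javascript
// Hypothetical ordering helper: if each simulcast version carries a
// numeric "quality" hint, fall back through them highest-quality-first.
// Tracks without a hint sort last.
function orderByQuality(tracks) {
  return tracks.slice().sort((a, b) => (b.quality || 0) - (a.quality || 0));
}
```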

>>
>> The only missing piece is exposing metadata on tracks such that this
>> behaviour is discoverable.  Adding an attribute on tracks (srcname
>> perhaps, arbaon), could provide a hook for triggering the folding
>> behaviour I'm talking about.
>>
Received on Friday, 6 September 2013 09:17:58 UTC
