Re: WebRTC-SVC: "S" modes and simulcast using a single SSRC from Bernard Aboba on 2019-07-23 (public-webrtc@w3.org from July 2019)

From: Bernard Aboba <Bernard.Aboba@microsoft.com>
Date: Tue, 23 Jul 2019 06:37:48 +0000
To: Sergio Garcia Murillo <sergio.garcia.murillo@gmail.com>
CC: "public-webrtc@w3.org" <public-webrtc@w3.org>
Message-ID: <D0A4430D-2B17-4ED1-BC7A-A3A6AB4554B0@microsoft.com>

On Jul 22, 2019, at 13:40, Sergio Garcia Murillo <sergio.garcia.murillo@gmail.com> wrote:

Hi Bernard,

I guess the answers are very codec-specific. I will answer from the VP9 and AV1 point of view, but they may differ for other codecs (H.265?).

On 22/07/2019 19:15, Bernard Aboba wrote:
In early drafts of the WebRTC-SVC Extension, "S" modes were included in the mode table, but those modes were subsequently removed under the assumption that each simulcast stream would be sent on a separate SSRC.

Recently the issue has come up again as part of the design of the AV1 RTP payload, since the AV1 bitstream specification supports simulcast:
https://github.com/AOMediaCodec/av1-rtp-spec/issues/51

Specifically, the following WebRTC-related questions have been asked:

1. What are the implications of enabling "S" modes for the WebRTC API? For example, would adding "S" modes back violate the requirement that setParameters() never set "negotiationneeded"?

VP9 sends the scalability structure in the RTP payload, and the AV1 RTP draft proposes to send the dependency descriptor as an RTP header extension. So in both cases it should be possible to change the SVC mode on the fly (even from a non-"S" mode to an "S" mode and vice versa).

[BA] The problem occurs if single-SSRC simulcast is negotiated in SDP, because WebRTC-PC Section 5.2 says:

setParameters does not cause SDP renegotiation and can only be used to change what the media stack is sending or receiving within the envelope negotiated by Offer/Answer. The attributes in the RTCRtpSendParameters dictionary are designed to not enable this, so attributes like cname that cannot be changed are read-only. Other things, like bitrate, are controlled using limits such as maxBitrate, where the user agent needs to ensure it does not exceed the maximum bitrate specified by maxBitrate, while at the same time making sure it satisfies constraints on bitrate specified in other places such as the SDP.

The implication is that a change to or from "S" modes would not be allowed. Also, other functionality available with multiple-SSRC simulcast might not be supportable, such as setting individual streams active/inactive and arbitrary scaleResolutionDownBy values.
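To make the envelope argument concrete, here is a minimal sketch that models it in Python. Everything in it is hypothetical: parse_mode, streams_per_encoding and check_set_parameters are illustrative names, and the "negotiated" map (RID to number of simulcast streams agreed in Offer/Answer) stands in for an SDP signal that does not exist today. Mode names follow the WebRTC-SVC pattern ("L1T3", "L3T3_KEY", "S2T3"); an "L" mode is counted as one stream, an "S" mode as one stream per spatial layer.

```python
import re
from dataclasses import dataclass

# Hypothetical parser for WebRTC-SVC style mode names, e.g. "L1T3",
# "L2T2h", "L3T3_KEY", "S2T3". Only the leading letter and the
# spatial/temporal counts are interpreted.
_MODE_RE = re.compile(r"^([LS])([1-9])T([1-9])(h)?(_KEY(_SHIFT)?)?$")


@dataclass(frozen=True)
class Mode:
    spatial: int
    temporal: int
    independent: bool  # True for "S" modes: spatial layers are independent


def parse_mode(name: str) -> Mode:
    m = _MODE_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized scalability mode: {name!r}")
    return Mode(int(m.group(2)), int(m.group(3)), m.group(1) == "S")


def streams_per_encoding(name: str) -> int:
    """Independently decodable simulcast streams carried by one encoding."""
    mode = parse_mode(name)
    return mode.spatial if mode.independent else 1


def check_set_parameters(negotiated: dict, encodings: list) -> None:
    """Reject changes outside the (hypothetical) negotiated envelope.

    negotiated: RID -> number of simulcast streams assumed agreed in O/A.
    encodings:  list of dicts with "rid" and "scalabilityMode" keys.
    """
    if sorted(e["rid"] for e in encodings) != sorted(negotiated):
        raise ValueError("setParameters cannot add or remove RIDs")
    for enc in encodings:
        n = streams_per_encoding(enc["scalabilityMode"])
        if n != negotiated[enc["rid"]]:
            raise ValueError(
                f"rid {enc['rid']!r}: {n} stream(s) differs from negotiated "
                f"{negotiated[enc['rid']]}; would require renegotiation")


negotiated = {"q": 1}
# Staying within a non-"S" mode keeps the stream count, so it is accepted.
check_set_parameters(negotiated, [{"rid": "q", "scalabilityMode": "L1T3"}])
# Switching to "S2T3" doubles the streams under "q", so it is rejected.
try:
    check_set_parameters(negotiated, [{"rid": "q", "scalabilityMode": "S2T3"}])
except ValueError as err:
    print(err)
```

In this model, switching between "L" and "S" modes changes how many streams a RID carries, which is exactly the kind of change setParameters is not supposed to make without renegotiation.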

2. Would enabling "S" modes result in SDP changes? For example, would an Offer with "S" modes be different from one where simulcast is sent on separate SSRCs, but the SSRCs are not signaled in the SDP? If so, how?

Simulcast using different SSRCs would have one encoding for each simulcast layer, each with a different RID, while the "S" modes would have a single encoding. So the differences in the SDP are due not to the "S" mode itself but to the different number of encodings.

[BA] As you say, an encoding corresponds to a single RID, so an "S" mode setting on an encoding generates multiple simulcast layers with the same RID. That seems like it could confuse an SFU expecting each RID to represent a distinct simulcast layer. Also, I do not think that the SDP would indicate clearly what is being offered, so an Answerer could not determine how many simulcast streams are involved or what the RID mapping is.

Note that it would be possible to enable the crazy scenario of having multiple simulcast encodings, each one using an "S" mode, which would produce multiple SSRCs/RIDs, each containing multiple independent spatial layers.

[BA] Yes, that is pretty crazy, but I see no way for an Answerer to know that this is being offered, nor a way for it to indicate it would prefer something else.
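The RID ambiguity discussed above can be illustrated with a small counting sketch. It assumes (as a simplification) that an "L" mode encoding yields one independently decodable stream and an "S" mode encoding yields one stream per spatial layer, all labeled with the encoding's RID. The function names and the RIDs "lo", "mid", "hi", "a", "b" are hypothetical.

```python
import re
from collections import Counter

# Hypothetical pattern for WebRTC-SVC style mode names ("L1T3", "S3T3", ...).
_MODE_RE = re.compile(r"^([LS])([1-9])T([1-9])(h)?(_KEY(_SHIFT)?)?$")


def simulcast_streams(encodings):
    """Return (rid, spatial_layer) for each independently decodable stream.

    An "L" mode contributes one stream (its spatial layers depend on each
    other, so spatial_layer is None); an "S" mode contributes one stream per
    spatial layer, all sharing the encoding's RID.
    """
    streams = []
    for enc in encodings:
        m = _MODE_RE.match(enc["scalabilityMode"])
        if m is None:
            raise ValueError(f"unrecognized mode {enc['scalabilityMode']!r}")
        if m.group(1) == "S":
            streams.extend((enc["rid"], s) for s in range(int(m.group(2))))
        else:
            streams.append((enc["rid"], None))
    return streams


def ambiguous_rids(encodings):
    """RIDs that would label more than one simulcast stream."""
    counts = Counter(rid for rid, _ in simulcast_streams(encodings))
    return sorted(rid for rid, n in counts.items() if n > 1)


# Multiple-SSRC simulcast: one encoding (and one RID) per stream.
per_rid = [{"rid": r, "scalabilityMode": "L1T3"} for r in ("lo", "mid", "hi")]
# A single encoding in an "S" mode: three streams under one RID.
s_mode = [{"rid": "a", "scalabilityMode": "S3T3"}]
# The "crazy" case: several encodings, each in an "S" mode.
mixed = [{"rid": "a", "scalabilityMode": "S2T1"},
         {"rid": "b", "scalabilityMode": "S3T3"}]

for name, encs in (("per_rid", per_rid), ("s_mode", s_mode), ("mixed", mixed)):
    print(name, len(simulcast_streams(encs)), ambiguous_rids(encs))
```

In this model only the per-RID configuration gives every stream its own RID; in the other two, an SFU keyed on RID cannot tell the streams apart.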

3. In an SDP Simulcast Offer where only RIDs are present and not SSRCs, how does the Answerer know whether the intent is to send simulcast on separate SSRCs or not?

If the offerer sends an offer with only RIDs, the answerer should create one encoding per RID (I think that was the final proposal we chose for accepting simulcast offers). Then, for each encoding the answerer will be able to choose the SVC mode (even an "S" mode, as shown before).

[BA] Yes. But doesn't having multiple simulcast streams per RID make the RID useless?
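The "one encoding per RID" rule described above could be sketched as follows. This is a simplified illustration, not JSEP's actual algorithm: it assumes the RFC 8851 line syntax "a=rid:<id> <direction> [restrictions]", treats a "recv" RID in the offer as one the answerer will send, and ignores RID restrictions, a=simulcast ordering and alternatives. The function name and default mode are hypothetical.

```python
def encodings_from_offer(sdp: str, default_mode: str = "L1T3"):
    """Build one encoding per RID that the offerer wants to receive.

    Simplified: only "a=rid:<id> recv ..." lines are considered; restrictions
    and the a=simulcast line are ignored.
    """
    encodings = []
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith("a=rid:"):
            continue
        parts = line[len("a=rid:"):].split()
        if len(parts) >= 2 and parts[1] == "recv":
            encodings.append({"rid": parts[0], "scalabilityMode": default_mode})
    return encodings


offer = "\r\n".join([
    "m=video 9 UDP/TLS/RTP/SAVPF 96",
    "a=rid:hi recv",
    "a=rid:lo recv",
    "a=simulcast:recv hi;lo",
])

encodings = encodings_from_offer(offer)
# Nothing in the offer stops the answerer from picking an "S" mode for one
# RID, which is the ambiguity raised in the reply.
encodings[0]["scalabilityMode"] = "S2T3"
print(encodings)
```

The sketch shows why the RID stops identifying a single stream: the answer side can attach an "S" mode to any of the encodings it creates, and the SDP carries no indication of that.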

4. For a conference server that supports RIDs, how much difference does it make whether simulcast is sent on multiple SSRCs or a single one? For example, RIDs would be present in each packet, but instead of RTCP messages for each SSRC, there would only be a single SSRC.

For VP9 the change is quite substantial, IMHO. With no "S" modes, up-switching and down-switching can be done just by checking a couple of bits in the RTP payload descriptor, while in an "S" mode, to do it properly, an SFU would have to parse the scalability structure, determine the layer dependencies, and do the switching based on the received frames and their dependencies. Iñaki has implemented support for the VP9 K-SVC mode that was added in the latest Chrome versions (I am still using 73 in production), so he might be able to provide more info on this (although I am not sure whether he parses the SS or detects the K-SVC mode based on the screencast/camera flag).
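As an illustration of "checking a couple of bits", here is a sketch that reads the VP9 RTP payload descriptor fields an SFU would look at. The bit layout follows the VP9 RTP payload format (first octet I|P|L|F|B|E|V|Z, layer-index octet TID(3)|U|SID(3)|D). The up-switch rule is a deliberately simplified heuristic chosen for this example (start a spatial layer at the first packet of a frame in that layer that is not inter-picture predicted; up-switch temporally on U=1), not a complete SFU algorithm, and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vp9Descriptor:
    inter_predicted: bool        # P bit
    flexible: bool               # F bit
    start_of_frame: bool         # B bit
    end_of_frame: bool           # E bit
    has_ss: bool                 # V bit: scalability structure present
    tid: Optional[int] = None    # temporal layer ID
    switching_up: bool = False   # U bit: temporal up-switch point
    sid: Optional[int] = None    # spatial layer ID
    inter_layer: bool = False    # D bit: depends on lower spatial layer


def parse_vp9_descriptor(payload: bytes) -> Vp9Descriptor:
    b0 = payload[0]
    desc = Vp9Descriptor(
        inter_predicted=bool(b0 & 0x40),
        flexible=bool(b0 & 0x10),
        start_of_frame=bool(b0 & 0x08),
        end_of_frame=bool(b0 & 0x04),
        has_ss=bool(b0 & 0x02),
    )
    pos = 1
    if b0 & 0x80:  # I: PictureID present; M bit selects 15-bit vs 7-bit
        pos += 2 if payload[pos] & 0x80 else 1
    if b0 & 0x20:  # L: layer indices present
        li = payload[pos]
        desc.tid = li >> 5
        desc.switching_up = bool(li & 0x10)
        desc.sid = (li >> 1) & 0x07
        desc.inter_layer = bool(li & 0x01)
    return desc


def is_spatial_upswitch_point(desc: Vp9Descriptor, target_sid: int) -> bool:
    """Simplified heuristic for starting to forward spatial layer target_sid."""
    return (desc.sid == target_sid and desc.start_of_frame
            and not desc.inter_predicted)


def is_temporal_upswitch_point(desc: Vp9Descriptor) -> bool:
    return desc.switching_up


# I|L|B set, 7-bit PictureID 5, TID=0 U=0 SID=1 D=1, TL0PICIDX 0.
pkt = bytes([0xA8, 0x05, 0x03, 0x00])
d = parse_vp9_descriptor(pkt)
print(d.sid, d.tid, is_spatial_upswitch_point(d, 1))
```

In an "S" mode the same bits no longer tell the SFU which spatial layers are independent streams, which is why it would need the scalability structure and per-frame dependencies instead.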

[BA] The Descriptor RTP header extension might help determine the up-switch/down-switch points and dependencies, so it might make life bearable for the SFU even if all simulcast streams were on the same SSRC, since RIDs would be essentially irrelevant.

In AV1, the dependency structure is provided in the RTP header extension, so it will be much easier to implement. Anyway, as the SFU will be able to choose the SVC mode, each of us will implement the one that fits best for us.
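The dependency-driven approach mentioned above reduces, at its core, to one rule: forward a frame only if every frame it references has already been forwarded. The sketch below models that rule for two independent streams (as in an "S2" mode) without looking at RIDs at all. It is a simplification made for this example: the Frame fields are hypothetical stand-ins for what a dependency descriptor carries, and chains, decode targets and frame-number wraparound are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    number: int        # frame number (stand-in for a descriptor field)
    layer: int         # hypothetical stream/layer index
    deps: tuple = ()   # frame numbers this frame references


class DependencyForwarder:
    """Forward a frame only if all frames it references were forwarded."""

    def __init__(self, wanted_layers):
        self.wanted = set(wanted_layers)
        self.forwarded = set()

    def switch_to(self, layers):
        self.wanted = set(layers)

    def on_frame(self, frame: Frame) -> bool:
        if frame.layer not in self.wanted:
            return False
        if all(d in self.forwarded for d in frame.deps):
            self.forwarded.add(frame.number)
            return True
        return False  # a reference was never forwarded: not decodable


# Two independent streams: frames 1 and 2 are key frames, later frames
# reference the previous frame of the same stream only.
frames = [Frame(1, 0), Frame(2, 1), Frame(3, 0, (1,)), Frame(4, 1, (2,)),
          Frame(5, 0, (3,)), Frame(6, 1, (4,))]

fwd = DependencyForwarder({0})
decisions = [fwd.on_frame(f) for f in frames[:4]]
fwd.switch_to({1})
decisions += [fwd.on_frame(f) for f in frames[4:]]
# After switching, stream 1 stays blocked until a frame with no
# unforwarded references (e.g. a new key frame) arrives.
decisions += [fwd.on_frame(Frame(7, 1)), fwd.on_frame(Frame(8, 1, (7,)))]
print(decisions)
```

Because the decision depends only on frame references, the same logic works whether the streams arrive on one SSRC or several.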

It would be very interesting to understand the benefits the Google team found that led them to allow the K-SVC modes over non-"S" modes in VP9 in some cases.

Best regards

Sergio

Received on Tuesday, 23 July 2019 06:38:18 UTC