Re: WebRTC-SVC: "S" modes and simulcast using a single SSRC

On 23/07/2019 8:37, Bernard Aboba wrote:
> On Jul 22, 2019, at 13:40, Sergio Garcia Murillo 
> <sergio.garcia.murillo@gmail.com> wrote:
>
>>>
>>>  1. What are the implications of enabling "S" modes for the WebRTC
>>>     API? For example, would adding "S" modes back violate the
>>>     requirement that setParameters() never set "negotiationneeded"?
>>>
>> VP9 sends the scalability structure in the rtp payload and the AV1 rtp 
>> draft proposes to send the dependency descriptor as an rtp payload 
>> header. So in both cases it should be possible to change the SVC mode 
>> on the fly (even from no "S" mode to "S" modes and vice versa).
>>
>
> [BA] The problem occurs if single SSRC Simulcast is negotiated in SDP, 
> because WebRTC-PC Section 5.2 says:
>
> |setParameters| does not cause SDP renegotiation and can only be used 
> to change what the media stack is sending or receiving within the 
> envelope negotiated by Offer/Answer. The attributes in the 
> |RTCRtpSendParameters| 
> <https://w3c.github.io/webrtc-pc/#dom-rtcrtpsendparameters> dictionary 
> are designed to not enable this, so attributes like |cname| that 
> cannot be changed are read-only. Other things, like bitrate, are 
> controlled using limits such as |maxBitrate|, where the user agent 
> needs to ensure it does not exceed the maximum bitrate specified 
> by |maxBitrate|, while at the same time making sure it satisfies 
> constraints on bitrate specified in other places such as the SDP.
>
> The implication is that a change to or from “S” modes would not be 
> allowed. Also, other functionality for multiple SSRC Simulcast might 
> not be supportable, such as active/inactive and arbitrary 
> scaleResolutionDownBy attribute values.


The only difference between the S modes and the non-S modes is whether 
the spatial layers are independent. For example, both S3T1 and L3T1 
send 3 spatial layers and only one temporal layer, but in the S mode 
each spatial layer is independent (i.e. simulcast-like), while in L3T1 
they depend on each other.
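
To make the naming concrete, here is a minimal sketch that splits a mode string into its parts. The helper name parseScalabilityMode and the ScalabilityInfo shape are hypothetical (they are not part of any WebRTC API), and it only handles the basic "L" or "S" forms such as S3T1 and L3T1, not any suffixed variants.

```typescript
// Hypothetical result shape, for illustration only.
interface ScalabilityInfo {
  spatialLayers: number;
  temporalLayers: number;
  // true for "S" modes: spatial layers are independent (simulcast-like);
  // false for "L" modes: spatial layers depend on each other.
  independentSpatialLayers: boolean;
}

// Hypothetical helper: parses only the basic <L|S><n>T<m> form.
function parseScalabilityMode(mode: string): ScalabilityInfo {
  const m = /^([LS])(\d)T(\d)$/.exec(mode);
  if (m === null) {
    throw new Error(`unsupported scalability mode: ${mode}`);
  }
  return {
    spatialLayers: Number(m[2]),
    temporalLayers: Number(m[3]),
    independentSpatialLayers: m[1] === "S",
  };
}
```

With this, S3T1 and L3T1 differ only in the independentSpatialLayers flag.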

So, obviously we can't switch from a single S3T1 encoding parameter 
(simulcast with 3 spatial layers on a single ssrc) to 3 encoding 
parameters with L1T1 (simulcast with 3 ssrcs/rids). That is not because 
we can't change from an S mode to a non-S mode, but because we can't 
change the number of encoding parameters of an rtp sender. However, it 
would be perfectly valid to go from S3T1 to L3T1 and vice versa within 
the same encoding parameter.
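
A rough sketch of that constraint, assuming the mode is exposed as a scalabilityMode field on each encoding (as in the WebRTC-SVC proposal). The interfaces and the withScalabilityMode helper below are local, hypothetical stand-ins so the example is self-contained; they are not the browser API.

```typescript
// Local stand-ins for the WebRTC dictionaries so the sketch compiles without
// DOM typings. The scalabilityMode field name is an assumption here.
interface EncodingLike {
  rid?: string;
  scalabilityMode?: string;
}
interface SendParametersLike {
  encodings: EncodingLike[];
}

// Hypothetical helper: returns a copy of params with the mode of one existing
// encoding changed. It never adds or removes encodings.
function withScalabilityMode(
  params: SendParametersLike,
  index: number,
  mode: string
): SendParametersLike {
  if (index < 0 || index >= params.encodings.length) {
    throw new RangeError(`no encoding at index ${index}`);
  }
  return {
    ...params,
    encodings: params.encodings.map((enc, i) =>
      i === index ? { ...enc, scalabilityMode: mode } : enc
    ),
  };
}

// S3T1 -> L3T1 within the single existing encoding.
const currentParams: SendParametersLike = {
  encodings: [{ scalabilityMode: "S3T1" }],
};
const updatedParams = withScalabilityMode(currentParams, 0, "L3T1");
```

In a browser, the equivalent would presumably be to read sender.getParameters(), change the field on the existing encoding, and pass the result to sender.setParameters(), leaving the length of encodings untouched.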


>>
>>>  1. Would enabling "S" modes result in SDP changes? For example,
>>>     would an Offer with "S" modes be different from one where
>>>     simulcast is sent on separate SSRCs, but the SSRCs are not
>>>     signaled in the SDP? If so, how?
>>>
>> A simulcast using different ssrcs would have one encoding parameter 
>> for each simulcast layer, each with a different rid, while the "S" 
>> modes will have a single encoding. So the differences in the SDP are 
>> not due to the "S" mode but due to the different number of encoding 
>> parameters.
>>
> [BA] As you say, an encoding corresponds to a single RID so an “S 
> mode” setting on an encoding generates multiple simulcast layers with 
> the same RID. That seems like it could confuse an SFU expecting each 
> RID to represent a distinct simulcast layer. Also, I do not think that 
> the SDP would indicate clearly what is being offered, so an Answerer 
> could not determine how many simulcast streams are involved or what 
> the RID mapping is.

The scalability structure present in VP9 SVC, or the dependency info in 
AV1, will tell the SFU what to do. I don't expect this to be a problem, 
as client apps are tightly coupled to SFUs.


>> Note that it would be possible to enable the crazy scenario of having 
>> multiple simulcast encodings each one using "S" mode, which will 
>> produce multiple ssrcs/rids, each one containing multiple 
>> independent spatial layers within.
>>
> [BA] Yes, that is pretty crazy but I see no way for an Answerer to 
> know that is being offered, nor a way for it to indicate it would 
> prefer something else.
>>
>>
>>>  1.  In an SDP Simulcast Offer where only RIDs are present and not
>>>     SSRCs, how does the Answerer know whether the intent is to send
>>>     simulcast on separate SSRCs or not?
>>>
>> If the offerer sends an offer with only rids, the answerer should 
>> create one encoding parameter per rid (I think that was the final 
>> proposal we chose for accepting simulcast offers). Then, in each 
>> encoding parameter the answerer will be able to choose the simulcast 
>> mode (even an "S" mode as shown before).
>>
> [BA] Yes. But doesn’t having multiple simulcasts per RID make the RID 
> useless?
I'm not saying that it would be useful for anyone, just that it would 
be a possible and valid configuration.


>
>>>  1. For a conference server that supports RIDs,  how much difference
>>>     does it make whether simulcast is sent on multiple SSRCs or a
>>>     single one?  For example, RIDs would be present in each packet,
>>>     but instead of RTCP messages for each SSRC, there would only be
>>>     a single SSRC.
>>>
>> For VP9 the change is quite substantial IMHO. With no "S" modes, the 
>> up switch/down switch can be done just by checking a couple of bits 
>> of the rtp payload descriptor headers, while in "S" mode, to do it 
>> properly, an SFU will have to parse the scalability structure, 
>> determine the layer dependencies and do the switching based on the 
>> received frames and the frame dependency. Iñaki has implemented VP9 
>> K-SVC support that was added in the latest chrome versions (I am 
>> still using 73 for production), so he might be able to provide more 
>> info regarding this (although I am not sure if he parses the SS or if 
>> he detects the K-SVC mode based on the screencast/camera flag).
>>
>>
>
> [BA] The Descriptor RTP extension might help determine the 
> upswitch/down points and dependencies so it might make life bearable 
> for the SFU even if all simulcast streams were on the same SSRC 
> because RIDs would be essentially irrelevant.


Yes, I was just saying that for VP9 it was easier to look at the up/down 
switch bits without having to deal with the scalability structure in 
depth. I have not thought in depth about the pros/cons of the S modes vs 
rids for simulcast. I feel that they will solve some issues and make 
others harder, and I am not sure yet which I would prefer to use.
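
For reference, a rough sketch of the "couple of bits" approach for VP9. It assumes the payload descriptor layout from the VP9 RTP payload format as I recall it: a first byte with the I|P|L|F|B|E|V|Z flags, an optional 1 or 2 byte picture ID, then a layer byte laid out as TID(3) U(1) SID(3) D(1). The names Vp9LayerInfo and parseVp9LayerInfo are hypothetical, and the layout should be checked against the spec before relying on it.

```typescript
// Hypothetical shape holding the fields an SFU might check when switching.
interface Vp9LayerInfo {
  temporalId: number;
  spatialId: number;
  switchingUpPoint: boolean;   // U bit
  interLayerDependency: boolean; // D bit
}

// Hypothetical helper: reads only the layer byte of the descriptor.
// Returns null if the L bit is not set or the payload is too short.
function parseVp9LayerInfo(payload: Uint8Array): Vp9LayerInfo | null {
  const flags = payload[0];
  if (flags === undefined) return null;
  const hasPictureId = (flags & 0x80) !== 0; // I bit
  const hasLayerIndices = (flags & 0x20) !== 0; // L bit
  if (!hasLayerIndices) return null;

  let offset = 1;
  if (hasPictureId) {
    const pid = payload[offset];
    if (pid === undefined) return null;
    // M bit set means the picture ID is extended to two bytes.
    offset += (pid & 0x80) !== 0 ? 2 : 1;
  }

  const layer = payload[offset];
  if (layer === undefined) return null;
  return {
    temporalId: (layer >> 5) & 0x07,
    switchingUpPoint: (layer & 0x10) !== 0,
    spatialId: (layer >> 1) & 0x07,
    interLayerDependency: (layer & 0x01) !== 0,
  };
}
```

This deliberately ignores the scalability structure (the V bit), which is the part an SFU would additionally have to parse to handle the "S" modes properly.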


Best regards

Sergio

Received on Tuesday, 23 July 2019 08:28:42 UTC