RE: Issue 175: Simulcasting to RtpReceiver and switched stream rapid switches and clipping

Robin said: 

"Possible solutions:
(a) Have a set of timing rules that can be applied to determine scenario A versus B to resolve the ambiguity.
(b) Render in two (or more) hidden RtpReceivers with individual tracks being output from each where the simulcast RtpReceiver is the rendered output of the combined audio for audio and the active video for video.
(c) Do not allow simulcasting and require separate RtpReceivers where the Media Stream Tracks indicate their activity (active/inactive) state allowing switching from an application between the streams (as well as sending all audio to render so that it doesn't matter which stream is output).
(d) Do a simple method of "last packet wins" and watch the jitter happen :)"

[BA] The scenario where an RtpReceiver is receiving multiple SSRCs occurs whenever the Multiple RTP streams on a Single media Transport (MRST) mode is used. This applies not just to simulcast, but also to scalable video coding (SVC).

Since in MRST each simulcast and/or SVC stream uses a distinct SSRC, there are distinct sequence number spaces involved. Due to re-ordering, packets from the different SSRCs may arrive interleaved. Even if there were no re-ordering of the initial streams, if a retransmission stream is required to recover lost packets, then the retransmitted packets can be interleaved with the others. For SVC, video codecs supporting MRST transport need the Decoding Order Number (DON) field (used in H.264/SVC and H.265) or an equivalent to sequence packets for the decoder. For simulcast there is no equivalent of the DON field.
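To make the "distinct sequence number spaces" point concrete, here is a minimal Python sketch of my own (not anything from the ORTC spec). It assumes packets are reduced to hypothetical (ssrc, seq) pairs and that each stream's packets stay within half the 16-bit sequence space of the first one seen. It separates interleaved arrivals by SSRC and orders each stream independently, handling sequence number wraparound:

```python
from collections import defaultdict

def demux_by_ssrc(arrivals):
    """Group (ssrc, seq) pairs by SSRC and order each group by sequence number.

    Sequence numbers are 16 bits wide and wrap, so each stream is ordered
    relative to the first sequence number seen on that SSRC (assumption:
    all packets lie within +/- 32768 of it).
    """
    streams = defaultdict(list)
    for ssrc, seq in arrivals:
        streams[ssrc].append(seq)
    ordered = {}
    for ssrc, seqs in streams.items():
        base = seqs[0]
        # Map each seq to its wrapped distance from base, shifted so that
        # packets slightly older than base still sort before it.
        ordered[ssrc] = sorted(seqs, key=lambda s: (s - base + 0x8000) & 0xFFFF)
    return ordered

# Two SSRCs interleaved on one transport; SSRC 1 wraps from 65535 to 0.
arrivals = [(1, 65534), (2, 10), (1, 0), (2, 9), (1, 65535)]
print(demux_by_ssrc(arrivals))  # {1: [65534, 65535, 0], 2: [9, 10]}
```

Note that nothing here relates the order of SSRC 1 to the order of SSRC 2; that cross-stream ordering is exactly what the DON field supplies for SVC and what simulcast lacks.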

To support MRST, the RtpReceiver needs to be able to assemble frames arriving on distinct SSRCs, up to the maximum number of streams it can support. For example, this could be up to 2 simulcast streams, each with up to 3 temporal layers (6 total SSRCs). With SVC operating in MRST transport mode, the RtpReceiver can use the DON field to order the packets for the decoder. For simulcast, the RtpReceiver needs to be able to assemble incoming packets from each of the simulcast streams, while attempting to repair missing packets using retransmission and/or FEC streams. Switching between simulcast streams within the RtpReceiver occurs once the frame before the switch has been passed to the decoder (or once it is determined that its lost packets are unrecoverable). So inherently, there is buffering involved (and timing rules, to figure out when to give up on the last frame before the switch).
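The switching rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed implementation: the Packet fields, the "first" start-of-frame flag (in practice this comes from the codec payload format), the fixed give_up_ms deadline and the class names are all hypothetical. The sketch also only handles the case where packets of the old frame have already started arriving when the new stream shows up, and it never expires incomplete frames on non-active streams.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Packet:
    ssrc: int
    seq: int
    timestamp: int
    first: bool = False   # hypothetical start-of-frame flag (payload-format specific)
    marker: bool = False  # RTP marker bit, taken here as end-of-frame

@dataclass
class FrameAssembler:
    seqs: set = field(default_factory=set)
    first_seq: Optional[int] = None
    last_seq: Optional[int] = None

    def add(self, p: Packet) -> None:
        self.seqs.add(p.seq)
        if p.first:
            self.first_seq = p.seq
        if p.marker:
            self.last_seq = p.seq

    def complete(self) -> bool:
        if self.first_seq is None or self.last_seq is None:
            return False
        expected = ((self.last_seq - self.first_seq) & 0xFFFF) + 1
        return len(self.seqs) == expected

class SwitchingReceiver:
    def __init__(self, give_up_ms: float = 50.0):
        self.give_up_ms = give_up_ms  # timing rule: how long to wait for repair
        self.frames = {}              # (ssrc, timestamp) -> FrameAssembler
        self.first_seen = {}          # (ssrc, timestamp) -> arrival time in ms
        self.active_ssrc = None
        self.decoded = []             # frames handed to the decoder, in order

    def on_packet(self, p: Packet, now_ms: float) -> None:
        key = (p.ssrc, p.timestamp)
        self.frames.setdefault(key, FrameAssembler()).add(p)
        self.first_seen.setdefault(key, now_ms)
        if self.active_ssrc is None:
            self.active_ssrc = p.ssrc
        self.poll(now_ms)

    def poll(self, now_ms: float) -> None:
        while True:
            active = [k for k in self.frames if k[0] == self.active_ssrc]
            if active:
                key = active[0]  # oldest pending frame on the active stream
                if self.frames[key].complete():
                    self.decoded.append(key)
                elif now_ms - self.first_seen[key] < self.give_up_ms:
                    return       # keep waiting for retransmission/FEC
                # Delivered, or given up on as unrecoverable: forget the frame.
                del self.frames[key]
                del self.first_seen[key]
                continue
            # Nothing pending on the active stream: a switch is now safe.
            ready = [k for k in self.frames if self.frames[k].complete()]
            if not ready:
                return
            self.active_ssrc = ready[0][0]
```

With this rule, a complete frame on the new SSRC is held back until the last frame on the old SSRC is either repaired or abandoned after give_up_ms. The right value for that deadline is the codec- and implementation-dependent part.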

It therefore seems to me that a) is the only viable solution - but the specific timing rules are likely to depend on the codec as well as the implementation. 

Option b) doesn't appeal to me - I don't think we want to require support for audio mixing in an RtpReceiver, especially since you can always create distinct RtpReceiver objects.  

Option c) is also unappealing - support for simulcast/SVC is one of the core features of the ORTC API.

Option d) will perform very badly - the first video frame after a switch is often an I-frame composed of many packets. So if packets from the frame before the switch are interleaved with packets from the frame after it and "last packet wins" is applied, both the frame before the switch and the one after it could be lost.
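As a toy illustration of that failure mode, the Python sketch below models one possible reading of "last packet wins": whenever a packet arrives for a different frame than the one being assembled, the partial frame is discarded and assembly restarts. The function name, the (ssrc, frame_id) packet representation and the frame sizes are all made up for the example.

```python
def last_packet_wins(arrivals, frame_sizes):
    """Return the frames that survive under a naive 'last packet wins' rule.

    arrivals:    list of (ssrc, frame_id), one entry per received packet
    frame_sizes: {(ssrc, frame_id): number of packets in that frame}
    """
    current, received, decoded = None, 0, []
    for key in arrivals:
        if key != current:
            current, received = key, 0  # partial frame is thrown away
        received += 1
        if received == frame_sizes[key]:
            decoded.append(key)
            current, received = None, 0
    return decoded

# Old stream: 2-packet P-frame on SSRC 1. New stream: 4-packet I-frame on SSRC 2.
sizes = {(1, "P"): 2, (2, "I"): 4}
in_order = [(1, "P"), (1, "P"), (2, "I"), (2, "I"), (2, "I"), (2, "I")]
interleaved = [(1, "P"), (2, "I"), (1, "P"), (2, "I"), (2, "I"), (2, "I")]

print(last_packet_wins(in_order, sizes))     # [(1, 'P'), (2, 'I')]
print(last_packet_wins(interleaved, sizes))  # []
```

In the interleaved case every packet arrived, yet neither frame reaches the decoder, because each bounce between SSRCs discards the packets already collected.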

Received on Saturday, 7 February 2015 18:02:04 UTC