- From: Adam Roach <abr@mozilla.com>
- Date: Thu, 13 Aug 2015 18:03:14 -0500
- To: "public-webrtc@w3.org" <public-webrtc@w3.org>
- Message-ID: <55CD2232.5090507@mozilla.com>
I've been involved in a number of recent conversations around simulcast for WebRTC, and several implementors have indicated that it's an important feature for the initial release of WebRTC. As I understand the state of play:

  * Chrome has a form of simulcasting implemented using undocumented SDP mangling
  * Firefox has no simulcasting implemented, but will soon
  * The WebRTC 1.0 API has no simulcast-related controls whatsoever
  * The IETF MMUSIC working group is nearing completion on a document (draft-ietf-mmusic-sdp-simulcast-01) that allows negotiation of simulcast in SDP

I also understand and sympathize with the goal to stop adding any non-trivial modifications to the existing WebRTC spec, so that we can finally publish an initial version of the document.

At the same time, the vast majority of the use cases that make sense for simulcast involve browsers talking to an MCU (or similar server), sending multiple encodings per track in the browser-to-MCU direction, but receiving only one encoding per track in the MCU-to-browser direction. This is interesting, because it means that we don't really require any controls that indicate the desire for a browser to /receive/ simulcast -- all we need is the ability to indicate a willingness to send it. Meanwhile, the MCU will know what resolutions (and other variations) it wants to receive, and can inform the browser of this information via SDP.

Based on the foregoing, then, I propose that we instead add a trivial control to the existing RTCRtpSender objects. My strawman proposal would be something like:

------------------------------------------------------------------------

    partial interface RTCRtpSender {
        attribute unsigned short maxSimulcastCount;
    };

maxSimulcastCount of type unsigned short
    This attribute controls the number of simulcast streams that will be
    offered for the specific RTCRtpSender. The actual number of streams
    used for this sender will depend on the answer that is passed to
    setRemoteDescription.
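The rule in the attribute description (the number of streams actually sent is settled by the remote description) can be sketched in TypeScript. This is a hypothetical illustration only: maxSimulcastCount is the proposed attribute above, not a shipped API, and the helper names parseSimulcast and sendEncodingCounts are invented here. The parser assumes the "a=simulcast send <pts> recv <pts>" form used in this message; later revisions of the draft may use a different syntax. The count for each video m-line is taken from the peer's "recv" list, capped by the local maxSimulcastCount when the local side made the offer.

```typescript
// Sketch only: maxSimulcastCount was a proposed attribute, and the helper
// names below (parseSimulcast, sendEncodingCounts) are hypothetical.
// Assumes the "a=simulcast send <pt;pt> recv <pt;pt>" form used in this
// message; later draft revisions may use a different syntax.

interface SimulcastLists {
  send: string[]; // payload types the SDP author will send
  recv: string[]; // payload types the SDP author wants to receive
}

const SIMULCAST_PREFIX = "a=simulcast ";

// Split "a=simulcast recv 97;98 send 97" into its send/recv PT lists.
function parseSimulcast(line: string): SimulcastLists | null {
  if (!line.startsWith(SIMULCAST_PREFIX)) return null;
  const tokens = line.slice(SIMULCAST_PREFIX.length).trim().split(/\s+/);
  const lists: SimulcastLists = { send: [], recv: [] };
  for (let i = 0; i + 1 < tokens.length; i += 2) {
    const dir = tokens[i];
    if (dir === "send" || dir === "recv") {
      lists[dir] = tokens[i + 1].split(";");
    }
  }
  return lists;
}

// For each video m-line in the remote description, count the encodings the
// remote side asks to receive; that is how many the local sender would send.
// If the local side made the offer, pass the maxSimulcastCount it offered as
// a cap; if the remote side made the offer, pass no cap and treat the result
// as the updated maxSimulcastCount.
function sendEncodingCounts(remoteSdp: string, maxSimulcastCount?: number): number[] {
  const cap = maxSimulcastCount ?? Infinity;
  const counts: number[] = [];
  let inVideo = false;
  let count = 1; // no a=simulcast line: a single, ordinary encoding
  for (const raw of remoteSdp.split(/\r?\n/)) {
    const line = raw.trim();
    if (line.startsWith("m=")) {
      if (inVideo) counts.push(Math.min(count, cap));
      inVideo = line.startsWith("m=video");
      count = 1;
    } else if (inVideo) {
      const lists = parseSimulcast(line);
      if (lists && lists.recv.length > 0) count = lists.recv.length;
    }
  }
  if (inVideo) counts.push(Math.min(count, cap));
  return counts;
}

// An MCU answer in the style of the examples here, trimmed to relevant lines.
const mcuAnswer = [
  "m=video 49674 RTP/AVP 97 98",
  "a=imageattr:98 recv [x=320,y=180]",
  "a=simulcast recv 97;98 send 97",
].join("\r\n");

const negotiated = sendEncodingCounts(mcuAnswer, 2); // [2]
const cappedAtOne = sendEncodingCounts(mcuAnswer, 1); // [1]
```

Note that the browser never consults a receive-side setting in this sketch: the count it acts on always comes from the peer's "recv" list, which is the point of keeping the API to a single send-side attribute.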
------------------------------------------------------------------------

Here's how that would work (I'm going to use simulcast with two encodings for my examples, but extrapolating to more streams than that should be obvious).

If the browser is the entity creating the offer, the script driving its side of stuff would (for any streams it wants to support simulcast) set:

    rtpSender.maxSimulcastCount = 2;

The SDP that it gets from a subsequent createOffer would include two simulcast PTs. Both would have identical send imageattrs, indicating the range of encodings supported for simulcast. Only one would be supported for recv (this is just the resulting m-line):

    m=video 49300 RTP/AVP 97 98
    a=rtpmap:97 H264/90000
    a=rtpmap:98 H264/90000
    a=fmtp:97 profile-level-id=42c01f; max-fs=3600; max-mbps=108000
    a=fmtp:98 profile-level-id=42c00b; max-fs=3600; max-mbps=108000
    a=imageattr:97 send [x=[128:16:1280],y=[72:9:720]] recv [x=[128:16:1280],y=[72:9:720]]
    a=imageattr:98 send [x=[128:16:1280],y=[72:9:720]]
    a=simulcast send 97;98 recv 97

The MCU would then communicate the actual desired resolutions using imageattr "recv" in its answer:

    m=video 49674 RTP/AVP 97 98
    a=rtpmap:97 H264/90000
    a=rtpmap:98 H264/90000
    a=fmtp:97 profile-level-id=42c01f; max-fs=3600; max-mbps=108000
    a=fmtp:98 profile-level-id=42c00b; max-fs=240; max-mbps=3600
    a=imageattr:97 send [x=[320:16:1280],y=[180:9:720]] recv [x=1280,y=720]
    a=imageattr:98 recv [x=320,y=180]
    a=simulcast recv 97;98 send 97

------------------------------------------------------------------------

Conversely, if the MCU were creating the offer, it would include the simulcast resolutions in the offer:

    m=video 49674 RTP/AVP 97 98
    a=rtpmap:97 H264/90000
    a=rtpmap:98 H264/90000
    a=fmtp:97 profile-level-id=42c01f; max-fs=3600; max-mbps=108000
    a=fmtp:98 profile-level-id=42c00b; max-fs=240; max-mbps=3600
    a=imageattr:97 send [x=[320:16:1280],y=[180:9:720]] recv [x=1280,y=720]
    a=imageattr:98 recv [x=320,y=180]
    a=simulcast recv 97;98 send 97

When the receiving JavaScript calls setRemoteDescription, the maxSimulcastCount on the corresponding sender(s) would be automatically updated according to the number of encodings indicated for each video m-line. And, of course, the answer created by createAnswer would similarly contain simulcast information matching the number of desired encodings from the offer:

    m=video 49300 RTP/AVP 97 98
    a=rtpmap:97 H264/90000
    a=rtpmap:98 H264/90000
    a=fmtp:97 profile-level-id=42c01f; max-fs=3600; max-mbps=108000
    a=fmtp:98 profile-level-id=42c00b; max-fs=3600; max-mbps=108000
    a=imageattr:97 send [x=1280,y=720] recv [x=[320:16:1280],y=[180:9:720]]
    a=imageattr:98 send [x=320,y=180]
    a=simulcast send 97;98 recv 97

------------------------------------------------------------------------

I think this satisfies a broad range of simulcast use cases with very little impact on the 1.0 API. I'll also note that this is intended as a first pass at simulcast; if we find that other use cases would benefit from more granular controls, we could add them after 1.0 in a way that I believe would be backwards compatible with the scheme I describe above.

-- 
Adam Roach
Principal Platform Engineer
abr@mozilla.com
+1 650 903 0800 x863
Received on Thursday, 13 August 2015 23:03:44 UTC