Re: Simulcast V1

In general, I think simulcast is too big to fit into 1.0.  The SDP work is
not that close to completion, and the controls that applications want
would be big enough API points to warrant being part of 1.1, not 1.0.


On Thu, Aug 13, 2015 at 4:03 PM, Adam Roach <abr@mozilla.com> wrote:

> I've been involved in a number of recent conversations around simulcast
> for WebRTC, and several implementors have indicated that it's an
> important feature for the initial release of WebRTC.
>
> As I understand the state of play:
>
>    - Chrome has a form of simulcasting implemented using undocumented SDP
>    mangling
>
Correct, although it's very simple to document:

a=ssrc-group:SIM 11111 22222 33333

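As a rough illustration of the kind of munging involved, here is a minimal sketch that appends such a line to the first video m-section of an SDP string. The helper name `addSimGroup` is hypothetical, and the sketch assumes the SSRCs are already known; real SDP would likely also need matching `a=ssrc` lines for each SSRC, which this does not generate.

```javascript
// Hypothetical helper: append an "a=ssrc-group:SIM ..." line to the first
// video m-section. Everything else in the SDP is left untouched.
function addSimGroup(sdp, ssrcs) {
  // Split before each "m=" line so every section keeps its own m-line.
  const sections = sdp.split(/(?=^m=)/m);
  let done = false;
  return sections
    .map((section) => {
      if (done || !section.startsWith("m=video")) return section;
      done = true;
      // Reuse whichever line ending the section already uses.
      const eol = section.includes("\r\n") ? "\r\n" : "\n";
      const body = section.endsWith(eol) ? section : section + eol;
      return body + "a=ssrc-group:SIM " + ssrcs.join(" ") + eol;
    })
    .join("");
}
```

Calling `addSimGroup(sdp, [11111, 22222, 33333])` on an SDP with one video section would add the line shown above at the end of that section.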
>
>    - Firefox has no simulcasting implemented, but will soon
>
Are you planning on an SDP-based approach?


>
>    - The WebRTC 1.0 API has no simulcast-related controls whatsoever
>
We haven't added the controls, but we've left a specific place to add
them. The whole reason we made RtpParameters inside of
RtpSender.setParameters contain a sequence of RtpEncodingParameters was
precisely to leave room for having more than one encoding per RtpSender
object.  I think that would be the most logical place to add them, which
is exactly what ORTC has done.  But I think getting those controls right is
big enough to push to 1.1.


>
>    - The IETF MMUSIC working group is nearing completion on a document
>    (draft-ietf-mmusic-sdp-simulcast-01) that allows negotiation of simulcast
>    in SDP
>
>
The state of "simulcast in SDP" is not as far along as that, I don't think.
We had some conversations between meetings about how to approach it, and
the current consensus among the participants was to abandon the
PT-based approach and try again with a "new ID" approach, something like
"a=ESID".  But obviously that's going to take some time to get worked out.
As much as I'd like to see a good approach to simulcast in SDP (and I'm in
favor of "a=ESID"), I doubt we'll be able to finish in the timeframe in
which we want to finish WebRTC 1.0.



> I also understand and sympathize with the goal to stop adding any
> non-trivial modifications to the existing WebRTC spec, so that we can
> finally publish an initial version of the document.
>
> At the same time, the vast majority of the use cases that make sense for
> simulcast involve browsers talking to an MCU (or similar server), sending
> multiple encodings per track in the browser-to-MCU direction, but receiving
> only one encoding per track in the MCU-to-browser direction.
>
> This is interesting, because it means that we don't really require any
> controls that indicate the desire for a browser to *receive* simulcast --
> all we need is the ability to indicate a willingness to send it. At the
> same time, the MCU will know what resolutions (and other variations) it
> wants to receive, and can inform the browser of this information via SDP.
>
Actually, we don't need any new API points at all.  This can be
accomplished with just the MCU putting the right simulcast bits in the SDP
that gets passed down into setRemoteDescription, or via SDP munging in JS
for the SDP that gets passed down into setLocalDescription (depending on
whether we choose to put the sending simulcast params in the local or
remote description).  If it can be done with SDP, that's all the API you
need.  If you don't like SDP as an API, we have a long list of API points
that we could add, and I wouldn't put simulcast at the top (I'd put codec
selection).


> Based on the foregoing, then, I propose that we instead add a trivial
> control to the existing RTCRtpSender objects. My strawman proposal would be
> something like:
>
>
> ------------------------------
>
> partial interface RTCRtpSender {
>   attribute unsigned short maxSimulcastCount;
> };
>
> maxSimulcastCount of type unsigned short
>
> This attribute controls the number of simulcast streams that will be
> offered for the specific RTCRtpSender. The actual number of streams used
> for this sender will depend on the answer that is passed to
> setRemoteDescription.
>
> ------------------------------
>
> Here's how that would work (I'm going to use simulcast with two encodings
> for my examples, but extrapolating use for more streams than that should be
> obvious).
>
> If the browser is the entity creating the offer, the script driving its
> side of stuff would (for any streams it wants to support simulcast) set:
>
>   rtpSender.maxSimulcastCount = 2;
>

This is basically just a way to tell the browser "please generate SDP
with simulcast stuff in it so that the MCU can send me back SDP with
simulcast stuff in it".  If we solve how to put simulcast in SDP, why not
just make the MCU send back such SDP with simulcast in it?  Why bother
making the browser do anything in createOffer or setLocalDescription?

I'm actually in favor of not having SDP be involved at all.  Just let the
JS call RtpSender.setParameters() with multiple encodings and leave SDP out
of it.  But if we do involve SDP, I don't see why we need another API point
beyond multiple RtpEncodingParameters.
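
A minimal sketch of what that might look like from JS, assuming setParameters() accepted more than one entry in `encodings`. The per-encoding field names (`active`, `scaleResolutionDownBy`, `maxBitrate`) and the helper `withSimulcastEncodings` are illustrative assumptions, not agreed API, and the sender below is a stand-in object so the sketch runs outside a browser.

```javascript
// Hypothetical helper: return a copy of the sender's parameters with one
// encoding per requested layer. The per-encoding fields are assumptions.
function withSimulcastEncodings(params, layers) {
  return {
    ...params,
    encodings: layers.map((layer) => ({
      active: true,
      scaleResolutionDownBy: layer.scaleDown,
      maxBitrate: layer.maxBitrate,
    })),
  };
}

// Stand-in for an RtpSender, used only so the sketch is runnable as-is.
const fakeSender = {
  params: { encodings: [{ active: true }] },
  getParameters() {
    return this.params;
  },
  setParameters(p) {
    this.params = p;
  },
};

// Ask for two encodings: full resolution and a quarter-scale version.
fakeSender.setParameters(
  withSimulcastEncodings(fakeSender.getParameters(), [
    { scaleDown: 1, maxBitrate: 1500000 },
    { scaleDown: 4, maxBitrate: 150000 },
  ])
);
```

The point of the sketch is only that the number of encodings falls out of the length of the sequence, so no separate count attribute is needed.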



>
> The SDP that it gets from a subsequent createOffer would include two
> simulcast PTs. Both would have identical imageattrs, indicating the range
> of encodings supported for simulcast. Only one would be supported for recv
> (this is just the resulting m-line):
>
>    m=video 49300 RTP/AVP 97 98
>    a=rtpmap:97 H264/90000
>    a=rtpmap:98 H264/90000
>    a=fmtp:97 profile-level-id=42c01f; max-fs=3600; max-mbps=108000
>    a=fmtp:98 profile-level-id=42c00b; max-fs=3600; max-mbps=108000
>    a=imageattr:97 send [x=[128:16:1280],y=[72:9:720]] recv [x=[128:16:1280],y=[72:9:720]]
>    a=imageattr:98 send [x=[128:16:1280],y=[72:9:720]]
>    a=simulcast send 97;98 recv 97
>
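For readers following the example, here is a minimal sketch of how the `a=simulcast` line above could be read into send and receive payload-type lists. It handles only the syntax used in the examples in this thread (the draft syntax may well change), and the function name `parseSimulcastLine` is hypothetical.

```javascript
// Parse a line in the form used in this thread, e.g.
// "a=simulcast send 97;98 recv 97". Returns null for anything else.
function parseSimulcastLine(line) {
  const match = /^a=simulcast\s+(.*)$/.exec(line.trim());
  if (!match) return null;
  const tokens = match[1].split(/\s+/);
  const result = { send: [], recv: [] };
  // Tokens come in (direction, payload-type-list) pairs.
  for (let i = 0; i + 1 < tokens.length; i += 2) {
    const direction = tokens[i];
    if (direction !== "send" && direction !== "recv") return null;
    result[direction] = tokens[i + 1].split(";").map(Number);
  }
  return result;
}
```

For the offer above, this yields two payload types in the send direction and one in the receive direction.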
>
> The MCU would then communicate actual desired resolutions using imageattr
> "recv" in its answer:
>
>    m=video 49674 RTP/AVP 97 98
>    a=rtpmap:97 H264/90000
>    a=rtpmap:98 H264/90000
>    a=fmtp:97 profile-level-id=42c01f; max-fs=3600; max-mbps=108000
>    a=fmtp:98 profile-level-id=42c00b; max-fs=240; max-mbps=3600
>    a=imageattr:97 send [x=[320:16:1280],y=[180:9:720]] recv [x=1280,y=720]
>    a=imageattr:98 recv [x=320,y=180]
>    a=simulcast recv 97;98 send 97
>
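To make the answer above concrete, a minimal sketch of pulling the fixed "recv" resolution for each payload type out of the imageattr lines. It handles only single-value `x`/`y` pairs as in this example and skips ranges such as `x=[320:16:1280]`; the function name `recvResolutions` is hypothetical.

```javascript
// Collect the fixed "recv" resolution per payload type from imageattr lines
// such as "a=imageattr:98 recv [x=320,y=180]". Ranges are ignored.
function recvResolutions(sdpSection) {
  const result = {};
  for (const line of sdpSection.split(/\r?\n/)) {
    const attr = /^\s*a=imageattr:(\d+)\s+(.*)$/.exec(line);
    if (!attr) continue;
    // Only match a recv set with plain integer x and y values.
    const recv = /\brecv\s+\[x=(\d+),y=(\d+)\]/.exec(attr[2]);
    if (recv) {
      result[attr[1]] = { width: Number(recv[1]), height: Number(recv[2]) };
    }
  }
  return result;
}
```

Applied to the answer above, the result maps payload type 97 to 1280x720 and 98 to 320x180, which is the information the browser would need to pick its two encodings.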
>
> ------------------------------
>
> Conversely, if the MCU were creating the offer, it would include the
> simulcast resolutions in the offer:
>
>    m=video 49674 RTP/AVP 97 98
>    a=rtpmap:97 H264/90000
>    a=rtpmap:98 H264/90000
>    a=fmtp:97 profile-level-id=42c01f; max-fs=3600; max-mbps=108000
>    a=fmtp:98 profile-level-id=42c00b; max-fs=240; max-mbps=3600
>    a=imageattr:97 send [x=[320:16:1280],y=[180:9:720]] recv [x=1280,y=720]
>    a=imageattr:98 recv [x=320,y=180]
>    a=simulcast recv 97;98 send 97
>
>
> When the receiving JavaScript calls setRemoteDescription, the
> maxSimulcastCount on the corresponding sender(s) would be automatically
> updated according to the number of encodings indicated for each video
> m-line. And, of course, the answer created by createAnswer would similarly
> contain simulcast information matching the number of desired encodings from
> the offer:
>
>    m=video 49300 RTP/AVP 97 98
>    a=rtpmap:97 H264/90000
>    a=rtpmap:98 H264/90000
>    a=fmtp:97 profile-level-id=42c01f; max-fs=3600; max-mbps=108000
>    a=fmtp:98 profile-level-id=42c00b; max-fs=3600; max-mbps=108000
>    a=imageattr:97 send [x=1280,y=720] recv [x=[320:16:1280],y=[180:9:720]]
>    a=imageattr:98 send [x=320,y=180]
>    a=simulcast send 97;98 recv 97
>
>
> ------------------------------
>
> I think this satisfies a broad range of simulcast use cases with very
> little impact on the 1.0 API. I'll also note that this is intended to be a
> first-pass of simulcast implementation; if we find that other use cases
> arise that would benefit from more granular controls, we could easily add
> them in post-1.0 systems in a way that I believe could easily be backwards
> compatible with the scheme I describe above.
>
> --
> Adam Roach
> Principal Platform Engineer
> abr@mozilla.com
> +1 650 903 0800 x863
>

Received on Tuesday, 18 August 2015 00:07:58 UTC