Re: A proposal for how we would use the SDP that comes out of the MMUSIC interm from Byron Campen on 2015-10-09 (public-webrtc@w3.org from October 2015)

From: Byron Campen <docfaraday@gmail.com>
Date: Fri, 9 Oct 2015 11:14:38 -0500
To: public-webrtc@w3.org
Message-ID: <5617E7EE.1070107@gmail.com>
On 10/8/15 9:06 PM, Peter Thatcher wrote:
> Heading into the MMUSIC interim, I think it would be wise to have some 
> idea of what SDP we would want to use (what subset of what MMUSIC 
> decides on would WebRTC support?).  Having spoken to several people in 
> the MMUSIC WG working on the draft for simulcast and several people in 
> the WebRTC WG, I've come up with the following proposal for what 
> subset we could use, and what the API around it would look like.  I 
> hope this proposal and ensuing discussion help us prepare for the interim.
>
>
> At the f2f, we had several ideas with different qualities which I 
> describe here in JSON form.
>
> Plan A: {api: little, sdp:big}
> Plan B: {api: big,    sdp: none}
> Plan C: {api: none,   sdp:none}
>
> This proposal (let it be Plan X) tries to combine the best of both and 
> have the qualities {api: moderate, sdp: moderate}.
>
>
> Here is a graphical representation of the qualities:
>                                                      
>                +                                     
>       API Big  | B                                   
>                |                                     
>                |                                     
>                |                                     
>                |                                     
>                |                                     
>                |                                     
>                |                                     
>                |                                     
>                |              X                      
>                |                                     
>                |                                     
>                |                                     
>                |                                     
>                |                                     
>                |                                     
>     API Small  |                              A      
>                | C                                   
>                +------------------------------------+
>                                                      
>                  SDP Small                  SDP Big  
>                                                      
>
>
>
> Here's how it would work with an example:
>
> var video = pc.addTransceiver(track,
>   {send: {encodings: [{scale: 1.0}, {scale: 2.0}, {scale: 4.0}]}});
>
> // This now has a *subset* of the SDP from MMUSIC
> pc.createOffer().then(signalOffer)
> var answer = ... ; //  wait for the answer
>
> // This accepts a *subset* of the SDP from MMUSIC
> pc.setRemoteDescription(answer);
>
> // The app can later decide to change parameters, such as
> // stop sending the top layer
> var params = video.sender.getParameters();
> params.encodings[0].active = false;
> video.sender.setParameters(params);
>
>
> The key parts are:
> 1. It builds on top of addTransceiver, expanding {send: true} to 
> {send: ...} where the ... can express the desire for simulcast.
> 2. It uses a subset of the SDP from MMUSIC in the offer and answer.
> 3. It gives JS some control over each layer: .active, .maxBandwidth, 
> .scale.
> 4. The additions to the API and to the SDP are simple.
>
>
>
> Here's the WebIDL:
>
> dictionary RTCRtpTransceiverInit {
>   (boolean or RTCRtpParameters) send;
>   // ... the rest as-is
> };
>
> dictionary RTCRtpEncodingParameters {
>   double scale;  // Resolution scale
>   unsigned long rsid;  // RTP Source Stream ID
>   // ... the rest as-is
> };
    I am skeptical that a resolution scale is the right tool for 
"full/postage stamp", which is the primary use case for simulcast. A 
conferencing service is probably going to want to define the postage 
stamp as a fixed resolution (and probably framerate), not a scale of the 
full resolution that can slide around.
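
    To make the concern concrete, here is a minimal sketch of how a scaled layer moves with the capture size. It assumes that "scale" divides both width and height (so 4.0 means a quarter of each dimension); the proposal does not spell this out, and scaledResolution is a hypothetical helper, not part of any API.

```javascript
// Hypothetical helper: the resolution a scaled-down layer would end up with.
// Assumption: `scale` divides width and height (1.0 = full size).
function scaledResolution(capture, scale) {
  return {
    width: Math.floor(capture.width / scale),
    height: Math.floor(capture.height / scale),
  };
}

// The same "postage stamp" encoding (scale 4.0) on two different captures.
const hd = scaledResolution({ width: 1280, height: 720 }, 4.0);
const vga = scaledResolution({ width: 640, height: 480 }, 4.0);

console.log(hd);  // { width: 320, height: 180 }
console.log(vga); // { width: 160, height: 120 }
```

    The postage stamp changes size whenever the capture resolution does, which is what a service that wants a fixed thumbnail size would have to work around.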
>
> And here's the *subset* of the SDP from MMUSIC we could use in the 
> offer (obviously subject to change based on the results of the interim):
>
> m=video ...
> ...
> a=rsid send 1
> a=rsid send 2
> a=rsid send 3
> a=simulcast rsids=1,2,3
    The semantics of this are pretty unclear; what does each of these 
rids mean? You can say that it is "application dependent" I suppose, but 
the implementers of conferencing servers are going to want something a 
little more concrete than that. If you want something very simple, 
perhaps something like this would be palatable:

a=rid:1 send pt=*;resolution-scale=1
a=rid:2 send pt=*;resolution-scale=0.5
a=rid:3 send pt=*;resolution-scale=0.25
a=simulcast:send rid=1;2;3

    Then, depending on the configuration of the conferencing server, it 
could reply with something like

a=rid:1 recv pt=*;resolution-scale=1
a=rid:2 recv pt=*;resolution-scale=0.125
a=simulcast:recv rid=1;2

    This does a good job of indicating to the other end what the 
difference is between the rids, is pretty simple, maps pretty well to 
the W3C API surface, and allows enough flexibility for a conferencing 
server to ask for what it actually needs. It would require an additional 
"resolution-scale" rid parameter, and its semantics would need to be 
specced out. As I mentioned before, I'm not sure we should be using a 
resolution scale for this in the first place, but maybe that can be 
fixed later.
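
    As a rough sketch of how this could map to the API surface: the code below generates the offer lines above from the encodings in the quoted example, then applies an answer like the one above. It assumes that the API "scale" is the reciprocal of the SDP "resolution-scale" (1.0, 2.0, 4.0 versus 1, 0.5, 0.25), that rids are numbered in encoding order, and that an answerer may change the scale of a rid. None of that is settled, "resolution-scale" is only the parameter suggested here, and the function names are hypothetical.

```javascript
// Build a=rid / a=simulcast lines from a list of encodings.
// Assumption: resolution-scale = 1 / scale, rids numbered from 1.
function ridLines(encodings, direction) {
  const lines = encodings.map((enc, i) =>
    `a=rid:${i + 1} ${direction} pt=*;resolution-scale=${1 / enc.scale}`);
  const rids = encodings.map((_, i) => i + 1).join(";");
  lines.push(`a=simulcast:${direction} rid=${rids}`);
  return lines;
}

// Parse a=rid lines into a Map of rid -> resolution-scale.
function parseRids(sdpLines) {
  const rids = new Map();
  for (const line of sdpLines) {
    const m = /^a=rid:(\d+) (send|recv) (.*)$/.exec(line);
    if (!m) continue; // not a rid line (e.g. a=simulcast)
    const params = Object.fromEntries(
      m[3].split(";").map((kv) => kv.split("=")));
    rids.set(Number(m[1]), Number(params["resolution-scale"]));
  }
  return rids;
}

// Deactivate encodings whose rid the answerer dropped, and adopt the
// scale the answerer asked for on the rids it kept.
function applyAnswer(encodings, answerLines) {
  const accepted = parseRids(answerLines);
  return encodings.map((enc, i) => {
    const rid = i + 1;
    return accepted.has(rid)
      ? { ...enc, active: true, scale: 1 / accepted.get(rid) }
      : { ...enc, active: false };
  });
}

const encodings = [{ scale: 1.0 }, { scale: 2.0 }, { scale: 4.0 }];
const offer = ridLines(encodings, "send");
console.log(offer.join("\n"));

const answer = [
  "a=rid:1 recv pt=*;resolution-scale=1",
  "a=rid:2 recv pt=*;resolution-scale=0.125",
  "a=simulcast:recv rid=1;2",
];
const negotiated = applyAnswer(encodings, answer);
console.log(negotiated);
```

    With the answer above, the third encoding ends up inactive and the second is sent at one eighth of the full resolution, so the server gets the layers it asked for without the application having to touch the SDP.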
>
>
> And here's the *subset* of the SDP from MMUSIC we could use in the answer:
> m=video ...
> ...
> a=rsid recv 1
> a=rsid recv 2
> a=rsid recv 3
> a=simulcast rsids=1,2,3
>
>
> That's it.  That's all I think we need: a simple addition to 
> addTransceiver plus a simple subset of the SDP from MMUSIC.
>
>
> The last thing I would note is that I propose we *do not* use the 
> entirety of the MMUSIC draft in WebRTC.  In particular, not the PT 
> overloading or the more extensive attributes that don't map well to 
> RTCRtpEncodingParameters (max-width, max-height, max-fps, max-fs, 
> max-pps).
>
    We can subset the rid params supported pretty easily (since the rid 
draft will end up allowing this), and if we can meet our needs by doing 
this, ok. Regarding PT demux, it is fine if we do not support sending an 
offer with PT demux, but if a conferencing server sends us an offer that 
uses PT demux, we kinda have to play along. MMUSIC is not going to be 
happy if we say in rtcweb-jsep that we'll reject such an offer.

Best regards,
Byron Campen
Received on Friday, 9 October 2015 16:14:54 UTC