Re: A proposal for how we would use the SDP that comes out of the MMUSIC interim from Peter Thatcher on 2015-10-09 (public-webrtc@w3.org from October 2015)

From: Peter Thatcher <pthatcher@google.com>
Date: Fri, 9 Oct 2015 10:56:00 -0700
To: Byron Campen <docfaraday@gmail.com>
Cc: "public-webrtc@w3.org" <public-webrtc@w3.org>
Message-ID: <CAJrXDUHdZ3n-WnWVLuCbewy8wTRP1c1NY+EKt1gC_z8Zsm22Mg@mail.gmail.com>
On Fri, Oct 9, 2015 at 9:14 AM, Byron Campen <docfaraday@gmail.com> wrote:

> On 10/8/15 9:06 PM, Peter Thatcher wrote:
>
> Heading into the MMUSIC interim, I think it would be wise to have some
> idea of what SDP we would want to use (what subset of what MMUSIC decides
> on would WebRTC support?).  Having spoken to several people in the MMUSIC
> WG working on the draft for simulcast and several people in the WebRTC WG,
> I've come up with the following proposal for what subset we could use, and
> what the API around it would look like.  I hope this proposal and ensuing
> discussion help us prepare for the interim.
>
>
> At the f2f, we had several ideas with different qualities, which I describe
> here in JSON form.
>
> Plan A: {api: little, sdp: big}
> Plan B: {api: big,    sdp: none}
> Plan C: {api: none,   sdp: none}
>
> This proposal (let it be Plan X) tries to combine the best of both and
> have the qualities {api: moderate, sdp: moderate}.
>
>
> Here is a graphical representation of the qualities:
>
>           +
> API Big   | B
>           |
>           |
>           |
>           |                X
>           |
>           |
> API Small |                                    A
>           | C
>           +------------------------------------+
>             SDP Small                    SDP Big
>
>
>
> Here's how it would work with an example:
>
> var video = pc.addTransceiver(track,
>   {send: {encodings: [{scale: 1.0}, {scale: 2.0}, {scale: 4.0}]}});
>
> // This now has a *subset* of the SDP from MMUSIC
> pc.createOffer()
>   .then(function(offer) { return pc.setLocalDescription(offer); })
>   .then(function() { signalOffer(pc.localDescription); });
> var answer = ...;  // wait for the answer via the app's signalling
>
> // This accepts a *subset* of the SDP from MMUSIC
> pc.setRemoteDescription(answer);
>
> // The app can later decide to change parameters, such as
> // to stop sending the top layer (encodings[0], scale 1.0)
> var params = video.sender.getParameters();
> params.encodings[0].active = false;
> video.sender.setParameters(params);
>
>
> The key parts are:
> 1.  It builds on top of addTransceiver, expanding {send: true} to {send:
> ...} where the ... can express the desire for simulcast.
> 2.  It uses a subset of the SDP from MMUSIC in the offer and answer.
> 3.  It gives JS some control over each layer: .active, .maxBandwidth,
> .scale.
> 4.  The additions to the API and to the SDP are simple.
>
>
>
> Here's the WebIDL:
>
> dictionary RTCRtpTransceiverInit {
>   (boolean or RTCRtpParameters) send;
>   // .. the rest as-is
> };
>
> dictionary RTCRtpEncodingParameters {
>   double scale;  // Resolution scale
>   unsigned long rsid;  // RTP Source Stream ID
>   // ... the rest as-is
> };
>
>    I am skeptical that a resolution scale is the right tool for
> "full/postage stamp", which is the primary use case for simulcast.
>  
> A conferencing service is probably going to want to define the postage
> stamp as a fixed resolution (and probably framerate), not a scale of the
> full resolution that can slide around.
>

Ultimately, it's the client-side JavaScript that controls what gets sent to
the server.  The big question has always been: does the JS specify a fixed
resolution (or height or width) or a relative one?  All of the discussions
we've had in the past in the WebRTC working group about this have ended up
in favor of a relative resolution, not a fixed one.  If the JS wants to send
a specific resolution, it can control that on the track, not via
RTCRtpEncodingParameters or the SDP.

As for what conferencing services want, the one I'm very familiar with
wants a resolution scale.  So at least the desire for a fixed resolution
isn't universal.

And, as I already mentioned, services that do want a fixed resolution
can send a fixed resolution from the JS via track controls.
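To make the relative-versus-fixed distinction concrete, here is a minimal sketch. It assumes that "scale" divides both dimensions of the track, so 2.0 means half width and half height. The helper name layerResolutions is hypothetical and not part of any WebRTC API. The sketch only shows that relative encodings follow whatever size the app sets on the track.

```javascript
// Hypothetical helper, not part of any WebRTC API: derive each layer's
// resolution from the track's current capture size, assuming "scale" is a
// divisor applied to both width and height (so 2.0 means half-size).
function layerResolutions(trackWidth, trackHeight, encodings) {
  return encodings.map(function (enc) {
    var scale = enc.scale === undefined ? 1.0 : enc.scale;
    if (!(scale > 0)) {
      throw new RangeError("scale must be a positive number");
    }
    return {
      width: Math.floor(trackWidth / scale),
      height: Math.floor(trackHeight / scale)
    };
  });
}

var encodings = [{scale: 1.0}, {scale: 2.0}, {scale: 4.0}];

// A 1280x720 track yields 1280x720, 640x360 and 320x180 layers...
console.log(layerResolutions(1280, 720, encodings));
// ...and if the app shrinks the track itself, the same encodings follow it.
console.log(layerResolutions(640, 360, encodings));
```

A service that wants a fixed postage-stamp size would, under this model, set that size on the track it passes in rather than on the encodings.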


But even if we did say "RTCRtpEncodingParameters should have a .maxWidth
and a .maxHeight", which I doubt we will, that's somewhat orthogonal to
this proposal.  The main part of this proposal is:

1.  Add some controls to addTransceiver to indicate the desire to send
simulcast.
2.  Use a subset of the MMUSIC SDP.

We can decide what controls go in RTCRtpEncodingParameters separately.
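As a rough illustration of point 2, the following sketch maps the encodings passed to addTransceiver onto the offer lines quoted below. It assumes the pre-interim syntax used in this thread. It also assumes that RSIDs default to 1..N in encoding order. simulcastOfferLines is a hypothetical helper, not browser code.

```javascript
// Hypothetical sketch: the media-level lines an offerer might emit for
// addTransceiver(track, {send: {encodings: [...]}}), using the pre-interim
// syntax shown in this thread. RSIDs default to 1..N in encoding order.
function simulcastOfferLines(encodings) {
  var rsids = encodings.map(function (enc, i) {
    return enc.rsid !== undefined ? String(enc.rsid) : String(i + 1);
  });
  if (new Set(rsids).size !== rsids.length) {
    throw new Error("each encoding needs a distinct rsid");
  }
  var lines = rsids.map(function (id) { return "a=rsid send " + id; });
  lines.push("a=simulcast rsids=" + rsids.join(","));
  return lines;
}

var offerLines = simulcastOfferLines([{scale: 1.0}, {scale: 2.0}, {scale: 4.0}]);
console.log(offerLines.join("\r\n"));
```

Note that nothing about scale reaches the SDP in this sketch; only the RSIDs and the a=simulcast grouping do.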



> And here's the *subset* of the SDP from MMUSIC we could use in the offer
> (obviously subject to change based on the results of the interim):
>
> m=video ...
> ...
> a=rsid send 1
> a=rsid send 2
> a=rsid send 3
> a=simulcast rsids=1,2,3
>
>    The semantics of this are pretty unclear; what does each of these rids
> mean? You can say that it is "application dependent" I suppose, but the
> implementers of conferencing servers are going to want something a little
> more concrete than that.
>

If the JS wants to send more information about what semantics it is giving
to each encoding/RSID, it is more than capable of doing so in its
signalling to the server.  We don't need to put all signalling into SDP.
We may choose, for the convenience of the JS, to put a minor amount of
signalling in the SDP, like we put the track ID into the SDP.  If so, what
you're really advocating for is an RSID that's a string instead of an int:

addTransceiver("video", {sendEncodings: [{rsid: "large"},
                                         {rsid: "medium", resolutionScale: 2.0},
                                         {rsid: "small", resolutionScale: 4.0}]});



m=video ...
...
a=rsid send small
a=rsid send medium
a=rsid send large
a=simulcast rsids=small,medium,large


That would provide a convenient way for JS to put some semantic meaning in
the RSID, at the cost of a potentially slightly larger header extension value.
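Here is a back-of-the-envelope sketch of that cost. It assumes the RSID is carried as its UTF-8 text in an RTP header extension element. It also assumes the one-byte header form of RFC 5285, where each element is a 1-byte ID/length header plus 1 to 16 bytes of data. rsidExtensionBytes is a hypothetical helper, and it ignores padding.

```javascript
// Rough sketch, assuming the RSID travels as its UTF-8 text in an RTP header
// extension element. In the one-byte header form (RFC 5285) each element is
// a 1-byte ID/length header plus 1 to 16 bytes of data; padding to a 32-bit
// boundary is ignored here.
function rsidExtensionBytes(rsid) {
  var data = Buffer.byteLength(String(rsid), "utf8");
  if (data < 1) {
    throw new RangeError("rsid must not be empty");
  }
  return { data: data, element: 1 + data, fitsOneByteForm: data <= 16 };
}

["1", "small", "medium", "large"].forEach(function (id) {
  console.log(id, rsidExtensionBytes(id));
});
```

Under these assumptions, "1" costs 2 bytes per packet and "medium" costs 7. Short labels stay well within the one-byte form.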



I'd be happy either way: with a string RSID or an int RSID where the JS
sends encoding metadata to the server on its own.  But I don't think it's a
good idea to add anything to the SDP outside of the RSID.
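For the int-RSID option, the app-level signalling could look something like the sketch below. The message type, its fields (mid, label, scale), and the helper name are all made up for illustration. Nothing here is defined by WebRTC or MMUSIC.

```javascript
// Hypothetical app-level message: the SDP carries only integer RSIDs, and the
// JS tells its own server what each one means over its existing signalling
// channel. The message shape is invented for illustration only.
function layerMetadataMessage(mid, encodings) {
  return JSON.stringify({
    type: "simulcast-layers",
    mid: mid,
    layers: encodings.map(function (enc) {
      return { rsid: enc.rsid, label: enc.label, scale: enc.scale };
    })
  });
}

var msg = layerMetadataMessage("video0", [
  {rsid: 1, label: "large", scale: 1.0},
  {rsid: 2, label: "medium", scale: 2.0},
  {rsid: 3, label: "small", scale: 4.0}
]);
console.log(msg);

// The server side simply parses it back into a lookup table keyed by RSID.
var byRsid = new Map(JSON.parse(msg).layers.map(function (l) { return [l.rsid, l]; }));
console.log(byRsid.get(2).label);
```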


I would point out, though, that the conference server really doesn't need
to know anything about the RSIDs.  It already knows the sizes of the layers
from the media itself and can just use that to choose what to forward.  It
just needs the RSIDs to know which FEC or RTX stream goes with which layer.
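Here is a minimal sketch of that server-side logic. It assumes the SFU already has per-SSRC records, each with an RSID, a kind (media, rtx, or fec), and the resolution observed in the media. The record shape and function names are hypothetical. The RSID is used only to pair repair streams with their layer, while the forwarding choice relies on the observed size.

```javascript
// Hypothetical SFU-side sketch: group streams by RSID so each repair stream
// (RTX/FEC) is paired with its layer's media stream.
function groupByRsid(streams) {
  var layers = new Map();
  streams.forEach(function (s) {
    if (!layers.has(s.rsid)) layers.set(s.rsid, { media: null, repair: [] });
    var layer = layers.get(s.rsid);
    if (s.kind === "media") layer.media = s;
    else layer.repair.push(s.ssrc);  // rtx or fec
  });
  return layers;
}

// Pick the largest layer whose observed height fits the receiver's viewport.
function chooseLayer(layers, maxHeight) {
  var best = null;
  layers.forEach(function (layer, rsid) {
    if (!layer.media || layer.media.height > maxHeight) return;
    if (!best || layer.media.height > best.height) {
      best = { rsid: rsid, ssrc: layer.media.ssrc,
               height: layer.media.height, repair: layer.repair };
    }
  });
  return best;
}

var streams = [
  {ssrc: 1001, rsid: 1, kind: "media", height: 720},
  {ssrc: 1002, rsid: 1, kind: "rtx"},
  {ssrc: 2001, rsid: 2, kind: "media", height: 360},
  {ssrc: 2002, rsid: 2, kind: "rtx"},
  {ssrc: 3001, rsid: 3, kind: "media", height: 180}
];
console.log(chooseLayer(groupByRsid(streams), 400));
```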




> If you want something very simple, perhaps something like this would be
> palatable:
>
> a=rid:1 send pt=*;resolution-scale=1
> a=rid:2 send pt=*;resolution-scale=0.5
> a=rid:3 send pt=*;resolution-scale=0.25
> a=simulcast:send rid=1;2;3
>
>    Then, depending on the configuration of the conferencing server, it
> could reply with something like
>
> a=rid:1 recv pt=*;resolution-scale=1
> a=rid:2 recv pt=*;resolution-scale=0.125
> a=simulcast:recv rid=1;2
>

I think it would be a mistake to put resolution-scale into the SDP, for
various reasons:
1.  It's overkill for what we want.  It's just a little information, more
like a label, being sent to the receive side.  It's not something
negotiated.
2.  It would imply that calling RtpSender.setParameters with a different
resolutionScale would cause renegotiation.
3.  It probably doesn't make sense for applications outside of WebRTC, and
that would only complicate the work to do in MMUSIC.
4.  It would imply the answerer can change it.
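Point 2 can be sketched as a simple check, assuming only the set of RSIDs appears in the SDP. Changing per-encoding values such as scale or active through setParameters leaves the SDP untouched. Adding or removing an RSID changes the offer and so would need renegotiation. The function name needsRenegotiation is hypothetical.

```javascript
// Hypothetical check: renegotiation is needed only when the set of RSIDs
// (the only per-encoding data assumed to be in the SDP) changes.
function needsRenegotiation(oldEncodings, newEncodings) {
  var ids = function (encs) {
    return encs.map(function (e) { return String(e.rsid); }).sort().join(",");
  };
  return ids(oldEncodings) !== ids(newEncodings);
}

var before = [{rsid: 1, scale: 1.0}, {rsid: 2, scale: 2.0}, {rsid: 3, scale: 4.0}];
var rescaled = [{rsid: 1, scale: 1.0}, {rsid: 2, scale: 3.0}, {rsid: 3, scale: 4.0, active: false}];
var dropped = [{rsid: 1, scale: 1.0}, {rsid: 2, scale: 2.0}];
console.log(needsRenegotiation(before, rescaled)); // false
console.log(needsRenegotiation(before, dropped));  // true
```

If resolution-scale were an SDP attribute instead, the first case would also change the SDP, which is the implication objected to above.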



>    This does a good job of indicating to the other end what the difference
> is between the rids, is pretty simple, maps pretty well to the W3C API
> surface, and allows enough flexibility for a conferencing server to ask for
> what it actually needs. It would require an additional "resolution-scale"
> rid parameter, and its semantics would need to be specced out. As I
> mentioned before, I'm not sure we should be using a resolution scale for
> this in the first place, but maybe that can be fixed later.
>
>
>
> And here's the *subset* of the SDP from MMUSIC we could use in the answer:
> m=video ...
> ...
> a=rsid recv 1
> a=rsid recv 2
> a=rsid recv 3
> a=simulcast rsids=1,2,3
>
>
> That's it.  That's all I think we need: a simple addition to
> addTransceiver plus a simple subset of the SDP from MMUSIC.
>
>
> The last thing I would note is that I propose we *do not* use the entirety
> of the MMUSIC draft in WebRTC.  In particular, not the PT overloading or
> the more extensive attributes that don't map well to
> RTCRtpEncodingParameters (max-width, max-height, max-fps, max-fs,
> max-pps).
>
>     We can subset the rid params supported pretty easily (since the rid
> draft will end up allowing this), and if we can meet our needs by doing
> this, ok.
>

It's the a=simulcast line that matters, not the a=rsid line, so a separate
RSID draft isn't enough.  
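To illustrate why the a=simulcast line is what matters, here is a minimal sketch of an offerer that reads the answer subset proposed in this thread. It deactivates any encoding the answerer did not list. The parser follows the pre-interim syntax, and the helper names are hypothetical. How a missing a=simulcast line should be handled is left open; this sketch simply reports no accepted RSIDs.

```javascript
// Hypothetical parser for the answer subset ("a=rsid recv <id>" plus
// "a=simulcast rsids=<id>,..."). Returns RSIDs listed on the a=simulcast
// line that were also declared with a=rsid recv.
function acceptedRsids(mediaSection) {
  var declared = new Set();
  var listed = null;
  mediaSection.split(/\r?\n/).forEach(function (line) {
    var r = /^a=rsid recv (\S+)$/.exec(line.trim());
    if (r) declared.add(r[1]);
    var s = /^a=simulcast rsids=(\S+)$/.exec(line.trim());
    if (s) listed = s[1].split(",");
  });
  if (listed === null) return [];  // no a=simulcast line: nothing reported
  return listed.filter(function (id) { return declared.has(id); });
}

// Mark each offered encoding active only if the answerer accepted its RSID.
function applyAnswer(encodings, accepted) {
  return encodings.map(function (enc) {
    return Object.assign({}, enc, { active: accepted.indexOf(String(enc.rsid)) !== -1 });
  });
}

var answerSection = "a=rsid recv 1\r\na=rsid recv 2\r\na=simulcast rsids=1,2\r\n";
var accepted = acceptedRsids(answerSection);
console.log(accepted);
console.log(applyAnswer([{rsid: 1}, {rsid: 2}, {rsid: 3}], accepted));
```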



> Regarding PT demux, it is fine if we do not support sending an offer with
> PT demux, but if a conferencing server sends us an offer that uses PT
> demux, we kinda have to play along.
>
> MMUSIC is not going to be happy if we say we'll reject such an offer in
> rtcweb-jsep.
>

What goes out in the offer matters very little.  It's what goes out as RTP
that really matters.  And saying that the conference server can request
PT-based simulcast is 99% of the way to saying that PT-based simulcast is
fully required in WebRTC 1.0.  And I'm opposed to that.

If that's a requirement for simulcast in WebRTC 1.0, then I'd rather have
simulcast not be in 1.0.

I think it is a valid choice to say WebRTC will only use a non-PT-based
subset of the MMUSIC work.  There's *a lot* of SDP out there that WebRTC
doesn't support, and PT-based simulcast would just be one more of those
things.


>
> Best regards,
> Byron Campen
>
Received on Friday, 9 October 2015 17:57:08 UTC