Proposal: One media RTP packet stream (i.e. one SSRC) per RTCRtpSender/RTCRtpReceiver object

Hello all,

We have been working on a new proposal to improve both the RTCRtpSender and
RTCRtpReceiver objects in order to make the ORTC spec cleaner and simpler. I
was not sure what the preferred method for making proposals was (either a
GitHub issue or the mailing list), so I created a gist for it:

https://gist.github.com/murillo128/d9da72ef76df26d2fde848a265c46fc7

I am also copying the full proposal below.

Best regards
Sergio


  Rationale

IMHO it is quite difficult to understand what the RTCRtpSender /
RTCRtpReceiver are. Quoting the current spec:

     1. Overview

    In the figure above, the RTCRtpSender (Section 5) encodes the track
    provided as input, which is transported over a RTCDtlsTransport

    5.1 Overview

    The RTCRtpSender includes information relating to the RTP sender.

    5.1 Overview

    An RTCRtpSender instance is associated to a sending MediaStreamTrack
    and provides RTC related methods to it.

So the sender, for example, is a generic object that takes a media
track and generates all sorts of RTP packets that you send to another
peer. It can host one or several encoders, send one or more SSRCs, and
supports simulcast and SVC.

In that regard, it is quite similar to an RTCPeerConnection, but
instead of using an SDP blob, you pass a kind of JSON version of an
m-line to it, and the RTCRtpSender will do its best to send everything
you want to send (given that you have correctly discovered all the
restrictions, so as not to cause an InvalidParameters exception).

So instead of matching an RTP-world object, as the RTCDtlsTransport and
RTCIceTransport do, it is a kind of catch-all black-box object.

The RTCRtpReceiver shares the same complexity, supporting the reception
of multiple SSRC streams and payload types, as senders and receivers
share the same RTCRtpParameters dictionary for setting up sending and
receiving. It was recently suggested (and I agree) that ORTC start off
by supporting the WebRTC 1.0 simulcast model, which involves sending
multiple streams but receiving only one.

That implies that an RTCRtpReceiver will only receive one RTP packet
stream (one SSRC) with one or more payloads (Opus+DTMF, for example).
With this change, we can narrow down the definition of an RTCRtpReceiver
and describe it as the object that handles the reception of an RTP
packet stream. Again, IMHO, that makes much more sense and maps to the
concepts in draft-ietf-rtcweb-rtp-usage.

This proposal takes this idea further and applies the same concept to
the RTCRtpSender. Instead of allowing multiple RTP packet streams to be
handled by an RTCRtpSender, we allow each RTCRtpSender to produce only a
single RTP packet stream (SSRC).

Now we have a one-to-one relationship between an RTCRtpSender, an
RTCRtpReceiver and a media RTP packet stream:

MediaTrack ===> RTCRtpSender ====(single RTP packet stream - SSRC)====> RTCRtpReceiver ===> MediaTrack

In that regard, following the m-line analogy, it would represent one
ssrc-group.

Simulcast and SVC are also supported (see below).


  Benefits

  * Improved RTCRtpSender/RTCRtpReceiver definitions
  * Cleaner and simpler APIs
  * Makes it harder to have parameter inconsistencies
  * Provides a single, straightforward way of using the API. Given a
    DTLS/ICE/RTP stream architecture, there is only one way of
    implementing it in ORTC.


  Proposal

In order to be as little disruptive as possible, we have made only the
following changes to the current API:

  * The main change is to move the ssrc, fec and rtx definitions from
    the encodings to the RTP parameters.
  * Add an RTCRtpCodecRTXParameters dictionary associated to each
    RTCRtpCodecParameters to address the RTX apt issue (this change can
    also be implemented standalone, without the rest of the changes).
  * Remove the codecs sequence from the parameters and move it to the
    encodings. This change could be dropped, although we believe it is
    important for the sake of clarity (more on this later).

//New dictionary
dictionary RTCRtpCodecRTXParameters {
             payloadtype              payloadType;
             unsigned long            rtxtime;
};

dictionary RTCRtpCodecParameters {
             DOMString                 name;
             payloadtype               payloadType;
             unsigned long             clockRate;
             unsigned long             maxptime;
             unsigned long             ptime;
             unsigned long             numChannels;
             sequence<RTCRtcpFeedback> rtcpFeedback;
             Dictionary                parameters;
             RTCRtpCodecRTXParameters  rtx; // NEW: rtx.payloadType
};

//Not changed, just added here for completeness
dictionary RTCRtpRtxParameters {
             unsigned long ssrc;
             payloadtype   payloadType;
};

//Not changed, just added here for completeness
dictionary RTCRtpFecParameters {
             unsigned long ssrc;
             DOMString     mechanism;
};

dictionary RTCRtpParameters {
             DOMString                                 muxId = "";
             unsigned long                             ssrc; // media ssrc - moved from encodings
             RTCRtpFecParameters                       fec;  // includes fec.ssrc - moved from encodings
             RTCRtpRtxParameters                       rtx;  // includes rtx.ssrc - moved from encodings
             sequence<RTCRtpHeaderExtensionParameters> headerExtensions;
             sequence<RTCRtpEncodingParameters>        encodings;
             RTCRtcpParameters                         rtcp;
             RTCDegradationPreference                  degradationPreference = "balanced";
             // Removed codecs sequence
};

dictionary RTCRtpEncodingParameters {
             RTCRtpCodecParameters codec; // Moved from parameters
             RTCPriorityType       priority;
             unsigned long         maxBitrate;
             double                minQuality = 0;
             double                resolutionScale;
             double                framerateScale;
             unsigned long         maxFramerate;
             boolean               active = true;
             DOMString             encodingId;
             sequence<DOMString>   dependencyEncodingIds;
             // Removed ssrc, fec, rtx
};
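
For illustration, here is a minimal sketch of what the send parameters
for a single audio stream could look like under these shapes. All
concrete values (SSRCs, payload types, FEC mechanism) are hypothetical:

// A minimal sketch assuming the proposed dictionary shapes above;
// SSRCs, payload types and the FEC mechanism are illustrative only.
// "sender" is an already constructed ORTC RTCRtpSender(track, transport).
const parameters = {
  muxId: "",
  ssrc: 12345678,                                   // media SSRC, now at the top level
  rtx: { ssrc: 12345679, payloadType: 97 },         // RTX stream of this sender
  fec: { ssrc: 12345680, mechanism: "red+ulpfec" }, // hypothetical FEC mechanism
  headerExtensions: [],
  encodings: [{
    codec: {                                        // codec moved into the encoding
      name: "opus",
      payloadType: 96,
      clockRate: 48000,
      numChannels: 2,
      rtx: { payloadType: 97 }                      // NEW: per-codec RTX association
    },
    active: true
  }]
};
sender.send(parameters);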


  Impact analysis


    Normal use case (1 sender, 1 receiver, 1 media codec)

As we have removed the sequence of RTCRtpCodecParameters from the
parameters, it is now required to pass that information in the encodings
attribute. So the automatic process that is performed internally by the
RTCRtpSender in the current version for this case is no longer possible:

    the browser behaves as though a single encodings[0] entry was
    provided, with encodings[0].ssrc set to a browser-determined value,
    encodings[0].active set to "true", encodings[0].codecPayloadType set
    to codecs[j].payloadType where j is the index of the first codec
    that is not "cn", "dtmf", "red", "rtx", or a forward error
    correction codec, and all the other parameters.encodings[0]
    attributes unset.

However, note that in the specification all the examples use the
following helper function, which performs the required steps:

RTCRtpParameters function myCapsToSendParams(RTCRtpCapabilities sendCaps,
                                             RTCRtpCapabilities remoteRecvCaps) {
  // Function returning the sender RTCRtpParameters, based on the local
  // sender and remote receiver capabilities.
  // The goal is to enable a single stream audio and video call with
  // minimum fuss.
  //
  // Steps to be followed:
  // 1. Determine the RTP features that the receiver and sender have in common.
  // 2. Determine the codecs that the sender and receiver have in common.
  // 3. Within each common codec, determine the common formats, header
  //    extensions and rtcpFeedback mechanisms.
  // 4. Determine the payloadType to be used, based on the receiver
  //    preferredPayloadType.
  // 5. Set RTCRtcpParameters such as mux to their default values.
  // 6. Return RTCRtpParameters enabling the jointly supported features
  //    and codecs.
}

Note that while filling the encoding with the first supported media
codec is already done there, it is still necessary to process the RTP
features (mux, feedback and header extensions) in order to create
compatible encoding parameters.
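
As a rough sketch of how that helper could be adapted under this
proposal (intersectCodecs and intersectExtensions are hypothetical
helper names standing in for steps 1-3, not part of any spec):

function myCapsToSendParams(sendCaps, remoteRecvCaps) {
  // Steps 1-3: determine the common features, codecs, formats, header
  // extensions and rtcpFeedback mechanisms (details elided; these two
  // helpers are hypothetical).
  const codecs = intersectCodecs(sendCaps.codecs, remoteRecvCaps.codecs);
  const headerExtensions = intersectExtensions(sendCaps.headerExtensions,
                                               remoteRecvCaps.headerExtensions);
  // Pick the first media codec that is not "cn", "dtmf", "red", "rtx"
  // or a forward error correction codec, mirroring the defaulting rule
  // quoted above.
  const codec = codecs.find((c) =>
    !["cn", "dtmf", "red", "rtx", "ulpfec", "flexfec"]
      .includes(c.name.toLowerCase()));
  // Steps 5-6: return the parameters; the codec now lives inside the
  // (single) encoding instead of a top-level codecs sequence. The ssrc
  // is left unset here, presumably letting the browser pick one
  // (an assumption of this sketch).
  return {
    muxId: "",
    headerExtensions: headerExtensions,
    rtcp: { mux: true },
    encodings: [{ codec: codec, active: true }]
  };
}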


    Simulcast

From RFC 7656:

    3.6. Simulcast

    A media source represented as multiple independent encoded streams
    constitutes a simulcast [SDP-SIMULCAST] or Multiple Description
    Coding (MDC) of that media source. Figure 8 shows an example of a
    media source that is encoded into three separate simulcast streams,
    that are in turn sent on the same media transport flow. When using
    simulcast, the RTP streams may be sharing an RTP session and media
    transport, or be separated on different RTP sessions and media
    transports, or be any combination of these two. One major reason to
    use separate media transports is to make use of different quality
    of service (QoS) for the different source RTP streams. Some
    considerations on separating related RTP streams are discussed in
    Section 3.12.

    [Figure 8: Example of Media Source Simulcast - one media source
    feeding three independent media encoders, each followed by a media
    packetizer producing a source RTP stream, all sent over the same
    media transport]

    The simulcast relation between the RTP streams is the common media
    source. In addition, to be able to identify the common media
    source, a receiver of the RTP stream may need to know which
    configuration or encoding goals lay behind the produced encoded
    stream and its properties. This enables selection of the stream
    that is most useful in the application at that moment.

The main point to take into consideration is that each simulcast stream
is provided by an independent encoder. So, performance-wise, it is
irrelevant whether one RTCRtpSender provides two encodings or two
RTCRtpSenders provide one encoding each.

So it is possible to cover all the use cases provided by the current
spec. For example:

RTCRtpSender (track0)
 |
 +-----encoding[0] = {ssrc1,vp8,pt=96}
 +-----encoding[1] = {ssrc1,vp8,pt=97}
 +-----encoding[2] = {ssrc2,vp8,pt=98}

will be equivalent to two senders attached to the same media track, each
one with the encodings for a single SSRC:

RTCRtpSender (track0, ssrc1)
 |
 +-----encoding[0] = {vp8,pt=96}
 +-----encoding[1] = {vp8,pt=97}

RTCRtpSender (track0, ssrc2)
 |
 +-----encoding[0] = {vp8,pt=98}

Note that in the first case the payloads were required to have different
payload types, even though they were on different SSRCs.
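
A minimal sketch of how the two-sender variant could be set up in
JavaScript (the RTCRtpSender constructor is the existing ORTC one; the
parameter contents assume the proposed shapes, and all SSRCs and
payload types are illustrative):

// Two senders sharing the same media track, one SSRC each.
const sender1 = new RTCRtpSender(track0, dtlsTransport);
const sender2 = new RTCRtpSender(track0, dtlsTransport);

// First sender: a single SSRC carrying two VP8 encodings.
sender1.send({
  ssrc: 1111,
  encodings: [
    { codec: { name: "vp8", payloadType: 96, clockRate: 90000 }, active: true },
    { codec: { name: "vp8", payloadType: 97, clockRate: 90000 }, active: true }
  ]
});

// Second sender: its own SSRC with a single encoding.
sender2.send({
  ssrc: 2222,
  encodings: [
    { codec: { name: "vp8", payloadType: 98, clockRate: 90000 }, active: true }
  ]
});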


    SVC

Also from RFC 7656:

    3.7. Layered Multi-Stream

    Layered Multi-Stream (LMS) is a mechanism by which different
    portions of a layered or scalable encoding of a source stream are
    sent using separate RTP streams (sometimes in separate RTP
    sessions). LMSs are useful for receiver control of layered media.

    A media source represented as an encoded stream and multiple
    dependent streams constitutes a media source that has layered
    dependencies. Figure 9 represents an example of a media source that
    is encoded into three dependent layers, where two layers are sent
    on the same media transport using different RTP streams, i.e.,
    SSRCs, and the third layer is sent on a separate media transport.

    [Figure 9: Example of Media Source Layered Dependency - one media
    encoder producing an encoded stream and two dependent streams, each
    packetized into its own RTP stream; two RTP streams share one media
    transport and the third uses a separate media transport]

    It is sometimes useful to make a distinction between using a single
    media transport or multiple separate media transports when (in both
    cases) using multiple RTP streams to carry encoded streams and
    dependent streams for a media source. Therefore, the following new
    terminology is defined here:

    SRST: Single RTP stream on a Single media Transport
    MRST: Multiple RTP streams on a Single media Transport
    MRMT: Multiple RTP streams on Multiple media Transports

    MRST and MRMT relations need to identify the common media encoder
    origin for the encoded and dependent streams. When using different
    RTP sessions (MRMT), a single RTP stream per media encoder, and a
    single media source in each RTP session, common SSRCs and CNAMEs
    can be used to identify the common media source. When multiple RTP
    streams are sent from one media encoder in the same RTP session
    (MRST), then CNAME is the only currently specified RTP identifier
    that can be used. In cases where multiple media encoders use
    multiple media sources sharing synchronization context, and thus
    have a common CNAME, additional heuristics or identification need
    to be applied to create the MRST or MRMT relationships between the
    RTP streams.

The main advantage compared with simulcast is that here a single encoder
instance is able to serve multiple layers, improving performance
compared to having several independent encoders.

This is supported in the current spec by using dependencyEncodingIds,
which allow the browser to correlate SVC layers so they can be provided
by the same encoder:

    dependencyEncodingIds of type sequence<DOMString>: The encodingIds
    on which this layer depends. Within this specification encodingIds
    are permitted only within the same RTCRtpEncodingParameters
    sequence. In the future if MST were to be supported, then if
    searching within an RTCRtpEncodingParameters sequence did not
    produce a match, then a global search would be carried out.

Note that currently MST is not supported, because the dependency search
is only done inside the encodings of an RTCRtpSender, and as an
RTCRtpSender is attached to a single transport, it is not possible to
send each layer over a different transport.

So in the current version of the ORTC spec, SRST and MRST are supported,
but not MRMT. In the new version, a single RTCRtpSender would support
only SRST.

This limitation is artificial: if the encodingIds were globally unique,
that search could be done across RTCRtpSenders. That would mean that
SRST, MRST and MRMT would all be supported with this proposal.

RTCRtpSender (track0)
 |
 +-----encoding[0] = {ssrc1,vp9,pt=96,encodingId="track0-0"}
 +-----encoding[1] = {ssrc1,vp9,pt=97,encodingId="track0-1",dependencyEncodingIds=["track0-0"]}
 +-----encoding[2] = {ssrc2,vp9,pt=98,encodingId="track0-2",dependencyEncodingIds=["track0-0"]}

will be equivalent to two senders attached to the same media track, each
one with the encodings for a single SSRC:

RTCRtpSender (track0, ssrc1)
 |
 +-----encoding[0] = {vp9,pt=96,encodingId="track0-0"}
 +-----encoding[1] = {vp9,pt=97,encodingId="track0-1",dependencyEncodingIds=["track0-0"]}

RTCRtpSender (track0, ssrc2)
 |
 +-----encoding[0] = {vp9,pt=98,encodingId="track0-2",dependencyEncodingIds=["track0-0"]}
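
As a final hedged sketch (assuming globally unique encodingIds and the
proposed shapes; transports, SSRCs and payload types are illustrative),
the MRMT variant could be set up like this:

// Base and enhancement layers on separate transports (MRMT), with
// dependencyEncodingIds referencing encodingIds across senders,
// relying on the proposed global uniqueness of encodingIds.
const base        = new RTCRtpSender(track0, transportA);
const enhancement = new RTCRtpSender(track0, transportB); // separate transport

base.send({
  ssrc: 1111,
  encodings: [
    { codec: { name: "vp9", payloadType: 96, clockRate: 90000 },
      encodingId: "track0-0" },
    { codec: { name: "vp9", payloadType: 97, clockRate: 90000 },
      encodingId: "track0-1", dependencyEncodingIds: ["track0-0"] }
  ]
});

enhancement.send({
  ssrc: 2222,
  encodings: [
    // This layer depends on an encoding owned by the other sender.
    { codec: { name: "vp9", payloadType: 98, clockRate: 90000 },
      encodingId: "track0-2", dependencyEncodingIds: ["track0-0"] }
  ]
});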

Received on Monday, 4 April 2016 15:27:02 UTC