Re: Proposal: One media RTP packet stream (i.e. one SSRC) per RTCRtpSender/RTCRtpReceiver object from Sergio Garcia Murillo on 2016-04-07 (public-ortc@w3.org from April 2016)

From: Sergio Garcia Murillo <sergio.garcia.murillo@gmail.com>
Date: Thu, 7 Apr 2016 17:17:32 +0200
To: Peter Thatcher <pthatcher@google.com>
Cc: "public-ortc@w3.org" <public-ortc@w3.org>
Message-ID: <57067A0C.40706@gmail.com>
Hi Peter,

According to RFC 7656:

    An RTP stream is a stream of RTP packets containing media data,
    source or redundant.  The RTP stream is identified by an SSRC
    belonging to a particular RTP Session.


    A media source represented as multiple independent encoded streams
    constitutes a simulcast


    ...When using
    simulcast, the RTP streams may be sharing an RTP session and media
    transport, or be separated on different RTP sessions and media
    transports, or be any combination of these two.


So, correct me if I am wrong, there are several possibilities for 
simulcasting:

 1. One ssrc, several payloads
 2. Several ssrcs, one transport
 3. Several ssrcs, several transports

Indeed, with my proposal 2 would require one RTPSender per ssrc, but 1 
will only require one RTPSender.

Also take note that 2, with the current proposal, can be implemented in two 
ways: one RTPSender per ssrc (same as my proposal) or all ssrcs in the same 
RTPSender. With the current proposal, 3 can only be implemented with 
several RTPSenders.
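
A rough sketch of the sender counts above, in TypeScript. The `Layer` shape, the `Model` names and `minSenders` are made up for illustration and are not ORTC API. It assumes that under my proposal a sender maps to one ssrc on one transport, and that under the current proposal the only hard constraint is one transport per sender:

```typescript
// Hypothetical, simplified description of one simulcast layer.
// These field names are illustrative and not part of the ORTC API.
interface Layer {
  ssrc: number;
  payloadType: number;
  transportId: string;
}

type Model = "proposal" | "current";

// Minimum number of RTCRtpSender objects needed to send the given layers.
// "proposal": one sender per distinct (transport, ssrc) pair.
// "current":  one sender per distinct transport (more are allowed).
function minSenders(layers: Layer[], model: Model): number {
  const seen: Record<string, boolean> = {};
  for (const layer of layers) {
    const key =
      model === "proposal"
        ? layer.transportId + "/" + layer.ssrc
        : layer.transportId;
    seen[key] = true;
  }
  return Object.keys(seen).length;
}

// 1. One ssrc, several payloads
const option1: Layer[] = [
  { ssrc: 1, payloadType: 96, transportId: "t1" },
  { ssrc: 1, payloadType: 97, transportId: "t1" },
];
// 2. Several ssrcs, one transport
const option2: Layer[] = [
  { ssrc: 1, payloadType: 96, transportId: "t1" },
  { ssrc: 2, payloadType: 98, transportId: "t1" },
];
// 3. Several ssrcs, several transports
const option3: Layer[] = [
  { ssrc: 1, payloadType: 96, transportId: "t1" },
  { ssrc: 2, payloadType: 98, transportId: "t2" },
];
```

With these inputs the two models differ only for option 2, where the current proposal can still fit everything in a single sender.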

BTW, I am not aware of referring to "RTP Streams" (I find the definition 
quite difficult to follow nowadays with dtls, ice and muxing); the term only 
appears in my proposal in the text copied and pasted from RFC 7656 describing 
the simulcast/SVC scenarios. I refer to "rtp packet streams", a term 
used in the RTP usage draft:

    In modern day networks, however, with the widespread use of network
    address/port translators (NAT/NAPT) and firewalls, it is desirable to
    reduce the number of transport-layer flows used by RTP applications.
    This can be done by sending all the RTP packet streams in a single
    RTP session, which will comprise a single transport-layer flow (this
    will prevent the use of some quality-of-service mechanisms, as
    discussed in Section 12.1.3
    <https://tools.ietf.org/html/draft-ietf-rtcweb-rtp-usage-26#section-12.1.3>).
    Implementations are therefore also
    REQUIRED to support transport of all RTP packet streams, independent
    of media type, in a single RTP session using a single transport layer
    flow, according to [I-D.ietf-avtcore-multi-media-rtp-session
    <https://tools.ietf.org/html/draft-ietf-rtcweb-rtp-usage-26#ref-I-D.ietf-avtcore-multi-media-rtp-session>]
    (this is sometimes called SSRC multiplexing).  If multiple types of media
    are to be used in a single RTP session, all participants in that RTP
    session MUST agree to this usage.  In an SDP context,
    [I-D.ietf-mmusic-sdp-bundle-negotiation
    <https://tools.ietf.org/html/draft-ietf-rtcweb-rtp-usage-26#ref-I-D.ietf-mmusic-sdp-bundle-negotiation>]
    can be used to signal such a
    bundle of RTP packet streams forming a single RTP session.


Not sure if it is the correct term, but I have not found a better way of 
naming a "series of packets using the same ssrc over the same transport"  :)

Best regards
Sergio
On 07/04/2016 16:50, Peter Thatcher wrote:
> Your proposal could be summarized as "Use multiple RtpSender objects 
> to send simulcast", which was soundly rejected by the WebRTC Working 
> Group in the last 6 months.  If ORTC is going to stay aligned with 
> WebRTC, this isn't going to work.
>
> Also, you use the phrase "RTP Stream" a lot when you actually mean 
> either "Encoded Stream" or maybe "Dependent Stream", which are very 
> different.  In your model, an RtpSender may have only one Encoded 
> Stream, but use of RTX and FEC means there may be more than one RTP 
> Stream within an Encoded Stream.
>
> On Mon, Apr 4, 2016 at 8:26 AM, Sergio Garcia Murillo 
> <sergio.garcia.murillo@gmail.com 
> <mailto:sergio.garcia.murillo@gmail.com>> wrote:
>
>     Hello all,
>
>     We have been working on a new proposal to improve both the
>     RTCRtpSender and RTCRtpReceiver objects in order to make the ORTC spec
>     cleaner and simpler.  I was not sure what the preferred method
>     for making these proposals was (either github issue or mailing list), so
>     I created a gist for it:
>
>     https://gist.github.com/murillo128/d9da72ef76df26d2fde848a265c46fc7
>
>     Also I am copying the full proposal below:
>
>     Best regards
>     Sergio
>
>
>       Rationale
>
>     IMHO it is quite difficult to understand what the RTCRtpSender
>     / RTCRtpReceiver are.
>
>          1. Overview
>
>         In the figure above, the RTCRtpSender (Section 5) encodes the
>         track provided as input, which is transported over a
>         RTCDtlsTransport
>
>         5.1 Overview
>
>         The RTCRtpSender includes information relating to the RTP sender.
>
>         5.1 Overview
>
>         An RTCRtpSender instance is associated to a sending
>         MediaStreamTrack and provides RTC related methods to it.
>
>     So, the sender, for example, is a generic object that takes a
>     media track, and generates all sorts of RTP packets that you send
>     to another peer. It can host one or several encoders and send one or
>     more ssrcs, supporting simulcast and svc.
>
>     In that regard, it is quite similar to an RTCPeerConnection, but
>     instead of using an SDP blob, you pass a kind of json-version of an
>     m-line to it and the RTCRtpSender will do its best to send all
>     you want to send (given that you have correctly discovered all the
>     restrictions to not cause an InvalidParameters exception).
>
>     So instead of matching an RTP-world object, as DTLS and
>     ICETransport do, it is a kind of catch-all black-box object.
>
> The IceTransport does not represent an ICE-world object. There is no 
> "ICE transport" in any ICE RFC.  The closest thing is "the ICE 
> processing of a single component within an ICE session" or something 
> like that. IceTransport is a useful API abstraction for controlling a 
> part of ICE that isn't explicitly named in the ICE-world.
>
> Similarly, the RtpSender is a useful API abstraction for controlling 
> RTP, even if it doesn't have an exact named thing it represents in the 
> RTP-world.  Actually, in a world where there is no simulcast, the 
> RtpSender does have a very clear thing it represents in RTP taxonomy 
> (https://tools.ietf.org/html/rfc7656): The Media Encoder + Media 
> Packetizer.   It's only when using simulcast that it gets confusing, 
> because then you represent many Media Encoders + Media Packetizers in 
> one RtpSender, which the RTP Taxonomy RFC does not have a name for.
>
> So I would disagree that it's a catch-all black box.  It's a group of 
> Media Encoders and Media Packetizers tied to a single Media Source.  
> It's just below the Media Source in Figure 8 of 
> https://tools.ietf.org/html/rfc7656, but just because RFC 7656 didn't 
> name that point doesn't mean we can't have an object for it.
>
>     The RTCRtpReceiver shares the same complexity, supporting receiving
>     multiple ssrc streams and payload types, as both share the same
>     RTCRtpParameters dictionary for setting up sending and receiving.
>     It was recently suggested (and I agree) that ORTC start off
>     by supporting the WebRTC 1.0 simulcast model, which involves
>     sending multiple streams, but receiving only one.
>
>     That implies that an RTCRtpReceiver will only receive one RTP
>     packet stream (one ssrc) with one or more payloads (OPUS+dtmf for
>     example). With this change, we can narrow down the definition of an
>     RTCRtpReceiver and describe it as the object that handles the
>     reception of an rtp packet stream. Again, IMHO, that makes much more
>     sense and maps to the concepts in draft-ietf-rtcweb-rtp-usage.
>
> An RtpReceiver must receive more than one RTP Stream (RTX, FEC, etc).  
> What you're really suggesting is that it only receive one Encoded 
> Stream (no simulcast).
>
>     This proposal takes this idea further, and applies the same
>     concept to the RTCRtpSender. Instead of allowing multiple rtp
>     packet streams to be handled by an RTCRtpSender, we only allow one
>     RTCRtpSender to produce a single rtp packet stream (ssrc).
>
> You have to produce more than one RTP Stream (RTX, FEC). Further, when 
> using Dependent Streams (scalable codecs), you'd have more RTP 
> Streams.  And those should all go in one RtpSender.
>
> I think you're proposing that an RtpSender have only one Encoded 
> Stream, which is different.
>
>     Now we have a one to one relationship between an RTCRtpSender, an
>     RTCRtpReceiver and a media RTP packet stream.
>
>     MediaTrack ===> RTCRtpSender ========(single rtp packet stream -
>     SSRC)===> RTCRtpReceiver ===> MediaTrack
>
>     In that regard, following the m-line analogy, it would represent
>     one ssrc-group.
>
>  ssrc-groups?  An ssrc-group could represent a group of simulcast 
> layers, meaning multiple Encoded Streams.  In fact, that's how 
> simulcast is currently implemented in Chrome.
>
> In m-line language, this would be one RID, not an ssrc-group.
>
>     Simulcast and SVC are also supported (check below).
>
>
>       Benefits
>
>       * Improve RTCRtpSender/RTCRtpReceiver definitions
>       * Cleaner and simpler APIs
>       * Make it harder to have parameter inconsistency
>       * Provide a single and straightforward way of using the API.
>         Given a DTLS/ICE/RTP stream architecture there is only a
>         single way of implementing it in ORTC.
>
>
>       Proposal
>
>     In order to be as little disruptive as possible, we have made
>     only the following changes to the current API:
>
>       * The main change is to move the ssrc, fec and rtx definition
>         from the encodings to the rtp parameters
>         
>
>       * Add RTCRtpCodecRTXParameters associated to each
>         RTCRtpCodecParameters to support rtx apt issue (this change
>         can also be implemented standalone without the rest of the
>         changes)
>       * Remove the codec sequence from the parameters and move it to the
>         encoding one. This change could be dropped, although we
>         believe it is important for the sake of clarity (more on this later)
>
>     //New dictionary
>     dictionary RTCRtpCodecRTXParameters {
>                   payloadtype               payloadType;
>                   unsigned long             rtxtime;
>     };
>
>     dictionary RTCRtpCodecParameters {
>                   DOMString                 name;
>                   payloadtype               payloadType;
>                   unsigned long             clockRate;
>                   unsigned long             maxptime;
>                   unsigned long             ptime;
>                   unsigned long             numChannels;
>                   sequence<RTCRtcpFeedback>  rtcpFeedback;
>                   Dictionary                parameters;
>                   RTCRtpCodecRTXParameters  rtx;// NEW: rtx.payloadType
>     };
>
>     //Not changed, just added here for completeness
>     dictionary RTCRtpRtxParameters {
>                   unsigned long ssrc;
>                   payloadtype   payloadType;
>     };
>
>     //Not changed, just added here for completeness
>     dictionary RTCRtpFecParameters {
>                   unsigned long ssrc;
>                   DOMString     mechanism;
>     };
>
>     dictionary RTCRtpParameters {
>                   DOMString                                 muxId=  "";
>                   unsigned long                             ssrc;//media ssrc - moved from encodings
>                   RTCRtpFecParameters                       fec;//includes fec.ssrc - moved from encodings
>                   RTCRtpRtxParameters                       rtx;//includes rtx.ssrc - from encodings
>                   sequence<RTCRtpHeaderExtensionParameters>  headerExtensions;
>                   sequence<RTCRtpEncodingParameters>         encodings;
>                   RTCRtcpParameters                         rtcp;
>                   RTCDegradationPreference                  degradationPreference=  "balanced";
>                   //Removed codecs sequence
>     };
>
>     dictionary RTCRtpEncodingParameters {
>                   RTCRtpCodecParameters codec;// Moved from parameters
>                   RTCPriorityType       priority;
>                   unsigned long         maxBitrate;
>                   double                minQuality=  0;
>                   double                resolutionScale;
>                   double                framerateScale;
>                   unsigned long         maxFramerate;
>                   boolean               active=  true;
>                   DOMString             encodingId;
>                   sequence<DOMString>    dependencyEncodingIds;
>                   //Removed ssrc fec rtx
>     };
>
>
>       Impact analysis
>
>
>         Normal use case (1 sender, 1 receiver, 1 media codec)
>
>     As we have removed the sequence of RTCRtpCodecParameters from the
>     parameters, it is required to pass that information in the
>     encodings attributes. So the automatic process that is performed
>     internally by the RTCRtpSender in the current version for this
>     case is not possible:
>
>         the browser behaves as though a single encodings[0] entry was
>         provided, with encodings[0].ssrc set to a browser-determined
>         value, encodings[0].active set to "true",
>         encodings[0].codecPayloadType set to codecs[j].payloadType
>         where j is the index of the first codec that is not "cn",
>         "dtmf", "red", "rtx", or a forward error correction codec, and
>         all the other parameters.encodings[0] attributes unset.
>
>     However, note that in the specification all the examples use the
>     following helper function that performs the required steps:
>
>     RTCRtpParameters function myCapsToSendParams(
>         RTCRtpCapabilities sendCaps, RTCRtpCapabilities remoteRecvCaps) {
>       // Function returning the sender RTCRtpParameters, based on the
>       // local sender and remote receiver capabilities.
>       // The goal is to enable a single stream audio and video call
>       // with minimum fuss.
>       //
>       // Steps to be followed:
>       // 1. Determine the RTP features that the receiver and sender
>       //    have in common.
>       // 2. Determine the codecs that the sender and receiver have in
>       //    common.
>       // 3. Within each common codec, determine the common formats,
>       //    header extensions and rtcpFeedback mechanisms.
>       // 4. Determine the payloadType to be used, based on the receiver
>       //    preferredPayloadType.
>       // 5. Set RTCRtcpParameters such as mux to their default values.
>       // 6. Return RTCRtpParameters enabling the jointly supported
>       //    features and codecs.
>     }
>
>     Note that even where the encoding is filled in with the first
>     supported media codec, it is still necessary to process the rtp
>     features (mux, feedback and header extensions) in order to create
>     compatible encoding parameters.
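
A minimal sketch of the default codec selection quoted from the spec above (the first codec that is not "cn", "dtmf", "red", "rtx" or a forward error correction codec). The `Codec` shape and `firstMediaCodec` are hypothetical names, and the FEC codec names in the list are an assumption, since the quoted text does not enumerate them:

```typescript
// Hypothetical, simplified codec entry; not the full RTCRtpCodecParameters.
interface Codec {
  name: string;
  payloadType: number;
}

// "cn", "dtmf", "red" and "rtx" come from the quoted spec text.
// "ulpfec" and "flexfec" are assumed examples of FEC codec names.
const NON_MEDIA_CODECS: string[] = [
  "cn", "dtmf", "red", "rtx", "ulpfec", "flexfec",
];

// Returns the first codec that carries media, or undefined if none does.
function firstMediaCodec(codecs: Codec[]): Codec | undefined {
  for (const codec of codecs) {
    if (NON_MEDIA_CODECS.indexOf(codec.name.toLowerCase()) === -1) {
      return codec;
    }
  }
  return undefined;
}

const negotiated: Codec[] = [
  { name: "rtx", payloadType: 97 },
  { name: "red", payloadType: 98 },
  { name: "VP8", payloadType: 96 },
  { name: "H264", payloadType: 100 },
];
const chosen = firstMediaCodec(negotiated);
```

With the proposed change this choice would have to be made by the application (for instance inside a helper like myCapsToSendParams) when it fills in the codec of each encoding.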
>
>
>         Simulcast
>
>     From RFC 7656
>
>     3.6. Simulcast
>
>     A media source represented as multiple independent encoded
>     streams constitutes a simulcast [SDP-SIMULCAST] or Multiple
>     Description Coding (MDC) of that media source. Figure 8 shows an
>     example of a media source that is encoded into three separate
>     simulcast streams, that are in turn sent on the same media
>     transport flow. When using simulcast, the RTP streams may be
>     sharing an RTP session and media transport, or be separated on
>     different RTP sessions and media transports, or be any combination
>     of these two. One major reason to use separate media transports is
>     to make use of different quality of service (QoS) for the
>     different source RTP streams. Some considerations on separating
>     related RTP streams are discussed in Section 3.12.
>
>                             +----------------+
>                             |  Media Source  |
>                             +----------------+
>                      Source Stream  |
>              +----------------------+----------------------+
>              |                      |                      |
>              V                      V                      V
>     +------------------+   +------------------+   +------------------+
>     |  Media Encoder   |   |  Media Encoder   |   |  Media Encoder   |
>     +------------------+   +------------------+   +------------------+
>              | Encoded              | Encoded              | Encoded
>              | Stream               | Stream               | Stream
>              V                      V                      V
>     +------------------+   +------------------+   +------------------+
>     | Media Packetizer |   | Media Packetizer |   | Media Packetizer |
>     +------------------+   +------------------+   +------------------+
>              | Source               | Source               | Source
>              | RTP                  | RTP                  | RTP
>              | Stream               | Stream               | Stream
>              +-----------------+    |    +-----------------+
>                                |    |    |
>                                V    V    V
>                           +-------------------+
>                           |  Media Transport  |
>                           +-------------------+
>
>     Figure 8: Example of Media Source Simulcast
>
>     The simulcast relation between the RTP streams is the common media
>     source. In addition, to be able to identify the common media
>     source, a receiver of the RTP stream may need to know which
>     configuration or encoding goals lay behind the produced encoded
>     stream and its properties. This enables selection of the stream
>     that is most useful in the application at that moment.
>
>     The main point to take into consideration is that each layer is
>     provided by an independent encoder. So performance-wise, it is
>     irrelevant whether one RTCRtpSender provides two encodings or two
>     RTCRtpSenders provide one encoding each.
>
>     So it is possible to cover all the use cases provided by the
>     current spec, for example:
>
>     RTCRtpSender (track0)
>      |
>      +-----encoding[0] = {ssrc1,vp8,pt=96}
>      +-----encoding[1] = {ssrc1,vp8,pt=97}
>      +-----encoding[2] = {ssrc2,vp8,pt=98}
>
>     This will be equivalent to two streams attached to the same media
>     track, each one with the encodings for a single ssrc.
>
>     RTCRtpSender (track0,ssrc1)
>      |
>      +-----encoding[0] = {vp8,pt=96}
>      +-----encoding[1] = {vp8,pt=97}
>
>     RTCRtpSender (track0,ssrc2)
>      |
>      +-----encoding[0] = {vp8,pt=98}
>
>     Note that in the first case, the payloads, even if on different
>     ssrcs, were required to have different payload types.
>
>
>         SVC
>
>     Also from RFC 7656
>
>     3.7. Layered Multi-Stream
>
>     Layered Multi-Stream (LMS) is a mechanism by which different
>     portions of a layered or scalable encoding of a source stream are
>     sent using separate RTP streams (sometimes in separate RTP
>     sessions). LMSs are useful for receiver control of layered media.
>
>     A media source represented as an encoded stream and multiple
>     dependent streams constitutes a media source that has layered
>     dependencies. Figure 9 represents an example of a media source
>     that is encoded into three dependent layers, where two layers are
>     sent on the same media transport using different RTP streams,
>     i.e., SSRCs, and the third layer is sent on a separate media
>     transport.
>
>                          +----------------+
>                          |  Media Source  |
>                          +----------------+
>                                  |
>                                  |
>                                  V
>     +---------------------------------------------------------+
>     |                      Media Encoder                      |
>     +---------------------------------------------------------+
>             |                    |                     |
>      Encoded Stream       Dependent Stream     Dependent Stream
>             |                    |                     |
>             V                    V                     V
>     +----------------+   +----------------+   +----------------+
>     |Media Packetizer|   |Media Packetizer|   |Media Packetizer|
>     +----------------+   +----------------+   +----------------+
>             |                    |                     |
>        RTP Stream           RTP Stream            RTP Stream
>             |                    |                     |
>             +------+      +------+                     |
>                    |      |                            |
>                    V      V                            V
>              +-----------------+              +-----------------+
>              | Media Transport |              | Media Transport |
>              +-----------------+              +-----------------+
>
>     Figure 9: Example of Media Source Layered Dependency
>
>     It is sometimes useful to make a distinction between using a
>     single media transport or multiple separate media transports when
>     (in both cases) using multiple RTP streams to carry encoded
>     streams and dependent streams for a media source. Therefore, the
>     following new terminology is defined here:
>
>     SRST: Single RTP stream on a Single media Transport
>     MRST: Multiple RTP streams on a Single media Transport
>     MRMT: Multiple RTP streams on Multiple media Transports
>
>     MRST and MRMT relations need to identify the common media encoder
>     origin for the encoded and dependent streams. When using different
>     RTP sessions (MRMT), a single RTP stream per media encoder, and a
>     single media source in each RTP session, common SSRCs and CNAMEs
>     can be used to identify the common media source. When multiple RTP
>     streams are sent from one media encoder in the same RTP session
>     (MRST), then CNAME is the only currently specified RTP identifier
>     that can be used. In cases where multiple media encoders use
>     multiple media sources sharing synchronization context, and thus
>     have a common CNAME, additional heuristics or identification need
>     to be applied to create the MRST or MRMT relationships between the
>     RTP streams.
>
>     The main advantage over simulcast is that here a single instance
>     of the encoder is able to serve multiple layers, improving
>     performance compared to having several independent encoders.
>
>     This is supported in current spec by using the
>     dependencyEncodingIds which allows the browser to correlate SVC
>     layers so they can be provided by the same encoder:
>
>         dependencyEncodingIds of type sequence<DOMString>
>         The encodingIds on which this layer depends. Within this
>         specification encodingIds are permitted only within the same
>         RTCRtpEncodingParameters sequence. In the future if MST were
>         to be supported, then if searching within an
>         RTCRtpEncodingParameters sequence did not produce a match,
>         then a global search would be carried out.
>
>     Note that currently MST is not supported because the dependency
>     search is only done inside of the encoders of an RTCRtpSender, and
>     as RTCRtpSender is attached to a single transport, it is not
>     possible to send a layer to different transports.
>
>     So in the current version of the ORTC spec, SRST and MRST are
>     supported, but not MRMT. In the new version, only SRST would be
>     supported.
>
>     This limitation is artificial: if the encodingId were globally
>     unique, that search could be done across RTCRtpSenders. That would
>     mean that SRST, MRST and MRMT would be supported with this proposal.
>
>     RTCRtpSender (track0)
>      |
>      +-----encoding[0] = {ssrc1,vp9,pt=96,encodingId="track0-0"}
>      +-----encoding[1] = {ssrc1,vp9,pt=97,encodingId="track0-1",dependencyEncodingIds=["track0-0"]}
>      +-----encoding[2] = {ssrc2,vp9,pt=98,encodingId="track0-2",dependencyEncodingIds=["track0-0"]}
>
>     This will be equivalent to two streams attached to the same media
>     track, each one with the encodings for a single ssrc.
>
>     RTCRtpSender (track0,ssrc1)
>      |
>      +-----encoding[0] = {vp9,pt=96,encodingId="track0-0"}
>      +-----encoding[1] = {vp9,pt=97,encodingId="track0-1",dependencyEncodingIds=["track0-0"]}
>
>     RTCRtpSender (track0,ssrc2)
>      |
>      +-----encoding[0] = {vp9,pt=98,encodingId="track0-2",dependencyEncodingIds=["track0-0"]}
>
>
>
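
The SVC example in the quoted proposal (one sender carrying ssrc1 and ssrc2 becoming one sender per ssrc) could be sketched roughly as follows. The interfaces are simplified, hypothetical stand-ins for the dictionaries in the proposal, and `splitBySsrc` and `unresolvedDependencies` are illustrative helpers, not part of any API. The second helper assumes encodingIds are globally unique, so that dependencies can be resolved across senders:

```typescript
// Simplified, hypothetical shapes: only the fields the example needs.
interface CurrentEncoding {
  ssrc: number; // per-encoding ssrc, as in the current spec
  codecName: string;
  payloadType: number;
  encodingId: string;
  dependencyEncodingIds: string[];
}

interface ProposedEncoding {
  codec: { name: string; payloadType: number }; // moved into the encoding
  encodingId: string;
  dependencyEncodingIds: string[];
}

interface ProposedParameters {
  ssrc: number; // moved from the encodings to the parameters
  encodings: ProposedEncoding[];
}

// Groups current-style encodings into one parameter set per ssrc.
function splitBySsrc(encodings: CurrentEncoding[]): ProposedParameters[] {
  const senders: ProposedParameters[] = [];
  for (const e of encodings) {
    let params: ProposedParameters | undefined =
      senders.filter((p) => p.ssrc === e.ssrc)[0];
    if (params === undefined) {
      params = { ssrc: e.ssrc, encodings: [] };
      senders.push(params);
    }
    params.encodings.push({
      codec: { name: e.codecName, payloadType: e.payloadType },
      encodingId: e.encodingId,
      dependencyEncodingIds: e.dependencyEncodingIds,
    });
  }
  return senders;
}

// Global search: returns dependency ids not defined by any sender.
function unresolvedDependencies(senders: ProposedParameters[]): string[] {
  const known: string[] = [];
  for (const s of senders) {
    for (const e of s.encodings) known.push(e.encodingId);
  }
  const missing: string[] = [];
  for (const s of senders) {
    for (const e of s.encodings) {
      for (const dep of e.dependencyEncodingIds) {
        if (known.indexOf(dep) === -1) missing.push(dep);
      }
    }
  }
  return missing;
}

// The track0 example, with ssrc1 and ssrc2 written as 1 and 2.
const track0: CurrentEncoding[] = [
  { ssrc: 1, codecName: "vp9", payloadType: 96, encodingId: "track0-0", dependencyEncodingIds: [] },
  { ssrc: 1, codecName: "vp9", payloadType: 97, encodingId: "track0-1", dependencyEncodingIds: ["track0-0"] },
  { ssrc: 2, codecName: "vp9", payloadType: 98, encodingId: "track0-2", dependencyEncodingIds: ["track0-0"] },
];
const perSsrc = splitBySsrc(track0);
```

Here the sender for ssrc2 depends on "track0-0", which lives in the sender for ssrc1, so the dependency only resolves when the search is done across senders.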
Received on Thursday, 7 April 2016 15:18:03 UTC