Re: Proposal: One media RTP packet stream (i.e. one SSRC) per RTCRtpSender/RTCRtpReceiver object

Your proposal could be summarized as "Use multiple RtpSender objects to
send simulcast", which was soundly rejected by the WebRTC Working Group in
the last 6 months.  If ORTC is going to stay aligned with WebRTC, this
isn't going to work.

Also, you use the phrase "RTP Stream" a lot when you actually mean either
"Encoded Stream" or maybe "Dependent Stream", which are very different.  In
your model, an RtpSender may have only one Encoded Stream, but use of RTX
and FEC means there may be more than one RTP Stream within an Encoded Stream.

On Mon, Apr 4, 2016 at 8:26 AM, Sergio Garcia Murillo <
sergio.garcia.murillo@gmail.com> wrote:

> Hello all,
>
> We have been working on a new proposal to improve both the RTCRtpSender and
> RTCRtpReceiver objects in order to make the ORTC spec cleaner and simpler.  I
> was not sure what the preferred method of making these proposals was (either
> a GitHub issue or the mailing list), so I created a gist for it:
>
> https://gist.github.com/murillo128/d9da72ef76df26d2fde848a265c46fc7
>
> Also I am copying the full proposal below:
>
> Best regards
> Sergio
>
> Rationale
>
> IMHO it is quite difficult to understand what the RTCRtpSender and
> RTCRtpReceiver are.
>
>
>    1. Overview
>
> In the figure above, the RTCRtpSender (Section 5) encodes the track
> provided as input, which is transported over a RTCDtlsTransport
>
> 5.1 Overview
>
> The RTCRtpSender includes information relating to the RTP sender.
>
> 5.1 Overview
>
> An RTCRtpSender instance is associated to a sending MediaStreamTrack and
> provides RTC related methods to it.
>
> So, the sender, for example, is a generic object that takes a media track
> and generates all sorts of RTP packets that you send to another peer. It
> can host one or several encoders and send one or more ssrcs, supporting
> simulcast and SVC.
>
> In that regard, it is quite similar to an RTCPeerConnection, but instead
> of using an SDP blob, you pass a kind of JSON version of an m-line to it,
> and the RTCRtpSender will do its best to send all you want to send (given
> that you have correctly discovered all the restrictions so as not to cause
> an InvalidParameters exception).
>
> So instead of matching an RTP-world object, as DTLS and ICETransport do,
> it is a kind of catch-all black-box object.
>
The IceTransport does not represent an ICE-world object.  There is no "ICE
transport" in any ICE RFC.  The closest thing is "the ICE processing of
a single component within an ICE session" or something like that.
IceTransport is a useful API abstraction for controlling a part of ICE that
isn't explicitly named in the ICE-world.

Similarly, the RtpSender is a useful API abstraction for controlling RTP,
even if it doesn't have an exact named thing it represents in the
RTP-world.  Actually, in a world where there is no simulcast, the RtpSender
does have a very clear thing it represents in the RTP taxonomy
(https://tools.ietf.org/html/rfc7656): the Media Encoder + Media Packetizer.
It's only when using simulcast that it gets confusing, because then you
represent many Media Encoders + Media Packetizers in one RtpSender, which
the RTP Taxonomy RFC does not have a name for.

So I would disagree that it's a catch-all black box.  It's a group of Media
Encoders and Media Packetizers tied to a single Media Source.  It's just
below the Media Source in Figure 8 of https://tools.ietf.org/html/rfc7656,
but just because RFC 7656 didn't name that point doesn't mean we can't have
an object for it.
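
To make that grouping a bit more concrete, here is a rough TypeScript
sketch of how I picture it.  The names (EncoderChain, SenderModel,
isSimulcast) are made up for illustration; they are not part of ORTC or of
RFC 7656, and the labels are just strings.

```typescript
// Illustrative model only: these names are hypothetical, not ORTC API.
interface EncoderChain {
  encoder: string;     // label for one Media Encoder, e.g. "vp8 low"
  packetizer: string;  // label for the Media Packetizer fed by that encoder
}

interface SenderModel {
  mediaSource: string;     // the single Media Source (the track)
  chains: EncoderChain[];  // one chain without simulcast, several with it
}

// Simulcast is the case where one Media Source feeds several independent
// Media Encoder + Media Packetizer chains.
function isSimulcast(sender: SenderModel): boolean {
  return sender.chains.length > 1;
}

const plain: SenderModel = {
  mediaSource: "track0",
  chains: [{ encoder: "vp8", packetizer: "vp8-rtp" }],
};

const simulcast: SenderModel = {
  mediaSource: "track0",
  chains: [
    { encoder: "vp8 low", packetizer: "vp8-rtp" },
    { encoder: "vp8 mid", packetizer: "vp8-rtp" },
    { encoder: "vp8 high", packetizer: "vp8-rtp" },
  ],
};

console.log(isSimulcast(plain), isSimulcast(simulcast));
```

In both cases there is exactly one Media Source; only the number of chains
hanging below it changes.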

> The RTCRtpReceiver shares the same complexity, supporting receiving multiple
> ssrc streams and payload types, as both objects share the same RTCRtpParameters
> dictionary for setting up sending and receiving. It was recently suggested
> (and I agree with) that ORTC start off by supporting the WebRTC 1.0
> simulcast model, which involves sending multiple streams, but receiving
> only one.
>
> That implies that an RTCRtpReceiver will only receive one RTP packet
> stream (one ssrc) with one or more payloads (OPUS+dtmf for example). With
> this change, we can narrow down the definition of an RTCRtpReceiver and
> describe it as the object that handles the reception of an RTP packet
> stream. Again, IMHO, that makes much more sense and maps to concepts in
> draft-ietf-rtcweb-rtp-usage.
>
An RtpReceiver must receive more than one RTP Stream (RTX, FEC, etc).  What
you're really suggesting is that it only receive one Encoded Stream (no
simulcast).

> This proposal takes this idea further, and applies the same concept to the
> RTCRtpSender. Instead of allowing multiple RTP packet streams to be handled
> by an RTCRtpSender, we only allow one RTCRtpSender to produce a single RTP
> packet stream (ssrc).
>
You have to produce more than one RTP Stream (RTX, FEC).  Further, when
using Dependent Streams (scalable codecs), you'd have more RTP Streams.
And those should all go in one RtpSender.

I think you're proposing that an RtpSender have only one Encoded Stream,
which is different.
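
A small sketch of the distinction, in TypeScript.  The field names (ssrc,
rtx, fec, encodingId, dependencyEncodingIds) follow the encoding parameters
discussed in this thread; the EncodingSketch type, the countStreams helper
and the SSRC values are hypothetical and only there to show the counting.

```typescript
// Hypothetical helper: counts Encoded Streams, Dependent Streams and RTP
// Streams described by a list of encodings.
interface EncodingSketch {
  ssrc: number;
  rtx?: { ssrc: number };
  fec?: { ssrc: number };
  encodingId?: string;
  dependencyEncodingIds?: string[];
}

interface StreamCounts {
  encodedStreams: number;    // encodings that depend on nothing else
  dependentStreams: number;  // encodings with dependencyEncodingIds
  rtpStreams: number;        // distinct SSRCs, including RTX and FEC
}

function countStreams(encodings: EncodingSketch[]): StreamCounts {
  const ssrcs: number[] = [];
  const addSsrc = (ssrc: number): void => {
    if (ssrcs.indexOf(ssrc) === -1) ssrcs.push(ssrc);
  };
  let encodedStreams = 0;
  let dependentStreams = 0;
  for (const e of encodings) {
    addSsrc(e.ssrc);
    if (e.rtx) addSsrc(e.rtx.ssrc);
    if (e.fec) addSsrc(e.fec.ssrc);
    if (e.dependencyEncodingIds && e.dependencyEncodingIds.length > 0) {
      dependentStreams++;
    } else {
      encodedStreams++;
    }
  }
  return { encodedStreams, dependentStreams, rtpStreams: ssrcs.length };
}

// One Encoded Stream with RTX and FEC, plus one Dependent Stream with RTX.
const counts = countStreams([
  { ssrc: 1, rtx: { ssrc: 11 }, fec: { ssrc: 21 }, encodingId: "base" },
  { ssrc: 2, rtx: { ssrc: 12 }, encodingId: "enh", dependencyEncodingIds: ["base"] },
]);
console.log(counts);
```

With that input there is a single Encoded Stream (no simulcast), yet five
distinct SSRCs, i.e. five RTP Streams, which is why "one RTP Stream per
RtpSender" and "one Encoded Stream per RtpSender" are not the same rule.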


> Now we have a one to one relationship between an RTCRtpSender, an
> RTCRtpReceiver and a media RTP packet stream.
>
> MediaTrack ===> RTCRtpSender ===(single RTP packet stream - SSRC)===> RTCRtpReceiver ===> MediaTrack
>
> In that regard, following the m-line analogy, it would represent one
> ssrc-group.
>
ssrc-groups?  An ssrc-group could represent a group of simulcast layers,
meaning multiple Encoded Streams.  In fact, that's how simulcast is
currently implemented in Chrome.

In m-line language, this would be one RID, not an ssrc-group.
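
For illustration, here is a small TypeScript sketch that reads
"a=ssrc-group:" lines using the syntax from RFC 5576.  The parseSsrcGroup
helper and the SSRC values are made up, and treating "SIM" as the simulcast
grouping is an assumption based on the Chrome-style usage mentioned above,
not something RFC 5576 defines.

```typescript
// Hypothetical parser for "a=ssrc-group:<semantics> <ssrc> <ssrc> ..."
interface SsrcGroup {
  semantics: string;  // e.g. "FID" (source + RTX) or "SIM" (assumed simulcast)
  ssrcs: number[];
}

function parseSsrcGroup(line: string): SsrcGroup | null {
  const prefix = "a=ssrc-group:";
  if (line.indexOf(prefix) !== 0) return null;
  const parts = line.slice(prefix.length).trim().split(/\s+/);
  if (parts.length < 2) return null;  // need semantics plus at least one SSRC
  const ssrcs = parts.slice(1).map((p) => Number(p));
  if (ssrcs.some((n) => isNaN(n))) return null;
  return { semantics: parts[0], ssrcs };
}

// A SIM group lists one SSRC per simulcast layer, so it spans several
// Encoded Streams; a FID group pairs one source SSRC with its RTX SSRC.
const sim = parseSsrcGroup("a=ssrc-group:SIM 1001 1002 1003");
const fid = parseSsrcGroup("a=ssrc-group:FID 1001 2001");
console.log(sim, fid);
```

So a single ssrc-group can already cover more than one Encoded Stream, which
is why it is not a good analogy for "one RtpSender".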

> Simulcast and SVC are also supported (check below).
>
> Benefits
>
>    - Improve RTCRtpSender/RTCRtpReceiver definitions
>    - Cleaner and simpler APIs
>    - Make it harder to have parameter inconsistency
>    - Provide a single and straightforward way of using the API. Given a
>    DTLS/ICE/RTP stream architecture there is only a single way of implementing
>    it in ORTC.
>
>
> Proposal
>
> In order to be as little disruptive as possible, we have made only the
> following changes to the current API:
>
>    - The main change is to move the ssrc, fec and rtx definitions from the
>    encodings to the rtp parameters
>    - Add RTCRtpCodecRTXParameters associated to each
>    RTCRtpCodecParameters to support the rtx apt issue (this change can also
>    be implemented standalone without the rest of the changes)
>    - Remove the codecs sequence from the parameters and move it to the
>    encoding. This change could be dropped, although we believe it is
>    important for the sake of clarity (more on it later)
>
> //New dictionary
> dictionary RTCRtpCodecRTXParameters {
>              payloadtype               payloadType;
>              unsigned long             rtxtime;
> };
>
> dictionary RTCRtpCodecParameters {
>              DOMString                 name;
>              payloadtype               payloadType;
>              unsigned long             clockRate;
>              unsigned long             maxptime;
>              unsigned long             ptime;
>              unsigned long             numChannels;
>              sequence<RTCRtcpFeedback> rtcpFeedback;
>              Dictionary                parameters;
>              RTCRtpCodecRTXParameters  rtx;                       // NEW: rtx.payloadType
> };
> //Not changed, just added here for completeness
> dictionary RTCRtpRtxParameters {
>              unsigned long ssrc;
>              payloadtype   payloadType;
> };
> //Not changed, just added here for completeness
> dictionary RTCRtpFecParameters {
>              unsigned long ssrc;
>              DOMString     mechanism;
> };
>
> dictionary RTCRtpParameters {
>              DOMString                                 muxId = "";
>              unsigned long                             ssrc;        //media ssrc         - moved from encodings
>              RTCRtpFecParameters                       fec;         //includes fec.ssrc  - moved from encodings
>              RTCRtpRtxParameters                       rtx;         //includes rtx.ssrc  - from encodings
>              sequence<RTCRtpHeaderExtensionParameters> headerExtensions;
>              sequence<RTCRtpEncodingParameters>        encodings;
>              RTCRtcpParameters                         rtcp;
>              RTCDegradationPreference                  degradationPreference = "balanced";
>              //Removed codecs sequence
> };
>
> dictionary RTCRtpEncodingParameters {
>              RTCRtpCodecParameters codec;             // Moved from parameters
>              RTCPriorityType       priority;
>              unsigned long         maxBitrate;
>              double                minQuality = 0;
>              double                resolutionScale;
>              double                framerateScale;
>              unsigned long         maxFramerate;
>              boolean               active = true;
>              DOMString             encodingId;
>              sequence<DOMString>   dependencyEncodingIds;
>              //Removed ssrc fec rtx
> };
>
>
> Impact analysis
>
> Normal use case (1 sender, 1 receiver, 1 media codec)
>
> As we have removed the sequence of RTCRtpCodecParameters from the
> parameters, it is required to pass that information in the encodings
> attributes. So the automatic process that is performed internally by the
> RTCRtpSender in the current version for this case is not possible:
>
> the browser behaves as though a single encodings[0] entry was provided,
> with encodings[0].ssrc set to a browser-determined value,
> encodings[0].active set to "true", encodings[0].codecPayloadType set to
> codecs[j].payloadType where j is the index of the first codec that is not
> "cn", "dtmf", "red", "rtx", or a forward error correction codec, and all
> the other parameters.encodings[0] attributes unset.
>
> However, note that in the specification, all the examples use the
> following helper function, which performs the required steps:
>
> RTCRtpParameters function myCapsToSendParams(RTCRtpCapabilities sendCaps, RTCRtpCapabilities remoteRecvCaps) {
>   // Function returning the sender RTCRtpParameters, based on the local sender and remote receiver capabilities.
>   // The goal is to enable a single stream audio and video call with minimum fuss.
>   //
>   // Steps to be followed:
>   // 1. Determine the RTP features that the receiver and sender have in common.
>   // 2. Determine the codecs that the sender and receiver have in common.
>   // 3. Within each common codec, determine the common formats, header extensions and rtcpFeedback mechanisms.
>   // 4. Determine the payloadType to be used, based on the receiver preferredPayloadType.
>   // 5. Set RTCRtcpParameters such as mux to their default values.
>   // 6. Return RTCRtpParameters enabling the jointly supported features and codecs.
> }
>
> Note that while filling the encoding with the first supported media
> codec is done for you, it is still necessary to process the RTP features
> (mux, feedback and header extensions) in order to create compatible
> encoding parameters.
>
> Simulcast
>
> From RFC 7656
>
> 3.6.  Simulcast
>
>    A media source represented as multiple independent encoded streams
>    constitutes a simulcast [SDP-SIMULCAST] or Modification Detection
>    Code (MDC) of that media source.  Figure 8 shows an example of a
>    media source that is encoded into three separate simulcast streams,
>    that are in turn sent on the same media transport flow.  When using
>    simulcast, the RTP streams may be sharing an RTP session and media
>    transport, or be separated on different RTP sessions and media
>    transports, or be any combination of these two.  One major reason to
>    use separate media transports is to make use of different quality of
>    service (QoS) for the different source RTP streams.  Some
>    considerations on separating related RTP streams are discussed in
>    Section 3.12.
>
>                             +----------------+
>                             |  Media Source  |
>                             +----------------+
>                      Source Stream  |
>              +----------------------+----------------------+
>              |                      |                      |
>              V                      V                      V
>     +------------------+   +------------------+   +------------------+
>     |  Media Encoder   |   |  Media Encoder   |   |  Media Encoder   |
>     +------------------+   +------------------+   +------------------+
>              | Encoded              | Encoded              | Encoded
>              | Stream               | Stream               | Stream
>              V                      V                      V
>     +------------------+   +------------------+   +------------------+
>     | Media Packetizer |   | Media Packetizer |   | Media Packetizer |
>     +------------------+   +------------------+   +------------------+
>              | Source               | Source               | Source
>              | RTP                  | RTP                  | RTP
>              | Stream               | Stream               | Stream
>              +-----------------+    |    +-----------------+
>                                |    |    |
>                                V    V    V
>                           +-------------------+
>                           |  Media Transport  |
>                           +-------------------+
>
>                 Figure 8: Example of Media Source Simulcast
>
>    The simulcast relation between the RTP streams is the common media
>    source.  In addition, to be able to identify the common media source,
>    a receiver of the RTP stream may need to know which configuration or
>    encoding goals lay behind the produced encoded stream and its
>    properties.  This enables selection of the stream that is most useful
>    in the application at that moment.
>
> The main point to take into consideration is that each layer is provided
> by an independent encoder. So performance-wise, it is irrelevant whether one
> RTCRtpSender provides two encodings, or two RTCRtpSenders provide one
> encoding each.
>
> So it is possible to cover all the use cases provided by the current spec,
> for example:
>
> RTCRtpSender (track0)
>  |
>  +-----encoding[0] = {ssrc1,vp8,pt=96}
>  +-----encoding[1] = {ssrc1,vp8,pt=97}
>  +-----encoding[2] = {ssrc2,vp8,pt=98}
>
> This will be equivalent to two streams attached to the same media track,
> each one with the encodings for a single ssrc.
>
>  RTCRtpSender (track0,ssrc1)
>  |
>  +-----encoding[0] = {vp8,pt=96}
>  +-----encoding[1] = {vp8,pt=97}
>
>  RTCRtpSender (track0,ssrc2)
>  |
>  +-----encoding[0] = {vp8,pt=98}
>
> Note that in the first case, the payloads, even if on different ssrcs, were
> required to have different payload types.
>
> SVC
>
> Also from RFC 7656
>
>
> 3.7.  Layered Multi-Stream
>
>    Layered Multi-Stream (LMS) is a mechanism by which different portions
>    of a layered or scalable encoding of a source stream are sent using
>    separate RTP streams (sometimes in separate RTP sessions).  LMSs are
>    useful for receiver control of layered media.
>
>    A media source represented as an encoded stream and multiple
>    dependent streams constitutes a media source that has layered
>    dependencies.  Figure 9 represents an example of a media source that
>    is encoded into three dependent layers, where two layers are sent on
>    the same media transport using different RTP streams, i.e., SSRCs,
>    and the third layer is sent on a separate media transport.
>
>                             +----------------+
>                             |  Media Source  |
>                             +----------------+
>                                     |
>                                     |
>                                     V
>        +---------------------------------------------------------+
>        |                      Media Encoder                      |
>        +---------------------------------------------------------+
>                |                    |                     |
>         Encoded Stream       Dependent Stream     Dependent Stream
>                |                    |                     |
>                V                    V                     V
>        +----------------+   +----------------+   +----------------+
>        |Media Packetizer|   |Media Packetizer|   |Media Packetizer|
>        +----------------+   +----------------+   +----------------+
>                |                    |                     |
>           RTP Stream           RTP Stream            RTP Stream
>                |                    |                     |
>                +------+      +------+                     |
>                       |      |                            |
>                       V      V                            V
>                 +-----------------+              +-----------------+
>                 | Media Transport |              | Media Transport |
>                 +-----------------+              +-----------------+
>
>            Figure 9: Example of Media Source Layered Dependency
>
>    It is sometimes useful to make a distinction between using a single
>    media transport or multiple separate media transports when (in both
>    cases) using multiple RTP streams to carry encoded streams and
>    dependent streams for a media source.  Therefore, the following new
>    terminology is defined here:
>
>    SRST:  Single RTP stream on a Single media Transport
>
>    MRST:  Multiple RTP streams on a Single media Transport
>
>    MRMT:  Multiple RTP streams on Multiple media Transports
>
>    MRST and MRMT relations need to identify the common media encoder
>    origin for the encoded and dependent streams.  When using different
>    RTP sessions (MRMT), a single RTP stream per media encoder, and a
>    single media source in each RTP session, common SSRCs and CNAMEs can
>    be used to identify the common media source.  When multiple RTP
>    streams are sent from one media encoder in the same RTP session
>    (MRST), then CNAME is the only currently specified RTP identifier
>    that can be used.  In cases where multiple media encoders use
>    multiple media sources sharing synchronization context, and thus have
>    a common CNAME, additional heuristics or identification need to be
>    applied to create the MRST or MRMT relationships between the RTP
>    streams.
>
> The main advantage over simulcast is that here a single instance of the
> encoder is able to serve multiple layers, improving performance compared to
> having several independent encoders.
>
> This is supported in the current spec by using dependencyEncodingIds, which
> allows the browser to correlate SVC layers so they can be provided by the
> same encoder:
>
> dependencyEncodingIds of type sequence<DOMString>: The encodingIds on which this layer
> depends. Within this specification encodingIds are permitted only within
> the same RTCRtpEncodingParameters sequence. In the future if MST were to be
> supported, then if searching within an RTCRtpEncodingParameters sequence
> did not produce a match, then a global search would be carried out.
>
> Note that currently MST is not supported because the dependency search is
> only done inside of the encoders of an RTCRtpSender, and as RTCRtpSender is
> attached to a single transport, it is not possible to send a layer to
> different transports.
>
> So in the current version of the ORTC spec, SRST and MRST are supported, but
> not MRMT. In the new version, only SRST would be supported.
>
> This limitation is artificial: if the encodingIds were globally unique,
> that search could be done across RTCRtpSenders. That would mean that SRST,
> MRST *and MRMT* would be supported with this proposal.
>
> RTCRtpSender (track0)
>  |
>  +-----encoding[0] = {ssrc1,vp9,pt=96,encodingId="track0-0"}
>  +-----encoding[1] = {ssrc1,vp9,pt=97,encodingId="track0-1",dependencyEncodingIds=["track0-0"]}
>  +-----encoding[2] = {ssrc2,vp9,pt=98,encodingId="track0-2",dependencyEncodingIds=["track0-0"]}
>
> This will be equivalent to two streams attached to the same media track,
> each one with the encodings for a single ssrc.
>
>  RTCRtpSender (track0,ssrc1)
>  |
>  +-----encoding[0] = {vp9,pt=96,encodingId="track0-0"}
>  +-----encoding[1] = {vp9,pt=97,encodingId="track0-1",dependencyEncodingIds=["track0-0"]}
>
>  RTCRtpSender (track0,ssrc2)
>  |
>  +-----encoding[0] = {vp9,pt=98,encodingId="track0-2",dependencyEncodingIds=["track0-0"]}
>
>
>

Received on Thursday, 7 April 2016 14:51:54 UTC