Some thoughts on RTCRtpParameters, RTCRtpEncodingParameters, RTCRtpCodecParameters, RTCRtpCodec, and RTCRtpCapabilities

Looking at the proposals that have been made relating to capabilities and parameters, some thoughts have come to mind on how they might be stitched together.

This post refers to the following previous proposals:

A Big Proposal: http://lists.w3.org/Archives/Public/public-orca/2014Feb/0036.html
Proposal: A more full RTCRtpParameters (RTCRtpCodecParams, RTX, FEC, SSRCs, and the beginnings of simulcast):
 http://lists.w3.org/Archives/Public/public-orca/2014Feb/0021.html
Proposal RtpListener for unsignalled ssrcs:  http://lists.w3.org/Archives/Public/public-orca/2014Feb/0022.html


CAPABILITIES VERSUS CONFIGURATION
--------------------------------------------------------

One question relates to which capabilities are codec-specific and which are general capabilities of the underlying RTP stack.

Looking at RFC 5104 Section 7.1, RTCP Feedback message support is negotiated for a given payload type, suggesting that a given codec implementation may have the capability to support a given set of feedback messages, and also can be configured to send and/or receive them.

Somewhat less clearly, looking at draft-even-mmusic-application-token, it would appear that an AppId might be configured to be used with a given codec, or even a layer (simulcast or layered coding) within a codec.  Therefore it might be useful not only to discover whether an implementation supports appId, but also to be able to configure its use within a codec or a layer of a codec.

Another question relates to how to express layer dependencies.

Looking at RFC 5583 Section 6.5 as well as the examples in RFC 6190 Section 7.3, it appears that dependencies can exist not only within a single RtpReceiver or RtpSender object, but between objects.  As an example, in Multi-Session Transport, as shown in RFC 6190 Section 7.3.3, a dependency could exist between two RtpReceiver objects.

As a result, it would appear that the layerId might be better expressed as a DOMString rather than an int, so that it could be referenced across objects in layerDependencies.
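
As a rough sketch of the cross-object case (TypeScript-style object literals following the RTCRtpEncodingParameters shape proposed below; the encodingId strings and SSRC values are purely illustrative):

// Hypothetical MST-style configuration: the enhancement-layer receiver declares a
// dependency on an encoding handled by a different RtpReceiver object, which works
// naturally if encodingId is a DOMString that can be referenced globally.
const baseLayerEncoding = {
    active: true,
    encodingId: "svc-base",               // illustrative identifier
    dependencyEncodingIds: [],
    ssrc: 1111
};

const enhancementLayerEncoding = {
    active: true,
    encodingId: "svc-enh-1",              // illustrative identifier
    dependencyEncodingIds: ["svc-base"],  // refers to an encoding owned by another RtpReceiver
    ssrc: 2222
};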

In addition, it would appear to me that the recv-appId would be useful to include within the RTCRtpEncodingParameters along with the (nullable) SSRC.  IMHO, it is desirable to be able to avoid triggering an RTCRtpUnhandledRtpEvent if possible.  One circumstance where this should not be needed is when configuring an audio codec where only a single stream is expected (e.g. from a mixer).  In that circumstance, it could be desirable to configure only the Payload Type, leaving out the SSRC.  That way, the RtpReceiver object would handle the incoming RTP stream from the start, regardless of what SSRC was selected, without the need for an RTCRtpUnhandledRtpEvent.
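
As a sketch of that single-stream audio case (TypeScript-style object literals following the RTCRtpParameters shape proposed below; the codec and payload type are illustrative):

// Receiver parameters binding only a payload type: the SSRC is left null so that
// whatever SSRC the mixer selects is handled by this RtpReceiver from the first
// packet, without triggering an RTCRtpUnhandledRtpEvent.
const audioRecvParameters = {
    codecs: [
        { payloadType: 96, codec: { name: "opus", clockRate: 48000, numChannels: 2 } }
    ],
    encodings: [
        { active: true, ssrc: null }      // match on Payload Type alone
    ],
    rtpFeatures: []
};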

In a video scenario such as reception of simulcast, leaving the SSRC out might also be useful, since the desire might be to allow the mixer to switch between resolutions without signaling.  In such a situation, the recv-appId might be configured on the RtpReceiver object to identify the potential simulcast streams that could be received (only one at a time), but there would be no need to configure an SSRC (this would be configured automatically in the RtpReceiver object via dynamic binding of the recv-appId to the SSRC).
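
A sketch of that simulcast-receive case, in the same style (the recv-appId values are illustrative):

// Two potential simulcast streams identified by recv-appId; no SSRCs are configured,
// since the recv-appId -> SSRC binding is established dynamically as packets arrive
// and the mixer may switch between streams without signaling.
const simulcastRecvEncodings = [
    { active: true, encodingId: "hi", receiverId: "2", ssrc: null },   // higher resolution
    { active: true, encodingId: "lo", receiverId: "3", ssrc: null }    // lower resolution
];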

Also, in thinking about the format parameters that could be discovered or configured on a codec or a layer, it seems like we might be able to simplify things by thinking of them as properties with potential metadata.  As an example, a given H.264/AVC implementation might support the profile-level-id format parameter.  However, beyond discovering that the parameter is supported, it would be useful to understand what values are supported within the implementation.  So there is some additional information (i.e. metadata) that it would be useful to have associated with a given format parameter.  In the proposal below, these are called "properties".
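
For instance, profile-level-id support might be expressed as a Property along the following lines (the metadata layout, attribute namespace, and values are only illustrative):

// A discovered format parameter expressed as a Property, with metadata describing
// which values the implementation actually supports.
const profileLevelIdProperty = {
    isSignaled: true,
    name: "profile-level-id",
    value: "42e01f",                                    // illustrative default
    metaData: {
        attributes: [
            { attributeSignaled: false,
              namespace: "supported-values",            // hypothetical attribute namespace
              value: ["42e01f", "42e034", "640c1f"] }
        ]
    }
};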

Finally, in thinking of the various kinds of RTP capabilities which might be discovered (e.g. header extensions, feedback messages, RTCP report types, etc.), it seems like it would be useful to use a more generic mechanism, rather than creating separate buckets for each type of RTP capability.

Below are some ideas of how things might stitch together.

--------------------------
DISCOVERY
--------------------------

//The RTCRtpCapabilities object enables discovery of the supported audio and video codecs as well as codec-specific features, along with generic features of the RTP stack, including features that can be supported within layers (such as layer-specific FEC/RTX).

dictionary RTCRtpCapabilities {
    sequence<RTCRtpCodec>  audioCodecs;
    sequence<RTCRtpCodec>  videoCodecs;
    sequence<Property>     rtpFeatures;                  // header extensions, RTP features, RTCP reporting + feedback mechanisms supported by the engine
    sequence<Property>     extendedEncodingParameters;   // properties supported in layers
};
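
To make this concrete, a populated RTCRtpCapabilities object might look something like the following sketch (the specific codecs, header extension, and property names are illustrative, not a proposed registry):

// Illustrative capabilities: codecs plus generic RTP-stack features, all expressed
// through the same Property mechanism rather than a separate bucket per feature type.
const exampleCapabilities = {
    audioCodecs: [
        { name: "opus", clockRate: 48000, numChannels: 2 }
    ],
    videoCodecs: [
        { name: "H264", clockRate: 90000,
          formats: [ { isSignaled: true, name: "profile-level-id", value: null } ],
          rtpFeatures: [ { isSignaled: true, name: "nack", value: null },
                         { isSignaled: true, name: "nack pli", value: null } ] }
    ],
    rtpFeatures: [
        { isSignaled: true, name: "urn:ietf:params:rtp-hdrext:toffset", value: null },   // header extension
        { isSignaled: true, name: "nack", value: null }                                  // feedback mechanism
    ],
    extendedEncodingParameters: [
        { isSignaled: true, name: "rtx", value: null },     // per-layer RTX supported
        { isSignaled: true, name: "fec", value: null }      // per-layer FEC supported
    ]
};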


--------------------------------
Note:  Because rtpFeatures encompasses header extensions, RTP features, RTCP reporting, feedback mechanisms, etc., it doesn't look like we need RTCRtpFeatures any longer, previously defined as:

enum RTCRtpFeatures {
    "nack"
};


--------------------------
SENDER / RECEIVER
--------------------------

//The RTCRtpParameters object describes the codecs (and codec-specific features), RTP features and encodings utilized by a given RtpReceiver or RtpSender object.

dictionary RTCRtpParameters {
    sequence<RTCRtpCodecParameters>     codecs;
    sequence<RTCRtpEncodingParameters>  encodings;
    sequence<Property>                  rtpFeatures;   // applied to the entire sender/receiver object, e.g. header extensions, RTCP reporting + feedback mechanisms
};

//The RTCRtpEncodingParameters object describes a single encoding (e.g. a simulcast stream or a layer) sent or received by an RtpSender or RtpReceiver, along with its dependencies and any layer-specific properties.

dictionary RTCRtpEncodingParameters {
    boolean                active;

    DOMString?             encodingId;             // two objects could have the same encodingId if they are referenced in an "or" dependency as defined in RFC 5583 Section 6.5

    sequence<DOMString>?   dependencyEncodingIds;  // just the IDs (resolve to encodingIds within the same sequence first, then search globally for matches)

    DOMString?             receiverId;             // the application ID configured for use.  In an RtpReceiver object, this corresponds to recv-appId; in an RtpSender object, it corresponds to the appId from draft-even-mmusic-application-token.

// Note that an SSRC can change due to an SSRC conflict, but the recv-appId will remain constant, avoiding a potential hiccup.
// Also note that the sender can stop sending the appId once it knows that the recv-appId/SSRC binding has been achieved (e.g. when an RTCP RR is received for the SSRC).

// Note that the SSRC is nullable to allow configuration of an RtpReceiver object that would receive all SSRCs for a given Payload Type, or that would configure only the recv-appId.

    unsigned long?         ssrc;

    DOMString?             codecName;

    double                 priority;
    double                 maxBitrate;
    double                 maxQuality;
    double                 minQuality;

    double                 scale;

    double                 bias;                   // 0.0 = strongly favor "resolution", 1.0 = strongly favor "framerate", 0.5 = neither

    sequence<Property>     extendedParameters;     // properties applied to the layer (e.g. FEC, RTX, header extensions specific to the layer, RTCP report + feedback mechanisms for the layer)
};

//The RTCRtpCodecParameters object binds a codec to a PayloadType.

dictionary RTCRtpCodecParameters {
    octet          payloadType;
    RTCRtpCodec    codec;
};

dictionary RTCRtpCodec {
    DOMString             name;
    unsigned long?        clockRate;
    unsigned long?        numChannels;
    sequence<Property>?   formats;       // properties specific to the codec
    sequence<Property>?   rtpFeatures;   // codec-specific RTP features, e.g. header extensions for the codec, RTP/RTCP report + feedback mechanisms specific to the codec
};


//A property can be something that would be signaled, or just something that is configured but not signaled.
//In addition to the property name and a value (of any type), there can be metaData associated with the property.


dictionary Property {
    boolean            isSignaled;
    DOMString          name;
    any                value;
    PropertyMetaData?  metaData;
};

dictionary PropertyMetaData {
    sequence<Attribute> attributes;
};

dictionary Attribute {
    boolean    attributeSignaled;
    DOMString  namespace;
    any        value;
};
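
Putting the pieces together, the RTCRtpParameters for a simple two-stream simulcast sender might be sketched roughly as follows (codec, payload type, bitrates, and the way rtx-time is carried are all illustrative):

// Two encodings of the same source at different scales, each carrying its own
// layer-level RTX property; the codec is bound to a payload type once and referenced
// from the encodings by name.
const simulcastSendParameters = {
    codecs: [
        { payloadType: 98,
          codec: { name: "H264", clockRate: 90000,
                   formats: [ { isSignaled: true, name: "profile-level-id", value: "42e01f" } ] } }
    ],
    encodings: [
        { active: true, encodingId: "full", codecName: "H264",
          priority: 1.0, maxBitrate: 1500000, scale: 1.0, bias: 0.5,
          extendedParameters: [ { isSignaled: true, name: "rtx", value: { rtxTime: 3000 } } ] },
        { active: true, encodingId: "half", codecName: "H264",
          priority: 0.5, maxBitrate: 500000, scale: 0.5, bias: 0.5,
          extendedParameters: [] }
    ],
    rtpFeatures: []
};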


--------------------
SDP Snippets
--------------------

From RFC 5104 Section 7.1:

         v=0
         o=alice 3203093520 3203093520 IN IP4 host.example.com
         s=Media with feedback
         t=0 0
         c=IN IP4 host.example.com
         m=audio 49170 RTP/AVPF 98
         a=rtpmap:98 H263-1998/90000
         a=rtcp-fb:98 nack pli

Here nack and pli are configured for use with H.263-1998 (but would not necessarily be used with another codec).
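
In the model above, this per-payload-type configuration could be expressed by attaching the feedback mechanisms as Properties on the codec itself, rather than in a separate feedback-specific bucket; a rough sketch in the same TypeScript style:

// The rtcp-fb line bound to payload type 98 becomes a Property on that codec's
// rtpFeatures, so "nack pli" applies to H263-1998 only.
const h263CodecParameters = {
    payloadType: 98,
    codec: {
        name: "H263-1998",
        clockRate: 90000,
        rtpFeatures: [ { isSignaled: true, name: "nack pli", value: null } ]
    }
};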

From draft-even-mmusic-application-token Section 3.1:

      a=group:BUNDLE m1 m2
      m=video 49200 RTP/AVP 97,98
      a=rtpmap:98 H264/90000
      a=mid:m1
      a=content:main
      a=rtpmap:97 rtx/90000
      a=fmtp:97 apt=98;rtx-time=3000
      a=appId:2
      a=appId:3
      m=video 49200 RTP/AVP 97,98
      a=rtpmap:98 H264/90000
      a=mid:m2
      a=content:alt
      a=rtpmap:97 rtx/90000
      a=fmtp:97 apt=98;rtx-time=3000
      a=appId:4
      a=appId:5

In the above snippet, it would appear that appIds are being configured for use with different payload types; however, it is not clear which payload type each appId is intended to be used with.
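
In the proposed model that ambiguity goes away, because the appId (receiverId) and the codecName sit together in the same RTCRtpEncodingParameters entry; a sketch for the m1 section, assuming (purely for illustration) that appId 2 carries the H264 stream and appId 3 its RTX stream:

// One encoding per appId, each explicitly tied to a codec, so the appId/payload-type
// association that the SDP leaves implicit is stated directly.
const m1RecvEncodings = [
    { active: true, encodingId: "m1-video", receiverId: "2", codecName: "H264", ssrc: null },
    { active: true, encodingId: "m1-rtx",   receiverId: "3", codecName: "rtx",  ssrc: null }
];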
