RE: RtpSender/RtpReceiver and simulcast (plus temporal scaling)

Peter said: 

"Here's a very simple version of what it could look like:

dictionary RTCRtpEncodingParameters {
  unsigned long ssrc;

  // For simulcast, have different encodings with different resolutions.
  long width;
  long height;
  double framerate;
};

With an example like so:

rtpSender.setParameters({
  codecs: [...],
  encodings: [{
      width: 1280, height: 720, framerate: 30
    }, {
      width: 640, height: 360, framerate: 30
    }, {
      width: 320, height: 180, framerate: 15
    }, {
    }
  ]
}"

[BA] Providing this kind of explicit control over the encoding parameters has a natural appeal, since it allows the application to lay out the characteristics of each simulcast stream directly.

As noted in a recent blog post (see: http://webrtchacks.com/how-to-figure-out-webrtc-camera-resolutions/), implementations very often support only a fixed set of resolutions, covering a small set of aspect ratios.

So I'm wondering whether you're also thinking about exposing explicit resolutions in capabilities. In the past, things like this have raised fingerprinting concerns.
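
To make the question concrete, here is a purely hypothetical sketch; neither getCapabilities() nor a "resolutions" member appears in any current draft:

// Hypothetical only: getCapabilities() and "resolutions" are my
// invention, shown just to illustrate the shape of the question.
var caps = rtpSender.getCapabilities();
// caps might then look like:
// {
//   resolutions: [{ width: 1280, height: 720 },
//                 { width: 640, height: 360 },
//                 { width: 320, height: 180 }],
//   maxFramerate: 30
// }

Enumerating the device's exact resolution set this way is precisely what raises the fingerprinting concern.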

Another question relates to how we would combine spatial simulcast with temporal scaling (a popular combination). 

As an example, let's assume we want the browser to send two resolutions, [640, 360] and [320, 180].

However, we also want each of the simulcast streams to have two temporal layers of 15 fps each, for a total of 30 fps. For simplicity, let's assume single-session transmission (SST).

How would we express the combination of simulcast and temporal encoding in "native" mode?                                                      

For temporal scaling, most codecs have a "temporal layer ID" (TID) field, which typically starts at 0; higher layers depend on lower layers, so the relationship between layers can be described with a single layerid parameter.
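
To illustrate the dependency rule (a sketch only; the frame.tid accessor is hypothetical), a forwarder could thin a two-layer 30 fps stream down to 15 fps by keeping only the base layer:

// Sketch: frames whose TID is at or below the target layer form a
// decodable subset, because higher layers depend only on lower ones.
function selectTemporalLayer(frames, targetTid) {
  return frames.filter(function (frame) {
    return frame.tid <= targetTid;
  });
}

// With layers 0 and 1 at 15 fps each:
//   selectTemporalLayer(frames, 0) -> 15 fps (base layer only)
//   selectTemporalLayer(frames, 1) -> 30 fps (both layers)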

Would you envisage adding a layerid to RTCRtpEncodingParameters?  By doing so, would a sequence of RTCRtpEncodingParameters be needed to express one of the simulcast streams?

For example, might one of the simulcast streams look like this:

[{layerid: 0, width: 640, height: 360, framerate: 15},
 {layerid: 1, width: 640, height: 360, framerate: 15}]

and the other one would look like this:

[{layerid: 0, width: 320, height: 180, framerate: 15},
 {layerid: 1, width: 320, height: 180, framerate: 15}]
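
Putting those together, and assuming layerid were added to RTCRtpEncodingParameters as above, the full call might look like:

rtpSender.setParameters({
  codecs: [...],
  encodings: [
    { layerid: 0, width: 640, height: 360, framerate: 15 },
    { layerid: 1, width: 640, height: 360, framerate: 15 },
    { layerid: 0, width: 320, height: 180, framerate: 15 },
    { layerid: 1, width: 320, height: 180, framerate: 15 }
  ]
});

But then it isn't obvious how the browser would group these four entries into two simulcast streams, which is really what I'm asking.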
