Re: A Big Proposal: A way to control quality/resolution/framerate/simulcast/layering with RtpSender from Martin Thomson on 2014-02-19 (public-orca@w3.org from February 2014)

From: Martin Thomson <martin.thomson@gmail.com>
Date: Tue, 18 Feb 2014 16:56:53 -0800
To: Peter Thatcher <pthatcher@google.com>
Cc: "public-orca@w3.org" <public-orca@w3.org>
Message-ID: <CABkgnnWFc2z-WNcyp39pAUWgZYMCaok6XfVwNy0=8cn9161CfQ@mail.gmail.com>

I read this a few days ago and it didn't seem too bad.  But with time I've
started to wonder how this all interacts.

Basically, you have a multi-variable optimization problem that you are
throwing some constraints at.  That's a reasonable way to approach the
problem, but you have to be a lot more precise about what those
constraints actually do if I'm to make sense of this.  I think that
it's close though.

Let's start with the easy one:  Priority.

Priority could mean either of two things here.  I think that you
intend just one of them, but I'm not sure.

In the one case, you use priority to select A over B where there are
not sufficient resources to do both A and B.  That's what I'm going to
call priority ordering.

In the other case, you use priority to proportionally allocate
resources to A and B based on relative importance or weighting.
That's what I've been calling priority weighting.

I think that you might be implying weighting by the numbers that you have here.
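
To make the difference concrete, here is a minimal sketch of the two
readings.  The function names and the { name, priority, demand } shape
are invented for illustration; nothing here is part of the proposal.

```javascript
// Priority ordering: serve streams from highest priority down, giving each
// one what it asks for until the budget runs out.
function allocateByOrdering(streams, budget) {
  const result = {};
  let remaining = budget;
  const sorted = [...streams].sort((a, b) => b.priority - a.priority);
  for (const s of sorted) {
    const grant = Math.min(s.demand, remaining);
    result[s.name] = grant;
    remaining -= grant;
  }
  return result;
}

// Priority weighting: split the whole budget in proportion to priority,
// ignoring how much each stream asked for.
function allocateByWeighting(streams, budget) {
  const total = streams.reduce((sum, s) => sum + s.priority, 0);
  const result = {};
  for (const s of streams) {
    result[s.name] = (budget * s.priority) / total;
  }
  return result;
}

// Hypothetical inputs: the numbers are made up.
const streams = [
  { name: "audio", priority: 1, demand: 40 },
  { name: "video", priority: 3, demand: 500 },
];
const ordered = allocateByOrdering(streams, 100);
const weighted = allocateByWeighting(streams, 100);
console.log(ordered, weighted);
```

With these inputs, ordering hands the entire budget to video and
starves audio, whereas weighting splits it 25/75.  That starvation is
the practical difference between the two readings.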

Say that we take priority to be a more-or-less direct control over the
amount of bandwidth that is allocated to a given stream/channel.  I
think that's a reasonable starting point in all of this.  That leaves
aside any DSCP markings, which might have secondary effects.

Scheme 1:  With a weighting of 0.1 for audio and 10 for video, 101
units of available bandwidth are split so that audio gets 1 unit for
every 100 units of video.

Scheme 2:  Give each stream a guaranteed lower bound on bandwidth.
In the above, you might say that audio gets 10kbps and video gets
100kbps.  The weightings determine how the spare bandwidth is
allocated.  If you have 211kbps and the above weightings, the spare
101kbps is split 1:100, so audio gets 11kbps and video gets 200kbps.
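
A rough sketch of Scheme 2, assuming each stream is described by a
hypothetical { name, floor, weight } object (these field names are
mine, not from the proposal):

```javascript
// Scheme 2 sketch: every stream gets its guaranteed floor, and whatever
// bandwidth is left over is shared in proportion to the weights.
function allocateWithFloors(streams, budget) {
  const floorTotal = streams.reduce((sum, s) => sum + s.floor, 0);
  const weightTotal = streams.reduce((sum, s) => sum + s.weight, 0);
  // What to do when the floors alone exceed the budget is left open here;
  // this sketch simply treats the spare bandwidth as zero in that case.
  const spare = Math.max(0, budget - floorTotal);
  const result = {};
  for (const s of streams) {
    result[s.name] = s.floor + (spare * s.weight) / weightTotal;
  }
  return result;
}

// The numbers from the example above, in kbps.
const shares = allocateWithFloors(
  [
    { name: "audio", floor: 10, weight: 0.1 },
    { name: "video", floor: 100, weight: 10 },
  ],
  211
);
console.log(shares);
```

Up to floating-point rounding, this reproduces the 11kbps/200kbps
split.  Setting every floor to 0 reduces it to Scheme 1.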

Scheme 3:  Some more advanced version of the same where different
types of media have different elasticity curves and final values are
determined by feeding input parameters into a more complex algorithm
(least-squares anyone?).

The net effect is that each stream then gets a bandwidth budget that
it has to fit within [1].  Then it comes down to how to best use all
of that available bandwidth.  That's where the other parameters come
in.

I don't think that this is a straight trade-off between resolution and
framerate.  There are potentially quality changes to make as well.  I
jest about least-squares, but something similar might be used if each
of the axes could be expressed as { min: X, preferred: Y, max: Z }.  I
know people have expressed a desire for a similar sort of input to gUM
constraints.  I do worry about minimums though: is it such a bad thing
if one frame drops below the minimum?  Will anyone notice?
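
As a sketch of what "something similar" to least-squares could look
like: score each candidate operating point by its squared distance
from the preferred value on every axis, and pick the cheapest one that
fits the budget.  Everything here is an assumption for illustration:
the axis names, the candidate list, and the per-candidate bitrate
estimates (which a real encoder would have to supply).

```javascript
// Squared distance from the preferred value, normalized by the axis range
// so that different units (pixels, fps) are comparable.
// Assumes axis.max > axis.min.
function axisCost(value, axis) {
  const d = (value - axis.preferred) / (axis.max - axis.min);
  return d * d;
}

// Return the candidate with the lowest total cost among those that fit the
// bandwidth budget, or null if none fits.
function pickOperatingPoint(candidates, axes, budgetKbps) {
  let best = null;
  let bestCost = Infinity;
  for (const c of candidates) {
    if (c.kbps > budgetKbps) continue;
    let cost = 0;
    for (const name of Object.keys(axes)) {
      cost += axisCost(c[name], axes[name]);
    }
    if (cost < bestCost) {
      best = c;
      bestCost = cost;
    }
  }
  return best;
}

// Hypothetical axes and candidates; all numbers are made up.
const axes = {
  height: { min: 180, preferred: 720, max: 1080 },
  framerate: { min: 10, preferred: 30, max: 60 },
};
const candidates = [
  { height: 720, framerate: 30, kbps: 1500 },
  { height: 720, framerate: 15, kbps: 900 },
  { height: 360, framerate: 30, kbps: 700 },
];
const choice = pickOperatingPoint(candidates, axes, 1000);
console.log(choice);
```

Note that min and max only normalize the cost in this sketch; they are
not enforced as hard limits, which sidesteps the question about
minimums rather than answering it.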

[1] Yes, video doesn't smoothly scale all the way, but maybe that just
means you save a few bytes, or maybe you can return those bits to the
common pool for scavengers (like lower priority data, for example).
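
One way to picture "returning bits to the common pool": weighted
sharing where a stream that can't use its full share is capped and the
surplus is re-shared among the others.  The { name, weight, max }
shape is a made-up stand-in for however a stream would report what it
can actually use.

```javascript
// Weighted sharing with per-stream caps.  Streams whose proportional share
// would reach their cap are pinned at the cap, and the bandwidth they leave
// behind is re-shared among the remaining streams.
function allocateWithCaps(streams, budget) {
  const result = {};
  let active = [...streams];
  let remaining = budget;
  while (active.length > 0 && remaining > 0) {
    const weightTotal = active.reduce((sum, s) => sum + s.weight, 0);
    const capped = active.filter(
      (s) => (remaining * s.weight) / weightTotal >= s.max
    );
    if (capped.length === 0) {
      // Nobody hits a cap: plain weighted split of what is left.
      for (const s of active) {
        result[s.name] = (remaining * s.weight) / weightTotal;
      }
      return result;
    }
    for (const s of capped) {
      result[s.name] = s.max;
      remaining -= s.max;
    }
    active = active.filter((s) => !capped.includes(s));
  }
  // Any stream still waiting when the budget is exhausted gets nothing.
  for (const s of active) result[s.name] = 0;
  return result;
}

// Hypothetical numbers in kbps: video can use at most 150.
const capped = allocateWithCaps(
  [
    { name: "video", weight: 10, max: 150 },
    { name: "data", weight: 1, max: 1000 },
  ],
  220
);
console.log(capped);
```

Here video's weighted share would be 200 but it can only use 150, so
the low-priority data channel scavenges 70 instead of 20.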

On 14 February 2014 10:44, Peter Thatcher <pthatcher@google.com> wrote:
> From what I have seen so far, the hardest part of a good RTC API to get right
> is how to control what quality/resolution/framerate/simulcast/layering to
> send.  There are lots of use cases to cover, things interact in complex
> ways, and there is a never-ending list of edge cases.  I've seen lots of
> failed attempts at solving this.
>
> However, after spending a long time thinking about it and talking to lots of
> people smarter than me, I think we have something that covers all the major
> use cases while being "as simple as possible, but no simpler", which I wish
> to propose now.
>
> This is going to end up being a long email, and will probably lead to a very
> long discussion.  But hopefully it will lead to a good end and give control
> to applications that they've never had before and have been asking for for a
> long time.  I look forward to discussion because this is by no means the
> final word.  It's just the best we've been able to come up with so far.
>
> So.... here we go....
>
>
> What we have so far in the API (proposed, at least):
>
> partial interface RTCRtpReceiver {
>   Promise<void> receive(RTCRtpParameters parameters);
> }
>
> partial interface RTCRtpSender {
>   Promise<void> send(RTCRtpParameters parameters);
> }
>
> dictionary RTCRtpParameters {
>   DOMString?                                receiverId;
>   sequence<RTCRtpCodecParameters>           codecs;
>   sequence<RTCRtpHeaderExtensionParameters> headerExtensions;
>   sequence<RTCRtpEncodingParameters>        encodings;
> }
>
> dictionary RTCRtpEncodingParameters {
>   unsigned int ssrc;
>   DOMString? codec;
>   RTCRtpFecParameters? fec;
>   RTCRtpRtxParameters? rtx;
>
>   // MISSING STUFF HERE
> }
>
>
> What we lack:
>
> - A way to control per-encoding:
>   - Resolution
>   - Framerate
>   - Bitrate
>   - Quality
>   - On/Off
> - A way to control inter-encoding priority
> - A way to control things as more or less bandwidth is available
> - A way to control things as the input resolution/aspect ratio changes
> (rotation)
> - A way to express inter-layer dependencies.
>
>
> General Proposed Solution:
>
> 1. Provide control of per-encoding scale, bitrate, quality, priority, and
> active/inactive.
> 2. Provide control of what to bias toward as more bandwidth is available
> (more resolution or more framerate).
> 3. Provide a way to specify limited inter-layer dependencies (more complex
> relationships are TBD).
> 4. Let input resolutions and framerates be controlled by the input
> MediaStreamTrack, not by the encoding.
> 5. Let the JS change what is currently being sent on the fly based on
> feedback on what is currently being sent (feedback mechanism is TBD).
>
>
> Specifics of RTCRtpEncodingParameters:
>
> dictionary RTCRtpEncodingParameters {
>   // Existing Fields
>   unsigned int ssrc;
>   DOMString? codec;
>   RTCRtpFecParameters? fec;
>   RTCRtpRtxParameters? rtx;
>
>   // New Fields
>   // The higher the value, the more bits this encoding will be given
>   // as available bandwidth goes up.  Default is 1.0.
>   double priority;
>
>   // Do this scale of the input resolution, or die trying.
>   // 1.0 = full resolution.  Default is unconstrained.
>   double scale;
>
>   // Ramp up resolution/quality/framerate until this bitrate.
>   // Summed when using dependent layers.
>   double maxBitrate;
>
>   // Ramp up resolution/quality/framerate until this quality.
>   double maxQuality;
>
>   // Never send less than this quality.
>   double minQuality;
>
>   // What to give more bits to, if available.
>   // Perhaps make it an enum.
>   DOMString bias; // "resolution" or "framerate"
>
>   // If false, don't send any media right now.
>   // Disable is different than omitting the encoding; it can keep
>   // resources available to re-enable more quickly than re-adding.
>   // Plus, it still sends RTCP.
>   // Default is active.
>   bool active;
>
>   // These are to setup layer dependencies.
>   int layerId;
>   sequence<int> layerDependencies;  // Just the IDs
> }
>
> Examples:
>
> // Normal 1:1 video with resolution feedback from the receiver
> var encodings = [{
>   ssrc: 1,
>   scale: .5
> }];
>
> // Crank up the quality to "11"
> var encodings = [{
>   ssrc: 1,
>   maxQuality: 11.0  // TODO: Figure out the scale.
> }];
>
> // Send a thumbnail along with regular size
> var encodings1 = [{
>   ssrc: 1,
>   priority: 1.0
> }];
> // Control the resolution and framerate
> // with a different track and RtpSender.
> var encodings2 = [{
>   ssrc: 2,
>   // Prioritize the thumbnail over the main video.
>   priority: 10.0
> }];
>
> // Sign Language
> // (need high framerate, but don't get too bad of quality)
> var encodings = [{
>   minQuality: 0.2,
>   bias: "framerate"
> }];
>
> // SVC which handles camera rotation
> var encodings =[{
>   layerId: 0,
>   scale: 0.25,
>   priority: 3.0
> }, {
>   layerId: 1,
>   layerDependencies: [0],
>   scale: 0.5,
>   priority: 2.0
> }, {
>   layerId: 2,
>   layerDependencies: [0, 1],
>   scale: 1.0,
>   priority: 1.0
> }]
>
> // SVC w/thumbnail:
> var encodings1 =[{
>   layerId: 0,
>   scale: 0.25,
>   priority: 3.0
> }, {
>   layerId: 1,
>   layerDependencies: [0],
>   scale: 0.5,
>   priority: 2.0
> }, {
>   layerId: 2,
>   layerDependencies: [0, 1],
>   scale: 1.0,
>   priority: 1.0
> }];
> // Control the resolution and framerate
> // with a different track and RtpSender.
> var encodings2 =[{
>   layerId: 3,
>   priority: 10.0
> }]
>
> // SVC w/thumbnail temporarily disabled:
> var encodings1 =[{
>   layerId: 0,
>   scale: 0.25,
>   priority: 3.0
> }, {
>   layerId: 1,
>   layerDependencies: [0],
>   scale: 0.5,
>   priority: 2.0
> }, {
>   layerId: 2,
>   layerDependencies: [0, 1],
>   scale: 1.0,
>   priority: 1.0
> }];
> // Control the resolution and framerate
> // with a different track and RtpSender.
> var encodings2 =[{
>   layerId: 3,
>   priority: 10.0,
>   active: false
> }]
>
> // Must send a very fixed resolution
> // Adjust the resolution using the input track.
> var encodings = [{
>   scale: 1.0
> }];
>
> // Screencast
> var encodings = [{
>   bias: "resolution"
> }];
>
>
> // Remote Desktop
> // (High framerate, must not downscale)
> var encodings = [{
>   bias: "framerate",
>   scale: 1.0
> }];
>
>
> // Baby Monitor or Security Camera
> // Adjust the framerate using the input track.
> var encodings = [{ssrc: 1}];
>
> // Audio more important than video
> var audioEncodings = [{
>   priority: 10.0
> }];
> var videoEncodings = [{
>   priority: 0.1
> }];
>
> // Video more important than audio
> var audioEncodings = [{
>   priority: 0.1
> }];
> var videoEncodings = [{
>   priority: 10.0
> }];
>
> // Camera Rotation
> // Since there is only control of scale, there is no issue with camera
> // rotation or cropping.  Everything should work fine with no jank.
> var encodings = [{ssrc: 1}];
>
>
> That's it.  I apologize for the typos that I'm sure I missed in such a long
> email.  I look forward to the discussion :).
>
>
>
Received on Wednesday, 19 February 2014 00:57:23 UTC