Re: A Big Proposal: A way to control quality/resolution/framerate/simulcast/layering with RtpSender from Peter Thatcher on 2014-02-19 (public-orca@w3.org from February 2014)

From: Peter Thatcher <pthatcher@google.com>
Date: Tue, 18 Feb 2014 17:17:07 -0800
To: Martin Thomson <martin.thomson@gmail.com>
Cc: "public-orca@w3.org" <public-orca@w3.org>
Message-ID: <CAJrXDUEwYeEMP=zUBh4reNwdkDazwq76nhF-ttpaPq30AHZ5Fw@mail.gmail.com>
There is definitely a balance between "make it simple to implement and
clear what it will do, but impossible to express everything" and "you can
express everything in the world, but who knows how it's going to be
implemented, or what it will actually do".

This proposal leans toward the simple, with a "make it as simple as
possible, but no simpler" approach.  If anything more advanced is desired,
the JS can step in and reconfigure things on the fly.  Maybe even that
won't be enough to cover every use case, but it's a slippery slope when you
start adding more and more knobs.
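
As a rough sketch of what "step in and reconfigure" could look like with
the send(RTCRtpParameters) method from the proposal quoted below: the JS
keeps its last parameters, changes one encoding, and calls send() again.
The helper name setEncodingActive is invented here, and the fake sender
only stands in for a real RTCRtpSender so that the snippet runs by itself.
How the JS learns that it should reconfigure (the feedback mechanism) is
still TBD.

```javascript
// Hypothetical helper: returns a copy of `params` in which the encoding with
// the given ssrc has its `active` flag set. The original object is untouched.
function setEncodingActive(params, ssrc, active) {
  return {
    ...params,
    encodings: params.encodings.map(e =>
      e.ssrc === ssrc ? { ...e, active } : e),
  };
}

// Stand-in for an RTCRtpSender, only so the sketch is runnable.
// Assumption: send() resolves once the new parameters have been applied.
const fakeSender = {
  current: null,
  send(params) {
    this.current = params;
    return Promise.resolve();
  },
};

// Parameters for a thumbnail sender, as in the thumbnail example.
const params = {
  encodings: [{ ssrc: 2, priority: 10.0 }],
};

// Temporarily stop sending the thumbnail; re-enable it later by calling
// send() again with active set to true.
const paused = setEncodingActive(params, 2, false);
fakeSender.send(paused).then(() => {
  console.log(fakeSender.current.encodings[0].active);  // false
});
```

Returning a fresh object instead of mutating in place is just one choice;
it keeps the previous parameters around in case the JS wants to go back.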

As for priority, your "add bits proportionally" is exactly what I was
thinking, but, as you pointed out, it's "roughly proportionally", since you
can't quite split up bit allocation that finely.  Whether a browser does it
in a simple fashion or with more fancy "curves" may simply be browser,
codec, and BWE dependent.  The JS gives the policy, and the browser does
its best.
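
To make the arithmetic concrete, here is a minimal JavaScript sketch of
the weighting quoted below (your Scheme 1, plus the optional per-stream
floor of Scheme 2).  The function name allocateByPriority and the minKbps
field are made up for illustration and are not part of the proposal.  A
real browser would round to whatever steps the codec and BWE allow, which
is why the result is only "roughly" proportional.

```javascript
// Hypothetical helper, not part of the proposed API.
// Each stream first receives its floor (minKbps, an assumed field that
// defaults to 0); the remaining bandwidth is split in proportion to
// priority (default 1.0, as in the proposal).
function allocateByPriority(streams, availableKbps) {
  const floorOf = s => s.minKbps || 0;
  const weightOf = s => (s.priority === undefined ? 1.0 : s.priority);
  const floorTotal = streams.reduce((sum, s) => sum + floorOf(s), 0);
  const weightTotal = streams.reduce((sum, s) => sum + weightOf(s), 0);
  // Bandwidth left over once every floor is met; never negative.
  // (If the floors alone exceed what is available, this sketch simply
  // returns the floors and does not try to resolve the shortfall.)
  const spare = Math.max(0, availableKbps - floorTotal);
  return streams.map(s => ({
    name: s.name,
    kbps: floorOf(s) +
        (weightTotal > 0 ? spare * weightOf(s) / weightTotal : 0),
  }));
}

// Scheme 1: 101 units available, weights 0.1 (audio) and 10 (video).
const scheme1 = allocateByPriority([
  { name: "audio", priority: 0.1 },
  { name: "video", priority: 10 },
], 101);
// audio gets about 1 unit, video about 100.

// Scheme 2: floors of 10 and 100 kbps, 211 kbps available.
const scheme2 = allocateByPriority([
  { name: "audio", priority: 0.1, minKbps: 10 },
  { name: "video", priority: 10, minKbps: 100 },
], 211);
// audio gets about 11 kbps, video about 200.
```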

For changing or adding things in general, I've tried to take a very
use-case based approach.  If you can think of a use case that is not
covered, or find a use case in my 10 or so examples that is not covered well,
please point it out.  Because if we cover all the major use cases that we
can think of well, I think that's where we can draw the line at "simple
enough".

DSCP markings may be such a use case.  I don't think we have anything for
that yet.



On Tue, Feb 18, 2014 at 4:56 PM, Martin Thomson <martin.thomson@gmail.com> wrote:

> I read this a few days ago and it didn't seem too bad.  But with time I
> start to wonder how this all interacts.
>
> Basically, you have a multi-variable optimization problem that you are
> throwing some constraints at.  That's a reasonable way to approach the
> problem, but you have to be a lot more precise about what those
> constraints actually do if I'm to make sense of this.  I think that
> it's close though.
>
> Let's start with the easy one:  Priority.
>
> There are two aspects to priority that this might mean, and I think
> that you are using one, but I'm not sure.
>
> In the one case, you use priority to select A over B where there are
> not sufficient resources to do both A and B.  That's what I'm going to
> call priority ordering.
>
> In the other case, you use priority to proportionally allocate
> resources to A and B based on relative importance or weighting.
> That's what I've been calling priority weighting.
>
> I think that you might be implying weighting by the numbers that you have
> here.
>
> Say that we take priority to be a more-or-less direct control over the
> amount of bandwidth that is allocated to a given stream/channel.  I
> think that's a reasonable starting point in all of this.  That leaves
> aside any DSCP markings, which might have secondary effects.
>
> Scheme 1:  With weighting of 0.1 for audio and 10 for video that means
> for 101 units of available bandwidth, you allocate 1 unit to audio for
> every 100 units of video.
>
> Scheme 2:  Allocate a guaranteed lower bound to bandwidth availability
> for streams.  In the above, you might say that audio gets 10kbps and
> video gets 100kbps.  The weightings determine how the spare bandwidth
> is allocated.  If you have 211kbps and the above weightings, that
> means audio gets 11kbps and video gets 200kbps.
>
> Scheme 3:  Some more advanced version of the same where different
> types of media have different elasticity curves and final values are
> determined by feeding input parameters into a more complex algorithm
> (least-squares anyone?).
>
> The net effect is that each stream then gets a bandwidth budget that
> it has to fit within [1].  Then it comes down to how to best use all
> of that available bandwidth.  That's where the other parameters come
> in.
>
> I don't think that this is a straight line between resolution and
> framerate.  There are potentially quality changes to make as well.  I
> jest about least-squares, but something similar might be used if each
> of the axes could be expressed as { min: X, preferred: Y, max: Z }.  I
> know people have expressed a desire for a similar sort of input to gUM
> constraints.  I do worry about minimums though, is it such a bad thing
> if one frame drops below the minimum?  Will anyone notice?
>
> [1] Yes, video doesn't smoothly scale all the way, but maybe that just
> means you save a few bytes, or maybe you can return those bits to the
> common pool for scavengers (like lower priority data, for example).
>
> On 14 February 2014 10:44, Peter Thatcher <pthatcher@google.com> wrote:
> > From what I have seen so far, the hardest part of a good RTC API to get
> > right is how to control what
> > quality/resolution/framerate/simulcast/layering to send.  There are lots
> > of use cases to cover, things interact in complex ways, and there is a
> > never-ending list of edge cases.  I've seen lots of failed attempts at
> > solving this.
> >
> > However, after spending a long time thinking about it and talking to
> > lots of people smarter than me, I think we have something that covers
> > all the major use cases while being "as simple as possible, but no
> > simpler", which I wish to propose now.
> >
> > This is going to end up being a long email, and will probably lead to a
> > very long discussion.  But hopefully it will lead to a good end and give
> > control to applications that they've never had before and have been
> > asking for for a long time.  I look forward to discussion because this
> > is by no means the final word.  It's just the best we've been able to
> > come up with so far.
> >
> > So.... here we go....
> >
> >
> > What we have so far in the API (proposed, at least):
> >
> > partial interface RTCRtpReceiver {
> >   Promise<void> receive(RTCRtpParameters parameters);
> > }
> >
> > partial interface RTCRtpSender {
> >   Promise<void> send(RTCRtpParameters parameters);
> > }
> >
> > dictionary RTCRtpParameters {
> >   DOMString?                                receiverId;
> >   sequence<RTCRtpCodecParameters>           codecs;
> >   sequence<RTCRtpHeaderExtensionParameters> headerExtensions;
> >   sequence<RTCRtpEncodingParameters>        encodings;
> > }
> >
> > dictionary RTCRtpEncodingParameters {
> >   unsigned int ssrc;
> >   DOMString? codec;
> >   RTCRtpFecParameters? fec;
> >   RTCRtpRtxParameters? rtx;
> >
> >   // MISSING STUFF HERE
> > }
> >
> >
> > What we lack:
> >
> > - A way to control per-encoding:
> >   - Resolution
> >   - Framerate
> >   - Bitrate
> >   - Quality
> >   - On/Off
> > - A way to control inter-encoding priority
> > - A way to control things as more or less bandwidth is available
> > - A way to control things as the input resolution/aspect ratio changes
> > (rotation)
> > - A way to express inter-layer dependencies.
> >
> >
> > General Proposed Solution:
> >
> > 1. Provide control of per-encoding scale, bitrate, quality, priority, and
> > active/inactive.
> > 2. Provide control of what to bias toward as more bandwidth is available
> > (more resolution or more framerate).
> > 3. Provide a way to specify limited inter-layer dependencies (more
> > complex relationships are TBD).
> > 4. Let input resolutions and framerates be controlled by the input
> > MediaStreamTrack, not by the encoding.
> > 5. Let the JS change what is currently being sent on the fly based on
> > feedback on what is currently being sent (feedback mechanism is TBD).
> >
> >
> > Specifics of RTCRtpEncodingParameters:
> >
> > dictionary RTCRtpEncodingParameters {
> >   // Existing Fields
> >   unsigned int ssrc;
> >   DOMString? codec;
> >   RTCRtpFecParameters? fec;
> >   RTCRtpRtxParameters? rtx;
> >
> >   // New Fields
> >   // The higher the value, the more bits this encoding will be given
> >   // as available bandwidth goes up.  Default is 1.0.
> >   double priority;
> >
> >   // Do this scale of the input resolution, or die trying.
> >   // 1.0 = full resolution.  Default is unconstrained.
> >   double scale;
> >
> >   // Ramp up resolution/quality/framerate until this bitrate.
> >   // Summed when using dependent layers.
> >   double maxBitrate;
> >
> >   // Ramp up resolution/quality/framerate until this quality.
> >   double maxQuality;
> >
> >   // Never send less than this quality.
> >   double minQuality;
> >
> >   // What to give more bits to, if available.
> >   // Perhaps make it an enum.
> >   DOMString bias; // "resolution" or "framerate"
> >
> >   // If false, don't send any media right now.
> >   // Disable is different than omitting the encoding; it can keep
> >   // resources available to re-enable more quickly than re-adding.
> >   // Plus, it still sends RTCP.
> >   // Default is active.
> >   bool active;
> >
> >   // These are to setup layer dependencies.
> >   int layerId;
> >   sequence<int> layerDependencies;  // Just the IDs
> > }
> >
> > Examples:
> >
> > // Normal 1:1 video with resolution feedback from the receiver
> > var encodings = [{
> >   ssrc: 1,
> >   scale: .5
> > }];
> >
> > // Crank up the quality to "11"
> > var encodings = [{
> >   ssrc: 1,
> >   maxQuality: 11.0  // TODO: Figure out the scale.
> > }];
> >
> > // Send a thumbnail along with regular size
> > var encodings1 = [{
> >   ssrc: 1,
> >   priority: 1.0
> > }]
> > // Control the resolution and framerate
> > // with a different track and RtpSender.
> > var encodings2 = [{
> >   ssrc: 2,
> >   // Prioritize the thumbnail over the main video.
> >   priority: 10.0
> > }];
> >
> > // Sign Language
> > // (need high framerate, but don't get too bad of quality)
> > var encodings = [{
> >   minQuality: 0.2,
> >   bias: "framerate"
> > }];
> >
> > // SVC which handles camera rotation
> > var encodings =[{
> >   layerId: 0,
> >   scale: 0.25,
> >   priority: 3.0
> > }, {
> >   layerId: 1,
> >   layerDependencies: [0],
> >   scale: 0.5,
> >   priority: 2.0
> > }, {
> >   layerId: 2,
> >   layerDependencies: [0, 1],
> >   scale: 1.0,
> >   priority: 1.0
> > }]
> >
> > // SVC w/thumbnail:
> > var encodings1 =[{
> >   layerId: 0,
> >   scale: 0.25,
> >   priority: 3.0
> > }, {
> >   layerId: 1,
> >   layerDependencies: [0],
> >   scale: 0.5,
> >   priority: 2.0
> > }, {
> >   layerId: 2,
> >   layerDependencies: [0, 1],
> >   scale: 1.0,
> >   priority: 1.0
> > }];
> > // Control the resolution and framerate
> > // with a different track and RtpSender.
> > var encodings2 =[{
> >   layerId: 3,
> >   priority: 10.0
> > }]
> >
> > // SVC w/thumbnail temporarily disabled:
> > var encodings1 =[{
> >   layerId: 0,
> >   scale: 0.25,
> >   priority: 3.0
> > }, {
> >   layerId: 1,
> >   layerDependencies: [0],
> >   scale: 0.5,
> >   priority: 2.0
> > }, {
> >   layerId: 2,
> >   layerDependencies: [0, 1],
> >   scale: 1.0,
> >   priority: 1.0
> > }];
> > // Control the resolution and framerate
> > // with a different track and RtpSender.
> > var encodings2 =[{
> >   layerId: 3,
> >   priority: 10.0,
> >   active: false
> > }]
> >
> > // Must send a very fixed resolution
> > // Adjust the resolution using the input track.
> > var encodings = [{
> >   scale: 1.0
> > }];
> >
> > // Screencast
> > var encodings = [{
> >   bias: "resolution"
> > }];
> >
> >
> > // Remote Desktop
> > // (High framerate, must not downscale)
> > var encodings = [{
> >   bias: "framerate",
> >   scale: 1.0
> > }];
> >
> >
> > // Baby Monitor or Security Camera
> > // Adjust the framerate using the input track.
> > var encodings = [{ssrc: 1}];
> >
> > // Audio more important than video
> > var audioEncodings = [{
> >   priority: 10.0
> > }];
> > var videoEncodings = [{
> >   priority: 0.1
> > }];
> >
> > // Video more important than audio
> > var audioEncodings = [{
> >   priority: 0.1
> > }];
> > var videoEncodings = [{
> >   priority: 10.0
> > }];
> >
> > // Camera Rotation
> > // Since there is only control of scale, there is no issue with camera
> > // rotation or cropping.  Everything should work fine with no jank.
> > var encodings = [{ssrc: 1}];
> >
> >
> > That's it.  I apologize for the typos that I'm sure I missed in such a
> > long email.  I look forward to the discussion :).
> >
> >
> >
>
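
For the layerId / layerDependencies fields in the proposal quoted above,
a small sanity check along these lines might be useful before handing
encodings to send().  This is only a sketch: the function name
findBadLayerDependencies is hypothetical, and the rule it applies (every
dependency must name another encoding's layerId in the same list) is an
assumption, since the proposal leaves more complex relationships TBD.

```javascript
// Hypothetical check, not part of the proposal.
// Reports every dependency that does not name the layerId of another
// encoding in the same list (unknown ids and self-references).
function findBadLayerDependencies(encodings) {
  const known = new Set(
    encodings.filter(e => e.layerId !== undefined).map(e => e.layerId));
  const bad = [];
  for (const e of encodings) {
    for (const dep of e.layerDependencies || []) {
      if (!known.has(dep) || dep === e.layerId) {
        bad.push({ layerId: e.layerId, dependency: dep });
      }
    }
  }
  return bad;
}

// The three-layer SVC example from the proposal.
const svc = [
  { layerId: 0, scale: 0.25, priority: 3.0 },
  { layerId: 1, layerDependencies: [0], scale: 0.5, priority: 2.0 },
  { layerId: 2, layerDependencies: [0, 1], scale: 1.0, priority: 1.0 },
];
console.log(findBadLayerDependencies(svc).length);  // 0
```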
Received on Wednesday, 19 February 2014 01:18:15 UTC