Re: Some questions about RTCRtpEncodingParameters...

I think we should decide what SVC modes we want to support.

Regardless, it's good to know what we would need to add later so we can 
tell whether we made any mistakes in the design. It's clear to me that 
"bias" won't work once additional SVC modes are added; it would be 
better to split it into "scale priority" and "frame rate priority", 
which works like bias except that it allows more SVC modes to be added 
later with their own priorities.
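
For example (a sketch only; the names are illustrative, not a 
proposal), each mode could carry its own weight instead of a single 
bias:

var encoding = {
  scalePriority: 1.0,      // spatial resolution vs. the other modes
  frameRatePriority: 1.0,  // frame rate vs. the other modes
  qualityPriority: 1.0     // quality (SNR) vs. the other modes
};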

Technically, there's nothing for implementors "to do" if they don't 
support those SVC modes, since the knobs are N/A in those cases, but I 
can understand wanting to keep the API surface to a minimum even if 
technically nothing is needed.

As for "min" "priority" and "relative" for each SVC mode we plan to 
include, yes, they are kind of important. The "min" is needed for 
specific use cases (and possibly to define the minimum for the base 
layer). The priority is needed for the engine to know which layer to 
sacrifice when resources are restricted. The "relative" value is needed 
to be able to define the "params" layering structure otherwise there's 
no definition for the how to setup the layering for that SVC mode.

A specific use case that needs all three "min" values: sign language. 
You need a minimum scale and minimum quality to see fingers, and a 
minimum frame rate to see gestures.
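
As a rough sketch of that base layer (the "min" field names here are 
illustrative, not settled API):

var encodings = [{
  encodingId: "base",
  minScale: 0.5,      // don't shrink below half resolution (fingers)
  minFrameRate: 15,   // don't drop below 15 fps (gestures)
  minQuality: 0.5     // don't let quantization wash out fine detail
}];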

If we remove certain knobs, we'll have to figure out which use cases we 
are willing to drop for the first version of the API (or which use 
cases will only work on a best-effort basis, rather than being helped 
along by specific controls).

-Robin


> Justin Uberti <juberti@google.com>
> May 12, 2014 at 2:08 AM
> Do we need them all right now? I would argue we could probably get by 
> with a lot less for ORTC 1.
>
>
>
> Robin Raymond <robin@hookflash.com>
> May 10, 2014 at 4:03 PM
>
> I'm aware of at least 6 SVC modes of operation, but I believe the top 
> 3 we should be concerned with are "spatial", "temporal", and "quality".
>
> To support that you need to be able to have a minimal value per each 
> supported mode:
> - minScale - don't go smaller than this size
> - minFrameRate - don't send lower than this frame rate
> - minQuality - don't let quality go less than this
>
> Those values would typically get used as the min requirements for a 
> base layer in SVC.
>
> Plus the "bias" doesn't work since it's only factoring "spacial" and 
> "temporal" priorities but neglects "quality".
>
> You need to have
> - scale priority - relative to frame rate/quality
> - frame rate priority - relative to scale/quality
> - quality priority - relative to scale/frame rate
>
> Then you need in each layer:
> - relative scale - relative scale from base (how much of an 
> enhancement is this layer)
> - relative frame rate - relative frame rate from base (how much of an 
> enhancement is this layer)
> - relative quality - relative to base (how much of an enhancement is 
> this layer)
>
> You need to know what the purpose of each layer is: scale, frame rate, 
> quality.
>
> And there are cross dependencies between layers operating in different 
> SVC modes.
>
> (and yes, there are use cases that need all of these if you want to 
> support three SVC modes of operation; see the sketch below)
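>
> As a sketch (names illustrative, not a concrete proposal), a temporal 
> enhancement layer would then carry its mode's relative value and its 
> priority:
>
> var layer = {
>   encodingId: "temporal1",
>   dependencyEncodingIds: ["base"],
>   relativeFrameRate: 2.0,  // doubles the base layer's frame rate
>   frameRatePriority: 0.8   // sacrificed before the base layer
> };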
>
> -Robin
>
> Peter Thatcher <pthatcher@google.com>
> May 9, 2014 at 5:22 PM
>
>
>
> On Fri, May 9, 2014 at 12:06 PM, Bernard Aboba 
> <Bernard.Aboba@microsoft.com> wrote:
>
>     [BA] For example, while the encodingId and dependencyEncodingIds
>     can be used to set up layering, to do layering and simulcast
>     together requires setting up multiple sender objects.
>
>     [Peter] Why?  Can't you express a mix here?
>
>     [BA] As an example, let's say that in a single sender object we
>     wanted to simulcast two streams, each with different resolution,
>     and each of which supports temporal scalability with two layers.
>      So we have four streams overall.   Let's say we have a
>     "framescale" variable which means
>     to use that fraction of the framerate from getUserMedia.  Would
>     this work?
>
>     var encodings =[{
>       layerId: "halfScaleBase",
>       scale: 0.5,
>       framescale: 0.5
>     }, {
>       layerId: "fullScaleBase",
>       scale: 1.0,
>       framescale: 0.5
>     }, {
>       layerId: "temporalEnhancemenToHalfScaleBase",
>       layerDependencies: ["halfScaleBase"],
>       scale: 0.5,
>       framescale: 1.0
>     }, {
>       layerId: "temporalEnhancementToFullScaleBase",
>       layerDependencies: ["fullScaleBase"],
>       scale: 1.0,
>       framescale: 1.0
>     }]
>
>
> That seems like a reasonable way to express what you want.  Is it 
> implementable?  Also, I'm not sure if I like the name "framescale". 
> Perhaps "relativeFramerate" or "framerateScale"?
>
>
>     [Peter] That said, we do need to make sure that our RtpCapabilities
>     object provides enough information to be able to do this.  Do you
>     think it's lacking something in particular?
>
>     [BA]  There are a few things that come to mind:
>
>        a. One is a statement about what types of scalability a given
>     encoder/decoder supports.  Another is how many layers they support
>     within that type.    These can perhaps
>        be combined into a capability array:  [temporalMaxLayers,
>     spatialMaxLayers, qualityMaxLayers] where setting MaxLayers of a
>     given type to 0 means "I don't support it at all".
>
>
>
> This:
>
> int temporalLayers;
> int spatialLayers;
> int qualityLayers;
>
> sounds good to me, although I'm not sure about the quality layers, 
> since we don't have a way to express those in RtpParameters yet (do we?).
>
> Now where should these go?  Should they be in RtpCapabilities 
> directly?  Or are they codec-specific, so we need to do something more 
> fancy, like RtpVideoEncodingCapabilities { DOMString codecName;  int 
> temporalLayers; ... }, and then have a 
> sequence<RtpVideoEncodingCapabilities> in RtpCapabilities (much like 
> there is an array of RtpEncodingParameters in RtpParameters)?
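>
> If these were codec-specific, one entry might look something like this 
> (a sketch only, reusing the layer counts from above):
>
> var videoCaps = {
>   codecName: "VP8",
>   temporalLayers: 3,   // up to three temporal layers
>   spatialLayers: 1,    // no spatial scalability
>   qualityLayers: 0     // 0 = not supported at all
> };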
>
>
>        b.  Another is a statement about what types of simulcast are
>     supported within an object and how many streams are supported.
>     This also might be expressed in a similar array.
>
>
> int temporalLayers;
> int spatialLayers;
> int simulcastLayers;
>
> ?
>
> Although, I don't quite understand why there would be a limit.  You 
> can create as many RtpSenders as you want, right?
>
>
>     [BA] Some other questions relate to whether these attributes
>     make sense for SVC at all.
>     For example, priority makes sense for specifying the priority
>     between audio and video – but what does it mean when it is
>     specified in individual SVC layers?
>
>     [Peter] Maybe it wouldn't.  But it does for simulcast, doesn't it?
>
>     [BA] Yes, for simulcast it makes sense.
>
>     [BA] Similarly, minQuality might make some sense for a base layer
>     (as might minFrameRate or minResolution), but what does it mean to
>     specify this at each SVC layer?
>
>     [Peter] Maybe for SVC it doesn't.  But it does for simulcast,
>     doesn't it?
>
>     [BA] "Quality" has a specific meaning within SVC (e.g. "quality
>     scalability").   So I'm not sure the variable is intended to have
>     the same meaning in simulcast.
>
>
> I think for quality scalability, we'd need a different field. 
> "qualityScale"?
>
>     [BA] Also, is it necessary to provide a maxBitrate knob for each
>     layer in SVC?
>
>     [Peter] Again, maybe not all knobs make sense for SVC.  So if
>     you're doing SVC, don't use the knobs that don't make sense.
>
>     [BA] The knobs may make sense at an overall level -- as in "I'd
>     like to impose a maximum bit rate on the combination of layers".
>
>
> That last bit about a maximum across all layers, I'm not sure how to 
> solve.  I think Justin had an idea about how to do it previously. 
> Justin?  Are you reading this thread?
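>
> (One possibility, sketched with a hypothetical name: a single 
> aggregate cap at the RtpParameters level, alongside the per-encoding 
> maxBitrate values.
>
> var params = {
>   maxAggregateBitrate: 1500000,  // hypothetical: caps all layers combined
>   encodings: [ /* per-layer RTCRtpEncodingParameters */ ]
> };
>
> But that's just one option.)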
>
>
> Peter Thatcher <pthatcher@google.com>
> May 9, 2014 at 2:23 PM
>
>
>
> On Mon, May 5, 2014 at 4:36 PM, Bernard Aboba 
> <Bernard.Aboba@microsoft.com> wrote:
>
>     Another set of questions I had about the Editor’s draft relates to
>     RTCRtpEncodingParameters, which is defined as follows:
>
>     dictionary RTCRtpEncodingParameters {
>         unsigned int?        ssrc = null;
>         DOMString?           codecName = "";
>         RTCRtpFecParameters? fec;
>         RTCRtpRtxParameters? rtx;
>         double               priority = 1.0;
>         double               maxBitrate = null;
>         double               minQuality = null;
>         double               frameratebias = 0.5;
>         double               scale = null;
>         boolean              active = true;
>         DOMString?           encodingId;
>         sequence<DOMString>  dependencyEncodingIds;
>     };
>
>     To take this for a spin, I looked at how it might be used to
>     handle a few use cases:
>
>     a. Temporal scalability;
>
>     b. Spatial simulcast combined with temporal scalability.
>
>     Maybe it’s just me, but it seemed that there was some missing
>     functionality.
>
>     For example, while the encodingId and dependencyEncodingIds can be
>     used to set up layering, to do layering and simulcast together
>     requires setting up multiple sender objects.
>
>
> Why?  Can't you express a mix here?
>
>     This leaves the application having to deal with tradeoffs between
>     simulcast and layering, which could be challenging.  Also, the
>     above RTCRtpEncodingParameters object doesn’t seem to be able to
>     handle temporal scalability, only spatial (via the scale
>     attribute).  This is because there is no “framescale” attribute to
>     provide instruction on how to divide the framerate between the
>     various layers.
>
>
> "framescale" sounds like an interesting knob to add.  Are you 
> proposing it?
>
>     Also, it occurred to me that a developer attempting to set up the
>     RTCRtpEncodingParameters object correctly might encounter quite a
>     few challenges.    This object seems like it would be best set up
>     automatically under the covers based on some general
>     developer-provided preferences -- something very high level like
>     “I want SVC if it is available, figure out what will work best”.  
>     The browser should be able to figure this out based on the
>     “capabilities” of each peer, such as the number of layers that the
>     encoder/decoder can handle of each layering type  (e.g. temporal,
>     spatial, quality), or information about simulcast capabilities
>     (e.g. the maximum number of simulcast streams that the encoder can
>     handle).
>
>
> I think putting too much into the browser would be a mistake.  I think 
> JS and libraries would be better suited to handle these more advanced 
> use cases.
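>
> For example, a small helper library could read the capabilities and 
> build the encodings itself (a sketch; the helper and the 
> "temporalLayers" capability are hypothetical):
>
> function pickEncodings(caps) {
>   // Use a two-layer temporal structure when the encoder supports it,
>   // otherwise fall back to a single encoding.
>   if (caps.temporalLayers > 1) {
>     return [
>       { encodingId: "base" },
>       { encodingId: "t1", dependencyEncodingIds: ["base"] }
>     ];
>   }
>   return [{}];
> }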
>
> That said, we do need to make sure that our RtpCapabilities object 
> provides enough information to be able to do this.  Do you think it's 
> lacking something in particular?
>
>
>
>     Some other questions relate to whether these attributes make
>     sense for SVC at all.
>
>     For example, priority makes sense for specifying the priority
>     between audio and video – but what does it mean when it is
>     specified in individual SVC layers?
>
>
> Maybe it wouldn't.  But it does for simulcast, doesn't it?
>
>     Similarly, minQuality might make some sense for a base layer (as
>     might minFrameRate or minResolution), but what does it mean to
>     specify this at each SVC layer?
>
>
> Maybe for SVC it doesn't.  But it does for simulcast, doesn't it?
>
>     Also, is it necessary to provide a maxBitrate knob for each layer
>     in SVC?
>
>
> Again, maybe not all knobs make sense for SVC.  So if you're doing 
> SVC, don't use the knobs that don't make sense.
>
>

Received on Monday, 12 May 2014 13:01:57 UTC