Re: Some thoughts on simulcast/layered coding support in ORCA API

Great analysis, comments inline...


> In general, I agree with the small set of requirements relating to temporal
> scaling (and think they can be reduced even further).
>
>    o  REQ-8.  It must be possible to configure the number of temporal
>       layers (1 to 4).  This should be the only mandatory parameter when
>       enabling temporal scalability.
>
> [BA] IMHO, it is useful to be able to retrieve from the browser the
> maximum number of temporal layers supported for send/receive, so as to
> be able to signal this to the peer if necessary.  I also think that an
> application should be able to set the maximum number of layers
> sent/received.  However, the application doesn't need to control the
> number of layers sent or received on an ongoing basis, since this can
> be handled by the browser with no API controls.
>

> In practice the layer add/drops can happen very quickly (several times a
> second) and will be based on
> congestion state, so application control is not feasible and could even be
> dangerous.
>

[GG] I think the number of layers needs to be configurable on an ongoing
basis.  For example, in a conferencing use case (in theory the most common
one) you probably want to disable layering when you have only two
participants.  Is that a reasonable requirement/use case?
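
To make the idea concrete, here is a minimal sketch of the kind of runtime
control I am picturing.  None of these names exist in any current API;
VideoSender and setTemporalLayers() are purely illustrative:

  // Hypothetical API, for illustration only.
  interface VideoSender {
    setTemporalLayers(count: number): void; // 1 effectively disables layering
  }

  function onParticipantCountChanged(sender: VideoSender,
                                     participants: number): void {
    // With only two participants there is no server fan-out to feed,
    // so a single layer avoids the layering overhead.  Three layers
    // here is just an arbitrary example value.
    sender.setTemporalLayers(participants > 2 ? 3 : 1);
  }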


>    o  REQ-9.  It must be possible to configure the bitrate, frame rate
>       decimation factor, and the assignment of frames to layers for
>       each temporal layer of the VP8 stream.
>
> [BA] I would suggest that only frame rate is fundamental here.  The other
> parameters are codec specific.
> In particular, it seems useful for an application to be able to retrieve
> the frame rate configuration that
> the browser supports so it could signal this.  However, the application
> may not necessarily be able to
> control the frame rate of each layer.  For example, in a given
> implementation the application might discover that the base frame rate
> is 7.5 frames/second, and that the extensions add 7.5 and 15
> frames/second.  Take it or leave it!
>
> For temporal scaling, the base layer frame rate is the most important
> parameter, and logically will determine
> the frame rates of extension layers, which are typically designed to allow
> multiplicative increases in frame rate.
> So there is not really an infinite degree of flexibility here, and you
> don't want to give the application so much
> rope it can hang itself.
>
> Allowing each extension layer to have frame rate determined independently
> could result in configuration
> requests that a given implementation might not actually be able to carry
> out and that could play havoc with
> congestion control.
>

[GG] Don't you think that bitrate is also important for the use cases
identified?  Typically we would want to have one layer at 100kbps/7fps and
another one at 1000kbps/30fps.

I was not sure where to draw the line on how many parameters to expose,
and I decided to start with all the possible parameters in VP8.  That
probably doesn't make sense.
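
Still, to make the minimal version concrete, the configuration I had in
mind for the example above would look roughly like this.  The type and
field names are invented for illustration and are not in the draft:

  // Invented names; the values match the example above.  In temporal
  // layering the second entry would be cumulative (base + extensions).
  interface LayerConfig {
    targetBitrateKbps: number;
    framerate: number;
  }

  const layers: LayerConfig[] = [
    { targetBitrateKbps: 100,  framerate: 7 },  // base layer
    { targetBitrateKbps: 1000, framerate: 30 }, // base plus extensions
  ];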


>
> The question which I suspect will kick off more debate is how to handle
> spatial simulcast and/or layering.
>
> A major use case is described in Section 2.4 "increasing video quality",
> where the application will want to switch
> from a thumbnail to a larger resolution, potentially because the active
> speaker changed or for some other reason.
>
> I agree that this is a real scenario but also would caution that there are
> lots of situations where the resolution will
> be changed by the browser without application control.  In reality,
> supported spatial resolutions are typically not
> infinitely variable.  It makes no sense for an application to change the
> aspect ratio frequently, for example -- that is
> disconcerting to the user.   To avoid doing this, a set of resolutions
> with the same aspect ratio may be supported,
> allowing the resolution to change while the window size may not change at
> all (just the quality).
>
> For example, the active speaker might be 640 by 320 and then, because
> of a lack of available bandwidth, a lower
> resolution simulcast of 320 by 160 might be selected by the MANE.   This
> wouldn't necessarily imply demotion to
> a thumbnail, just a decrease in resolution made necessary by an increase
> in congestion.  Therefore this could be
> a decision made entirely by the sender.
>
> IMHO, an API should allow the application to retrieve the maximum number
> of simulcasts to be sent/received so
> this can be conveyed in signaling.   It should also be able to decide how
> many streams to send, and how many it
> could receive.  However, it should be understood that within those
> parameters in practice the mixer will make the
> decision about which simulcast stream it sends based on bandwidth
> availability (which could change very quickly).
> So while the receiver should be able to pause/resume simulcast streams,
> this doesn't necessarily imply an ongoing
> burden of receiver control.
>

[GG] I agree.  In the case of ORTC, the API should make it possible to
figure out the number of simulcast streams.  The draft is focused on
WebRTC 1.0, where that information would be included in the SDP.
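
On pause/resume, the control surface I would picture is something like the
following, with all the RTCP machinery hidden inside the browser.  Every
name here is hypothetical:

  // Hypothetical receiver-side control: pausing a stream would let the
  // browser emit the corresponding RTCP request; the application never
  // touches RTCP itself.
  interface SimulcastReceiver {
    streams(): ReadonlyArray<{ ssrc: number; paused: boolean }>;
    pause(ssrc: number): void;
    resume(ssrc: number): void;
  }

  function keepOnly(receiver: SimulcastReceiver, wanted: number): void {
    for (const s of receiver.streams()) {
      if (s.ssrc === wanted) receiver.resume(s.ssrc);
      else receiver.pause(s.ssrc);
    }
  }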

By the way, what is the use case for the receiver receiving multiple
simulcast streams (instead of a server selecting and forwarding just one
of them)?  Is it to send streams with different protection levels (FEC,
network flows with different priorities...)?  Can you elaborate on that
so that I can add it to the list of use cases?


>
> Now for a bit of personal bias as to how the control should be exercised
> in terms of protocol functionality.
>
> I prefer RTCP control of simulcast/layered coding (such as via stream
> pause/resume) to signaling in most cases
> since this allows much faster control.   For simulcast, the RTCP
> pause/resume message can refer to the SSRC
> to be paused/resumed since simulcasts have unique SSRCs.  Within layered
> coding, this is trickier, since only
> in Multi-SSRC Transport (MST) is there a unique SSRC per layer.  So this
> is one of several arguments in favor of MST.
>
> For what it's worth, here is my opinion of the requirements relating to
> spatial simulcast/layered coding.
> As noted below, I would prefer parsimony in terms of the API functionality.
>
>    o  REQ-1.  It must be possible to enable and configure the scalable
>       video coding before initiating a peer connection.
>
>    o  REQ-2.  It must be possible to enable and configure the scalable
>       video coding before answering a peer connection.
>
>    o  REQ-5.  It must be possible to configure the number of simulcasted
>       streams.
>
> [BA] I would support being able to retrieve the maximum number of
> simulcast streams that a browser can send/receive.
>

[GG] Sounds like a good idea, but we don't have a way to detect the
maximum number of RTCPeerConnections either, and that limit is probably
of the same nature, right?
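
If such a query did exist, I would expect a hint rather than a guarantee,
along these lines (names and values are invented):

  // Hypothetical capability query.  As with the number of peer
  // connections, the real limit is resource-dependent, so any value
  // reported here could only be a hint, not a guarantee.
  interface VideoCapabilities {
    maxSimulcastStreams: number;
    maxTemporalLayers: number;
  }

  // Stub standing in for a real browser query; values are made up.
  function getVideoCapabilities(codec: string): VideoCapabilities {
    return { maxSimulcastStreams: 3, maxTemporalLayers: 3 };
  }

  const caps = getVideoCapabilities("VP8");
  console.log("max simulcast streams: " + caps.maxSimulcastStreams);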


> Also, support for setting a maximum number
> of streams to send/receive.
>

>    o  REQ-3.  It must be possible to enable/disable and re-configure the
>       scalable video coding to update a peer connection.
>
> [BA] As noted earlier, I would support pause/resume functionality, and if
> you're only sending a base layer or a single stream, then I think we've
> satisfied this requirement, haven't we?
>
[GG] I was thinking of adding and removing layers on the fly depending on
the number/capabilities of participants in a multiparty conference.


>
>    o  REQ-6.  It must be possible to configure the minimum and maximum
>       bitrate of each simulcasted stream.
>
> [BA] Because the bitrate can vary based on motion, framerate and
> resolution, it probably isn't a good parameter for use in an API.  So I'd
> focus on framerate for temporal scaling and
> resolution for spatial simulcast and layering.
>

>    o  REQ-7.  It must be possible to configure the resolution of each
>       simulcasted stream.
>
> [BA]  Some amount of configuration does make sense to me, but it's
> worthwhile to keep the practical constraints in mind. Typically the
> simulcast resolutions will be within the same aspect ratio and may be auto
> selected by the sender.  So maybe allow retrieval of the allowable
> resolutions for send/receive and then select among them based on the
> maximum number to be sent/received.
>

[GG] +1
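
To spell out how I read that proposal, the application-side flow could
stay very small.  A sketch, with invented names and example values:

  // Hypothetical flow: the browser exposes the allowable resolutions
  // (all with the same aspect ratio, chosen by the implementation) and
  // the application just selects the top N.
  interface Resolution {
    width: number;
    height: number;
  }

  // Stub standing in for a real browser query; a real implementation
  // would report what the encoder supports, highest first.
  function getAllowableResolutions(): Resolution[] {
    return [
      { width: 1280, height: 720 },
      { width: 640, height: 360 },
      { width: 320, height: 180 },
    ];
  }

  function selectSimulcastResolutions(maxStreams: number): Resolution[] {
    return getAllowableResolutions().slice(0, maxStreams);
  }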


>
>    Requirements regarding RTP usage:
>
>    o  REQ-10.  Congestion control must be supported for all the
>       simulcasted streams within the configured boundaries (min/max
>       bitrate).
>
> [BA] I agree that congestion control should be built into the browser,
> but I expect its operation not to be under the control of the
> application.
>
[GG] +1


>
>
>    o  REQ-11.  Transmission of simulcasted streams must be signaled and
>       negotiated in the SDP and transmitted in RTP sessions, making use
>       of existing standard attributes
>       [I-D.westerlund-avtcore-multistream-and-simulcast].
>
> [BA] I disagree that simulcast or layering changes need to be signaled.
> Several simulcast/layering implementations do not do this.  All that may
> need to be signaled is the maximum operating envelope.
>
[GG] How does the MCU know which one is the low bitrate stream and which
one is the high bitrate stream?
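
As far as I know, today that mapping comes from the SDP: implementations
that use an a=ssrc-group:SIM line conventionally list the SSRCs from
lowest to highest quality.  A minimal sketch of reading it, assuming that
convention holds:

  // Extract the SSRCs of a SIM (simulcast) group from an SDP blob.
  // Assumes the convention that SSRCs appear from lowest to highest
  // quality; returns an empty array if no SIM group is present.
  function parseSimGroup(sdp: string): number[] {
    const match = sdp.match(/^a=ssrc-group:SIM (.+)$/m);
    return match ? match[1].trim().split(/\s+/).map(Number) : [];
  }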


>    o  REQ-12.  Any endpoint should be prepared to receive VP8 multi-
>       layered encoded video without requiring out-of-band negotiation
>       in SDP.
>
> [BA] Not all browsers will necessarily support simulcast or layered coding
> so we can't require that all endpoints support this for any given codec.
>  So being able to retrieve the envelope of support is useful.
>

[GG] I think all existing WebRTC browsers support that requirement for
VP8, and it simplifies interoperability.  I agree that it can be different
for other codecs, but it is reasonable for VP8 in my opinion.  What do you
think?

Thank you very much; I will try to incorporate this feedback into the next
version of the draft.

G.
