Re: Low latency video in WebRTC from Randell Jesup on 2018-06-20 (public-webrtc@w3.org from June 2018)

From: Randell Jesup <randell-ietf@jesup.org>
Date: Wed, 20 Jun 2018 17:24:51 -0400
To: public-webrtc@w3.org
Message-ID: <09289957-8d4b-6702-0ab3-96973538df4f@jesup.org>

On 6/20/2018 2:25 PM, Mondain wrote:
> Yeah Chrome and all browsers in-general have been "fun" to keep 
> compatible; been in the WebRTC game since 2013; fondly remember the 
> weekly moving targets for support during development. Sub half second 
> is quite good and probably qualifies as ultra-low, in our case we put 
> a server in the middle, no P2P. I find your means of glass to glass 
> testing interesting!

P2P video latency (and audio latency!) *should* be <150ms mouth-to-ear 
for maximum conversational quality.  Of course, a major component of 
this in WAN cases is the network delay (and resultant jitter buffer 
delay due to network jitter).  <250ms should be the case as often as 
possible.
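
As a rough illustration of how the 150ms and 250ms targets above get used up, here is a minimal sketch of a one-way latency budget. The component names and all the example numbers are assumptions picked for illustration, not measurements from any browser; only the two thresholds come from the text above.

```python
# Illustrative one-way (mouth-to-ear / glass-to-glass) latency budget.
# Every per-component number below is an assumed example value.
GOOD_MS = 150        # target for maximum conversational quality
ACCEPTABLE_MS = 250  # should be met as often as possible

def mouth_to_ear_ms(capture, encode, network, jitter_buffer, decode, render):
    """Sum the one-way delay contributions, all in milliseconds."""
    return capture + encode + network + jitter_buffer + decode + render

def classify(total_ms):
    """Compare a total against the two targets."""
    if total_ms < GOOD_MS:
        return "good"
    if total_ms < ACCEPTABLE_MS:
        return "acceptable"
    return "too slow"

lan = mouth_to_ear_ms(capture=20, encode=10, network=5,
                      jitter_buffer=20, decode=5, render=15)
wan = mouth_to_ear_ms(capture=20, encode=10, network=90,
                      jitter_buffer=60, decode=5, render=15)
print(lan, classify(lan))  # 75 good
print(wan, classify(wan))  # 200 acceptable
```

In the WAN example the network and jitter buffer terms account for 150 of the 200ms, which is the point made above about network delay dominating.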

If there's a TURN server (or an SFU), you really want it near (in 
network terms) one end of the conversation.  Total delay in an SFU case 
may be longer, as it's impossible to ensure that an SFU is near all (or 
all-but-one) of the people in a conference.  In that case, you may have 
1 extra network delay added and maybe some jitter.
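
To make the "near one end" point concrete, a small sketch of the extra one-way delay a relay (TURN or SFU) adds compared with the direct path. The function name and the delay figures are hypothetical; real paths depend on routing.

```python
def relay_penalty_ms(direct_ms, a_to_relay_ms, relay_to_b_ms):
    """Extra one-way delay from going A -> relay -> B instead of A -> B."""
    return (a_to_relay_ms + relay_to_b_ms) - direct_ms

# Relay close to A (assumed numbers): almost no penalty.
print(relay_penalty_ms(direct_ms=80, a_to_relay_ms=5, relay_to_b_ms=82))   # 7

# Relay far from both ends: close to one whole extra network delay.
print(relay_penalty_ms(direct_ms=80, a_to_relay_ms=70, relay_to_b_ms=75))  # 65
```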

That said: if it's not meeting these, file a bug!

On Wed, Jun 20, 2018 at 1:26 PM, Sergio Garcia Murillo 
<sergio.garcia.murillo@gmail.com> wrote:
 > in terms of implementation what would that imply?
 >
 > I would think that this could help to remove latency at the cost of
 > reducing reliability/quality:
 >
 > -enable slice mode on codecs that support it so video decoding can
 > happen before full frame is received.

This will gain at most a fraction of a frame-time.  Better would be to 
run higher frame rates, and lower resolution, by limiting the capture size.
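
The arithmetic behind that, as a quick sketch: the saving from slice mode is bounded by some fraction of the frame interval, while raising the frame rate shrinks the interval itself. The fraction_saved parameter is an assumption for illustration, not a property of any particular codec.

```python
def frame_interval_ms(fps):
    """Time between successive frames at a given frame rate."""
    return 1000.0 / fps

def max_slice_mode_gain_ms(fps, fraction_saved=0.5):
    """Upper bound on slice-mode savings: a fraction of one frame interval.
    fraction_saved is a hypothetical knob, not a measured value."""
    return fraction_saved * frame_interval_ms(fps)

for fps in (15, 30, 60):
    print(fps, "fps:",
          round(frame_interval_ms(fps), 1), "ms per frame,",
          round(max_slice_mode_gain_ms(fps), 1), "ms max slice gain")
```

Going from 30 to 60 fps cuts the frame interval from about 33ms to about 17ms, which is as much as the assumed slice-mode gain at 30 fps, and it applies to every stage that waits on a whole frame.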

 > -turn off lip sync

That's trivial; just don't put the audio and video tracks in the same 
stream.

 > -turn off packet buffers and rtx/FEC

rtx may be a (big) win if it avoids a keyframe on loss:
loss -> FIR -> network delay -> frame-delay -> send keyframe (large, may 
require a number of frametimes to send - perhaps 5 or 10 even) -> 
network delay -> jitter buffer delay
vs:
loss -> NACK -> network delay -> retransmit 1 packet normally -> network 
delay
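
The two chains above can be turned into a back-of-the-envelope comparison. This is a sketch only: the function names are made up here, the terms simply mirror the arrows above, and the example inputs (40ms one-way delay, 30 fps, a keyframe that takes 5 frame times to send, 30ms of jitter buffer) are assumptions.

```python
def fir_recovery_ms(one_way_ms, frame_ms, keyframe_frametimes, jitter_buffer_ms):
    """loss -> FIR -> network -> frame delay -> send keyframe -> network -> jitter buffer"""
    return (one_way_ms                        # FIR travels to the sender
            + frame_ms                        # wait for the next frame to be encoded
            + keyframe_frametimes * frame_ms  # large keyframe takes several frame times to send
            + one_way_ms                      # keyframe travels back
            + jitter_buffer_ms)               # jitter buffer delay at the receiver

def nack_recovery_ms(one_way_ms, retransmit_ms=0.0):
    """loss -> NACK -> network -> retransmit 1 packet -> network"""
    return one_way_ms + retransmit_ms + one_way_ms

frame_ms = 1000.0 / 30  # 30 fps
print(round(fir_recovery_ms(40, frame_ms, 5, 30)))  # 310
print(nack_recovery_ms(40))                         # 80.0
```

With these assumed numbers the FIR path takes nearly four times as long as the NACK path to recover from a single lost packet.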

 > some of them are easier than others

Even better for remote control: allow telling the UA to decode-on-errors 
instead of the (prettier) stall-on-errors. Decode with a missing packet 
(or skip the entire frame), which produces a stream with errors, then 
when the missing packet shows up, re-decode and catch up (which you'd do 
anyways if you stall-on-errors).  You just have to keep the decoded 
packets (and decoder state!!) around - the second part being trickier 
than the first.  Of course, this helps more in cases with longish 
network delays where RTX might require a total of a couple of hundred ms 
-- if NACK->network->retransmit(quick)->network delay is short (a few 
frame times?), it's not worth speculatively decoding.
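
One way to express that rule of thumb is to compare the NACK round trip against "a few frame times". A minimal sketch, where the threshold of 3 frame times and the 5ms retransmit time are assumed defaults, not values from any implementation:

```python
def nack_round_trip_ms(one_way_ms, retransmit_ms=5.0):
    """NACK -> network -> quick retransmit -> network."""
    return 2 * one_way_ms + retransmit_ms

def worth_speculating(one_way_ms, fps, min_frames_stalled=3.0):
    """True if waiting for the retransmit would stall more than a few frame times.
    min_frames_stalled is a hypothetical threshold."""
    frame_ms = 1000.0 / fps
    return nack_round_trip_ms(one_way_ms) > min_frames_stalled * frame_ms

print(worth_speculating(one_way_ms=20, fps=30))   # False: 45ms stall vs 100ms threshold
print(worth_speculating(one_way_ms=150, fps=30))  # True: 305ms stall vs 100ms threshold
```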

If driving a rover on the moon -- speculatively decode!  If network 
delay is <50ms, it's probably not a win or much of one.  And it is 
complex (the webrtc.org codebase has some support for such things, 
though it is not enabled anywhere so far as I know -- and it may require 
support in the codec).  It can be done simply if you use FIR instead of 
NACK, though the amount of time spent decoding with errors would be 
longer.  However, the recovery is totally straightforward and 
automatic.  You can crank up the quantization on the keyframe to reduce 
size (and latency) at the cost of temporary quality reduction.
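
To see why a smaller keyframe helps, a sketch of how long a keyframe takes to send at a given bitrate. The keyframe sizes and the 1 Mbps rate are assumed example values; actual sizes depend on codec, content and quantizer.

```python
def send_time_ms(size_bytes, bitrate_bps):
    """Time to push size_bytes onto the wire at bitrate_bps."""
    return size_bytes * 8 * 1000 / bitrate_bps

def send_time_frametimes(size_bytes, bitrate_bps, fps):
    """Same duration expressed in frame intervals at the given frame rate."""
    return send_time_ms(size_bytes, bitrate_bps) / (1000.0 / fps)

# Assumed sizes: a 60 kB keyframe vs a coarser-quantized 20 kB one, at 1 Mbps.
print(send_time_ms(60_000, 1_000_000))  # 480.0
print(send_time_ms(20_000, 1_000_000))  # 160.0
print(round(send_time_frametimes(20_000, 1_000_000, 30), 1))
```

At 30 fps the smaller keyframe goes out in roughly 5 frame times instead of roughly 14, at the cost of a temporarily blurrier picture.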


Other things off the top of my head: use low resolution/high frame 
rate.  Cap the encode bitrate (allows for faster packet bursts on errors 
or motion spikes or on keyframes) if there's "headroom". (Perhaps) use 
simulcast.  Use temporal scaling which also gives possible error 
resilience at a drop in frame rate.  If lots-of-bits-available and 
worried about skips on loss, use two streams in parallel (not 
traditional simulcast) at lower bitrates/resolutions, and if one skips 
(or runs a longer jitter buffer) show the other.  (Costs CPU too for 
encoding twice).  Sort of poor man's FEC; real FEC may be preferable, 
though perhaps not in this use case.
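
On the temporal-layer point, a toy model of why it buys error resilience at a drop in frame rate. It assumes a simple two-layer pattern (even frames are base layer T0, odd frames are enhancement layer T1; T0 references the previous T0, T1 references the preceding T0) and no recovery by keyframe or retransmit. Real encoders may use other patterns.

```python
def decodable(num_frames, lost):
    """Return a list of booleans: can frame i be decoded, given the set of lost frames?

    Assumed layout: even i = T0 (base layer), odd i = T1 (enhancement layer).
    """
    result = []
    base_chain_intact = True
    for i in range(num_frames):
        if i % 2 == 0:
            # Losing a base-layer frame breaks every later frame in this model.
            if i in lost:
                base_chain_intact = False
            result.append(base_chain_intact)
        else:
            # An enhancement frame only needs the base chain and itself.
            result.append(base_chain_intact and i not in lost)
    return result

print(decodable(6, {3}))  # [True, True, True, False, True, True]
print(decodable(6, {2}))  # [True, True, False, False, False, False]
```

Losing a T1 frame costs one frame (a momentary drop in frame rate); losing a T0 frame still needs a retransmit or keyframe, so the protection only covers the enhancement-layer half of the packets.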

-- 
Randell Jesup -- rjesup a t mozilla d o t com
Please please please don't email randell-ietf@jesup.org!  Way too much spam
Received on Wednesday, 20 June 2018 21:28:04 UTC