
Re: Low latency video in WebRTC

From: Mondain <mondain@gmail.com>
Date: Wed, 20 Jun 2018 14:43:52 -0700
Message-ID: <CAHQq8Q+VODEvc5ZXKPEo186+OwuGYLBbyJFFUMXwqvOF_2-P-g@mail.gmail.com>
To: Randell Jesup <randell-ietf@jesup.org>
Cc: public-webrtc@w3.org
Putting on the architect hat for a moment here; I would recommend not using
WebRTC for a robot project that requires a near-real-time feed and control.
You'd be better served with an RTSP-based solution, but that's just my two
cents.

On Wed, Jun 20, 2018, 14:29 Randell Jesup <randell-ietf@jesup.org> wrote:

> On 6/20/2018 2:25 PM, Mondain wrote:
> Yeah, Chrome and all browsers in general have been "fun" to keep
> compatible with; I've been in the WebRTC game since 2013 and fondly remember
> the weekly moving targets for support during development. Sub-half-second is
> quite good and probably qualifies as ultra-low; in our case we put a server
> in the middle, no P2P. I find your means of glass-to-glass testing
> interesting!
> P2P video latency (and audio latency!) *should* be <150ms mouth-to-ear for
> maximum conversational quality.  Of course, a major component of this in
> WAN cases is the network delay (and resultant jitter buffer delay due to
> network jitter).  <250ms should be the case as often as possible.
> If there's a TURN server (or an SFU), you really want it near (in network
> terms) one end of the conversation.  Total delay in an SFU case may be
> longer as it's impossible to ensure that an SFU is near all (or
> all-but-one) person in a conference.  In that case, you may have 1 extra
> network delay added and maybe some jitter.
> That said: if it's not meeting these, file a bug!
> On Wed, Jun 20, 2018 at 1:26 PM, Sergio Garcia Murillo <
> sergio.garcia.murillo@gmail.com> wrote:
> > in terms of implementation what would that imply?
> >
> > I would think that this could help to remove latency at the cost of
> reducing reliability/quality:
> >
> > -enable slice mode on codecs that support it so video decoding can
> happen before full frame is received.
> This will gain at most a fraction of a frame-time.  Better would be to run
> at higher frame rates and lower resolution by limiting the capture size.
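The "low resolution, high frame rate" capture suggested above can be expressed as getUserMedia constraints. A minimal sketch; the specific values (320x240 at 60 fps) are illustrative assumptions, not figures from the thread:

```javascript
// Constraints asking the capture pipeline for a small frame at a high rate,
// trading spatial resolution for lower per-frame latency.
const lowLatencyConstraints = {
  audio: true,
  video: {
    width:     { ideal: 320 },   // assumed: small frame to keep encode fast
    height:    { ideal: 240 },
    frameRate: { ideal: 60, min: 30 },  // assumed: prioritize frame rate
  },
};

// In a browser this would be used as:
//   const stream = await navigator.mediaDevices.getUserMedia(lowLatencyConstraints);
```

Using `ideal` rather than `exact` lets the browser fall back gracefully if the camera cannot satisfy the request.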
> > -turn off lip sync
> That's trivial; just don't put the audio and video tracks in the same
> stream.
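The "don't put the tracks in the same stream" trick above can be sketched as a small helper: giving each track its own MediaStream means the receiver has no common sync context, so it won't delay video to line it up with audio. The constructor is injected as a parameter purely for testability; in a browser you would pass the global `MediaStream`:

```javascript
// Add audio and video under *separate* stream ids so the receiver does not
// apply lip-sync between them (lower video latency, at the cost of A/V sync).
function addTracksUnsynced(pc, audioTrack, videoTrack, MediaStreamCtor) {
  const audioStream = new MediaStreamCtor();  // one stream per track =>
  const videoStream = new MediaStreamCtor();  // distinct sync contexts
  pc.addTrack(audioTrack, audioStream);
  pc.addTrack(videoTrack, videoStream);
  return { audioStream, videoStream };
}

// In a browser:
//   addTracksUnsynced(pc, audioTrack, videoTrack, MediaStream);
```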
> > -turn off packet buffers and rtx/FEC
> RTX may be a (big) win if it avoids a keyframe on loss:
> loss -> FIR -> network delay -> frame-delay -> send keyframe (large, may
> require a number of frametimes to send - perhaps 5 or 10 even) -> network
> delay -> jitter buffer delay
> vs:
> loss -> NACK -> network delay -> retransmit 1 packet normally -> network
> delay
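The two recovery chains above can be put into a back-of-envelope latency model. All the numbers below are illustrative assumptions (one-way delay, frame time, keyframe size in frame-times, jitter buffer depth), not measurements from the thread:

```javascript
// Assumed parameters (ms), chosen only to make the comparison concrete.
const oneWayNetworkMs = 50;  // assumed one-way network delay
const frameTimeMs     = 33;  // ~30 fps
const keyframeFrames  = 5;   // keyframe may take several frame-times to send
const jitterBufferMs  = 30;  // assumed jitter buffer depth

// loss -> FIR -> network -> frame delay -> send keyframe -> network -> jitter buffer
const firRecoveryMs =
  oneWayNetworkMs + frameTimeMs + keyframeFrames * frameTimeMs +
  oneWayNetworkMs + jitterBufferMs;

// loss -> NACK -> network -> retransmit one packet -> network
const nackRecoveryMs = oneWayNetworkMs + oneWayNetworkMs;

console.log({ firRecoveryMs, nackRecoveryMs });  // → { firRecoveryMs: 328, nackRecoveryMs: 100 }
```

Even with these modest assumptions, the FIR-plus-keyframe path costs several times the NACK/RTX path, which is the point being made above.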
> > some of them are easier than others
> Even better for remote control: allow telling the UA to decode-on-errors
> instead of the (prettier) stall-on-errors.  Decode with a missing packet
> (or skip the entire frame), which produces a stream with errors, then when
> the missing packet shows up, re-decode and catch up (which you'd do anyways
> if you stall-on-errors).  You just have to keep the decoded packets (and
> decoder state!!) around - the second part being trickier than the first.
> Of course, this helps more in cases with longish network delays where RTX
> might require a total of a couple of hundred ms -- if
> NACK->network->retransmit(quick)->network delay is short (a few frame
> times?), it's not worth speculatively decoding.
> If driving a rover on the moon -- speculatively decode!  If network delay
> is <50ms, it's probably not a win or much of one.   And it is complex
> (though the webrtc.org codebase has some support for such things, but not
> enabled anywhere so far as I know -- and it may require support in the
> codec).  It can be done simply if you use FIR instead of NACK, though the
> amount of time spent decoding with errors would be longer.  However, the
> recovery is totally straightforward and automatic.  You can crank up the
> quantization on the keyframe to reduce size (and latency) at the cost of
> temporary quality reduction.
> Other things off the top of my head: use low resolution/high frame rate.
> Cap the encode bitrate (allows for faster packet bursts on errors, motion
> spikes, or keyframes) if there's "headroom".  (Perhaps) use simulcast.
> Use temporal scaling, which also gives some error resilience at the cost of
> a drop in frame rate.  If there are lots of bits available and you're
> worried about skips on loss, use two streams in parallel (not traditional
> simulcast) at lower bitrates/resolutions, and if one skips (or runs a longer
> jitter buffer), show the other.  (This costs CPU for encoding twice.)  It's
> a sort of poor man's FEC; real FEC may be preferable, though perhaps not in
> this use case.
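The bitrate cap mentioned above maps onto `RTCRtpSender.setParameters` with `maxBitrate` on each encoding. A minimal sketch with a pure helper; the 300 kbps cap is an illustrative assumption:

```javascript
// Set maxBitrate on every encoding in an RTCRtpSendParameters-shaped object,
// leaving headroom below the link capacity for bursts (keyframes, RTX).
function capBitrate(params, maxBitrateBps) {
  const encodings = params.encodings && params.encodings.length
    ? params.encodings
    : [{}];  // some browsers return no encodings before negotiation
  for (const enc of encodings) enc.maxBitrate = maxBitrateBps;
  params.encodings = encodings;
  return params;
}

// In a browser:
//   const sender = pc.getSenders().find(s => s.track && s.track.kind === "video");
//   await sender.setParameters(capBitrate(sender.getParameters(), 300000));
```

Note that `setParameters` must be called with an object obtained from `getParameters`; mutating and returning it, as here, preserves the fields the browser requires.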
> --
> Randell Jesup -- rjesup a t mozilla d o t com
> Please please please don't email randell-ietf@jesup.org!  Way too much spam
Received on Wednesday, 20 June 2018 21:44:29 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 19:18:42 UTC