[webrtc-stats] End-to-end delay metrics (#537)

henbos has just created a new issue for https://github.com/w3c/webrtc-stats:

== End-to-end delay metrics ==
End-to-end (E2E) delay is the time between the capture of a video frame or audio sample and the playout of that frame or sample at another endpoint. It includes the one-way network delay as well as sender- and receiver-side buffering. `roundTripTime/2` is a crucial part of the estimate, but is **not** sufficient to obtain E2E delay because it only accounts for the last hop, and there could be servers between the sender and the receiver. Fortunately, RTP timestamps and the RTCP Sender Reports that map RTP to NTP are all the puzzle pieces needed to solve this problem, regardless of the number of hops. This assumes that relay servers do not give the receiver garbage RTP-to-NTP mappings.

This was previously discussed [years ago](https://github.com/w3c/webrtc-stats/issues/158) but was not resolved. It is important that the spec-compliant getStats() provide an alternative to Chrome's legacy callback-based getStats() API.

**How to calculate E2E Delay**
For now, let's only focus on a Sender and a Receiver. We have...
- RTP packets with RTP timestamps: We want to know "how long ago was this captured?", but the RTP timestamp has an arbitrary offset and a clock rate defined by the codec.
- RTCP packets giving us RTT measurements: RTT/2 is used to estimate the one-way delay from the Sender to the Receiver. If the Sender is a relay server, _the RTT only covers the last hop_.
- RTCP packets giving us the offset that lets us convert RTP timestamps to Sender NTP time. The playout time converted this way is [estimatedPlayoutTimestamp](https://w3c.github.io/webrtc-stats/#dom-rtcinboundrtpstreamstats-estimatedplayouttimestamp).
- RTCP packets giving us the Sender NTP time that the RTCP packet was sent.
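
As a rough illustration of the RTP-to-NTP conversion that the Sender Reports enable, here is a minimal Python sketch. The function and parameter names are hypothetical, times are in milliseconds, and it assumes the latest Sender Report supplies a matching pair of RTP timestamp and Sender NTP time.

```python
def rtp_to_sender_ntp_ms(rtp_ts, sr_rtp_ts, sr_ntp_ms, clock_rate_hz):
    """Map an RTP timestamp to Sender NTP time (ms) via the latest Sender Report.

    sr_rtp_ts / sr_ntp_ms: the RTP timestamp and Sender NTP time (ms) that the
    Sender Report declares to be the same instant.
    clock_rate_hz: the codec's RTP clock rate, e.g. 90000 for video.
    """
    # RTP timestamps are 32 bits wide and wrap around, so treat the
    # difference as a signed 32-bit number of clock ticks.
    diff = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return sr_ntp_ms + diff * 1000.0 / clock_rate_hz


# One second of 90 kHz ticks after the report maps to 1000 ms later.
one_second_later = rtp_to_sender_ntp_ms(91_000, 1_000, 5_000.0, 90_000)
```

The arbitrary RTP offset cancels out because only the difference to the Sender Report's RTP timestamp is used.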

Calculations:

The clock difference between the Sender NTP time and the Receiver NTP time is estimated at the Receiver: take the Receiver NTP time at which the RTCP packet is received, subtract the Sender NTP timestamp carried in the packet, and subtract RTT/2 to account for the time that passed between sending and receiving. To avoid jittery values, a smoothed RTT value based on multiple RTT samples should be used. When receiving an RTCP Sender Report:
```
estimatedNtpDelta = reportReceivedInReceiverNtp - reportSentInSenderNtp -
                    smoothedRtt/2
```
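
A minimal Python sketch of this step follows. The class name, the method names and the exponentially weighted moving average are assumptions made for illustration only; this issue deliberately leaves the choice of smoothing to the implementation. All times are in milliseconds.

```python
class NtpDeltaEstimator:
    """Estimates (Receiver NTP clock - Sender NTP clock) from Sender Reports."""

    def __init__(self, alpha=0.125):
        # alpha is a hypothetical smoothing factor for the moving average.
        self.alpha = alpha
        self.smoothed_rtt_ms = None
        self.estimated_ntp_delta_ms = None

    def on_rtt_sample(self, rtt_ms):
        """Fold a new RTT measurement into the smoothed RTT."""
        if self.smoothed_rtt_ms is None:
            self.smoothed_rtt_ms = rtt_ms
        else:
            self.smoothed_rtt_ms += self.alpha * (rtt_ms - self.smoothed_rtt_ms)

    def on_sender_report(self, received_in_receiver_ntp_ms, sent_in_sender_ntp_ms):
        """Update the clock delta; returns None until an RTT sample exists."""
        if self.smoothed_rtt_ms is None:
            return None
        self.estimated_ntp_delta_ms = (
            received_in_receiver_ntp_ms
            - sent_in_sender_ntp_ms
            - self.smoothed_rtt_ms / 2
        )
        return self.estimated_ntp_delta_ms


# Receiver clock runs 500 ms ahead of the Sender; one-way transit is 50 ms.
estimator = NtpDeltaEstimator()
estimator.on_rtt_sample(100.0)
delta = estimator.on_sender_report(10_550.0, 10_000.0)
```

In this example the estimate recovers the 500 ms clock offset because the path is symmetric; on an asymmetric path, RTT/2 is only an approximation of the one-way delay.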

When updating [estimatedPlayoutTimestamp](https://w3c.github.io/webrtc-stats/#dom-rtcinboundrtpstreamstats-estimatedplayouttimestamp):
```
playoutTimeInReceiverNtp = current time according to local NTP clock

estimatedPlayoutTimestamp = calculate according to spec

estimatedPlayoutTimestampInReceiverNtp = estimatedPlayoutTimestamp +
                                         estimatedNtpDelta

e2eDelay = playoutTimeInReceiverNtp - estimatedPlayoutTimestampInReceiverNtp
```
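
The same calculation as a small Python function with a worked example. The function name and the numbers are made up for illustration; all values are in milliseconds.

```python
def e2e_delay_ms(playout_time_in_receiver_ntp_ms,
                 estimated_playout_timestamp_ms,
                 estimated_ntp_delta_ms):
    """End-to-end delay from capture (Sender clock) to playout (Receiver clock)."""
    # Convert the playout timestamp from Sender NTP time to Receiver NTP time.
    playout_timestamp_in_receiver_ntp_ms = (
        estimated_playout_timestamp_ms + estimated_ntp_delta_ms
    )
    return playout_time_in_receiver_ntp_ms - playout_timestamp_in_receiver_ntp_ms


# The frame was captured at Sender NTP time 20000. The Receiver clock is
# 500 ms ahead, and the frame plays out when the Receiver clock reads 20620.
delay = e2e_delay_ms(20_620.0, 20_000.0, 500.0)  # 20620 - (20000 + 500) = 120
```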

**But what if there is a relay Server between the Sender and Receiver?**
In this case, the Sender is actually a Server and the "Sender NTP timestamps" are actually the Server NTP timestamps.

The Server is sending both RTP packets (relayed) and RTCP packets, including Server NTP timestamps and how to map the RTP timestamps to Server NTP time.

It is thus the Server's responsibility to ensure that RTP timestamps can be mapped to the correct NTP timestamp. This requires rewriting the RTP timestamps to adjust for the difference between the original Sender's NTP clock and the Server's NTP clock, taking Sender-Server one-way delay estimates into account. The timestamp is converted from Sender clock to Server clock, and the Receiver does not have to care whether there was a server in between.

Since the Server bakes its own delay estimates into the timestamp rewrite, the resulting e2eDelay covers the entire trip, not just the RTT/2 of the last hop.
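
To make the two-hop argument concrete, here is a small Python simulation. Everything in it is hypothetical: the clock offsets, the hop delays, and the assumption of symmetric paths so that RTT/2 equals the true one-way delay. It is only meant to show that chaining the per-hop clock conversions yields the delay of the whole trip.

```python
# Clock offsets relative to an imaginary reference clock, and true one-way
# delays that no endpoint observes directly. All values are in milliseconds.
SENDER_OFFSET, SERVER_OFFSET, RECEIVER_OFFSET = 0.0, 300.0, -200.0
HOP1_DELAY, HOP2_DELAY, BUFFERING = 40.0, 25.0, 60.0


def ntp_delta(received_local_ntp, sent_remote_ntp, rtt):
    """Estimated (local clock - remote clock), as in estimatedNtpDelta."""
    return received_local_ntp - sent_remote_ntp - rtt / 2


# Hop 1: the Server estimates its clock delta to the Sender from a Sender
# Report that left the Sender at reference time 900.
sender_to_server = ntp_delta(
    900.0 + HOP1_DELAY + SERVER_OFFSET, 900.0 + SENDER_OFFSET, 2 * HOP1_DELAY
)

# Hop 2: the Receiver does the same against the Server's Sender Report.
server_to_receiver = ntp_delta(
    950.0 + HOP2_DELAY + RECEIVER_OFFSET, 950.0 + SERVER_OFFSET, 2 * HOP2_DELAY
)

# A frame is captured at reference time 1000. The Server rewrites the mapping
# so that the capture time is expressed in Server NTP time.
capture_in_sender_ntp = 1000.0 + SENDER_OFFSET
capture_in_server_ntp = capture_in_sender_ntp + sender_to_server

# The Receiver plays the frame out after both hops plus buffering.
playout_in_receiver_ntp = (
    1000.0 + HOP1_DELAY + HOP2_DELAY + BUFFERING + RECEIVER_OFFSET
)
e2e_delay = playout_in_receiver_ntp - (capture_in_server_ntp + server_to_receiver)

true_delay = HOP1_DELAY + HOP2_DELAY + BUFFERING
```

Here `e2e_delay` equals `true_delay` (125 ms), even though the Receiver only ever measured the RTT of the last hop.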

_Note:_
If the Server provides us with incorrect RTP -> NTP mappings, the e2eDelay will be garbage. But if the Server is providing the Receiver with incorrect information then there is not much we can do, and [estimatedPlayoutTimestamp](https://w3c.github.io/webrtc-stats/#dom-rtcinboundrtpstreamstats-estimatedplayouttimestamp) is also wrong.

**Proposal**
Add RTCInboundRtpStreamStats.estimatedEndToEndDelay defined according to the above calculations of `e2eDelay`.

This proposal does not touch on how to smooth the RTT values, but leaves that up to the implementation.

Please view or discuss this issue at https://github.com/w3c/webrtc-stats/issues/537 using your GitHub account

Received on Tuesday, 28 January 2020 13:39:16 UTC