Timing information: Thoughts

This note is inspired by, but not in any way dependent on, 
draft-mandyam-rtcweb-data-synch-00.txt. It's a long note; those who 
just want what I propose to actually do can skip to the end.


When building interesting applications with data flows, there frequently 
will occur the need to relate data items to times in the media timeline. 
This can be as simple as advancing a slide show when the speaker pushes 
a button, or as complex as putting an interaction overlay over a video 
and telling the user to "hit the ball in the picture" - you have to know 
where the ball is on the video in order to know whether it's a hit or not.

The task is made somewhat more complicated by the lack of common clocks 
across the pieces of the application, and the many sources of delay on 
the way.

A simplified picture:

The sender is able to easily refer to a common clock while measuring:
* Media source (camera / microphone) delay (constant)
* Encoding time (probably roughly constant)
* Outgoing network buffering (variable)
- measurement point: RTP timestamp when sending (reported in RTCP)

* Network delay (variable)

The receiver is able to easily refer to a common clock for:
* Receipt time (measured in metrics)
* Jitter buffer time (variable)
* Decoding time (roughly constant)
* Playout delay (roughly constant)

What the receiving application wants is to "know" that data item X 
refers to the point in time when video frame Y was grabbed and audio 
sample Z was recorded, so that when video frame Y is painted on the 
screen or audio sample Z enters the listener's ear, it can do the 
appropriate thing (whatever that is).

The RTP sender report (RFC 3550 section 6.4.1) relates the RTP clock to 
an NTP timestamp. The description says:

    NTP timestamp: 64 bits
       Indicates the wallclock time (see Section 4) when this report was
       sent so that it may be used in combination with timestamps
       returned in reception reports from other receivers to measure
       round-trip propagation to those receivers.  Receivers should
       expect that the measurement accuracy of the timestamp may be
       limited to far less than the resolution of the NTP timestamp. The
       measurement uncertainty of the timestamp is not indicated as it
       may not be known.  On a system that has no notion of wallclock
       time but does have some system-specific clock such as "system
       uptime", a sender MAY use that clock as a reference to calculate
       relative NTP timestamps.  It is important to choose a commonly
       used clock so that if separate implementations are used to produce
       the individual streams of a multimedia session, all
       implementations will use the same clock.  Until the year 2036,
       relative and absolute timestamps will differ in the high bit so
       (invalid) comparisons will show a large difference; by then one
       hopes relative timestamps will no longer be needed.  A sender that
       has no notion of wallclock or elapsed time MAY set the NTP
       timestamp to zero.

    RTP timestamp: 32 bits
       Corresponds to the same time as the NTP timestamp (above), but in
       the same units and with the same random offset as the RTP
       timestamps in data packets.  This correspondence may be used for
       intra- and inter-media synchronization for sources whose NTP
       timestamps are synchronized, and may be used by media-independent
       receivers to estimate the nominal RTP clock frequency.  Note that
       in most cases this timestamp will not be equal to the RTP
       timestamp in any adjacent data packet.  Rather, it MUST be
       calculated from the corresponding NTP timestamp using the
       relationship between the RTP timestamp counter and real time as
       maintained by periodically checking the wallclock time at a
       sampling instant.

(It is tempting to infer that this means the RTP timestamp refers 
to the capture time of the media stream - this needs verification.)

Thus, if we know:

- The NTP-to-RTP mapping at the remote end
- The RTP timestamp of the media stream at the moment it is played out

then the sender can transmit its NTP timestamp as part of a data 
packet, and the recipient can calculate the time at which the "same" 
instant is played out in the media flow.

[NOTE: We do NOT know that the NTP timestamp from the remote side 
corresponds to now(). Clocks are often out of sync, and (in really bad 
cases) can have noticeable clock drift.]
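The receiver-side calculation described above can be sketched as the inverse mapping: given the (NTP, RTP) pair from the last sender report and the RTP timestamp of the frame currently being rendered, recover the sender's wallclock time for that frame. Names and parameters are illustrative assumptions:

```typescript
// One (NTP, RTP) reference pair from the last RTCP sender report.
interface SenderReportMapping {
  ntpSeconds: number;
  rtpTimestamp: number;
}

// Map a playout RTP timestamp back to the sender's wallclock.
// NOTE: the result is on the SENDER's clock. Per the caveat above,
// it must only be compared against other sender-clock values (such as
// an NTP timestamp carried in a data packet), never against local now(),
// because the two clocks may be offset and drifting.
function rtpToSenderNtp(
  rtpTimestamp: number,
  ref: SenderReportMapping,
  clockRate: number,
): number {
  // | 0 interprets the 32-bit gap as signed, so timestamps shortly
  // before the reference (or across a wraparound) also map correctly.
  const diff = (rtpTimestamp - ref.rtpTimestamp) | 0;
  return ref.ntpSeconds + diff / clockRate;
}
```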

This requires that:

- The sender has access to the NTP time corresponding to the RTP 
timestamp being put on "the current frame" at recording
- The recipient has access to the RTP timestamp of "the current frame" 
being played out
- The recipient has access to the NTP-to-RTP mapping

The last point can be replaced, without loss of generality, by giving 
access to the calculated NTP timestamp corresponding to "the current frame".

We could also consider exposing the RTP timestamps directly and 
skipping NTP. This would be convenient for a single media stream and 
would loosen the dependency on RTCP sender reports - the downside is 
that it makes relating events across multiple media streams more complex.
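The multi-stream complication can be sketched concretely: each RTP stream has its own clock rate and its own random timestamp offset, so raw RTP values from two streams are not comparable, while converting each through its own sender-report mapping yields times on one shared axis. All names and numbers here are illustrative:

```typescript
// Per-stream mapping from this stream's last RTCP sender report.
interface StreamMapping {
  ntpSeconds: number;
  rtpTimestamp: number;
  clockRate: number; // 90000 for video, 48000 for Opus audio, etc.
}

// Convert a raw RTP timestamp to the shared (sender NTP) time axis.
function toCommonTime(rtp: number, m: StreamMapping): number {
  const diff = (rtp - m.rtpTimestamp) | 0; // wraparound-aware
  return m.ntpSeconds + diff / m.clockRate;
}

// Two frames captured at the same instant map to the same common time
// even though their raw RTP timestamps are unrelated (different rates,
// different random offsets).
const video: StreamMapping = { ntpSeconds: 1000, rtpTimestamp: 123456, clockRate: 90000 };
const audio: StreamMapping = { ntpSeconds: 1000, rtpTimestamp: 7890, clockRate: 48000 };
```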

Proposal: Add an attribute to MediaStreamTrack called "SenderClock". It 
is the NTP timestamp of the "current frame" being passed to the 
consumer of this track.

This attribute can be read directly, and is also returned by the 
GetStats function for a track; this allows JS to compute exactly the 
offset between the SenderClock and the system's clock, if desired.
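A hypothetical usage sketch of the proposed attribute: `senderClock` does not exist in any shipping WebRTC API, and the whole point of the proposal is that both values below are on the sender's clock, so local clock offset cancels out:

```typescript
// Given the sender's NTP time of the frame now being rendered
// (the proposed SenderClock attribute) and an NTP timestamp carried
// in a data-channel message, compute how long to wait before acting
// on the message. Both inputs are sender-clock values, so any offset
// between sender and receiver wallclocks cancels.
function delayUntilEvent(
  senderClockSeconds: number,
  eventNtpSeconds: number,
): number {
  // Events whose instant has already been rendered fire immediately.
  return Math.max(0, eventNtpSeconds - senderClockSeconds);
}
```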

For local MediaStreamTracks, the SenderClock is always now() minus some 
constant time (zero?). We model all the delays done at the sender side 
as being part of the PeerConnection, not part of the media source.

For remote MediaStreamTracks, the SenderClock is the calculated value of 
the NTP time corresponding to the RTP timestamp of the last frame or 
sample rendered. We model all the delays as being part of the 
PeerConnection, not as part of the sink.

Received on Saturday, 9 February 2013 22:14:11 UTC