Timing information: Thoughts

This note is inspired by, but not in any way dependent on, 
draft-mandyam-rtcweb-data-synch-00.txt. It's a long note; those who 
just want what I propose to actually do can skip to the end.


When building interesting applications with data flows, there frequently 
will occur the need to relate data items to times in the media timeline. 
This can be as simple as advancing a slide show when the speaker pushes 
a button, or as complex as putting an interaction overlay over a video 
and telling the user to "hit the ball in the picture" - you have to know 
where the ball is on the video in order to know whether it's a hit or not.

The task is made somewhat more complicated by the lack of common clocks 
across the pieces of the application, and the many sources of delay on 
the way.

A simplified picture:

The sender is able to easily refer to a common clock while measuring:
* Media source (camera / microphone) delay (constant)
* Encoding time (probably roughly constant)
* Outgoing network buffering (variable)
- measurement point: RTP timestamp when sending (reported in RTCP)

* Network delay (variable)

The receiver is able to easily refer to a common clock for:
* Receipt time (measured in metrics)
* Jitter buffer time (variable)
* Decoding time (roughly constant)
* Playout delay (roughly constant)

What the receiving application wants is to "know" that data item X 
refers to the point in time when video frame Y was grabbed and audio 
sample Z was recorded, so that when video frame Y is painted on the 
screen or audio sample Z enters the listener's ear, it can do the 
appropriate thing (whatever that is).

The RTP sender report (RFC 3550 section 6.4.1) relates the RTP clock to 
an NTP timestamp. The description says:

    NTP timestamp: 64 bits
       Indicates the wallclock time (see Section 4) when this report was
       sent so that it may be used in combination with timestamps
       returned in reception reports from other receivers to measure
       round-trip propagation to those receivers.  Receivers should
       expect that the measurement accuracy of the timestamp may be
       limited to far less than the resolution of the NTP timestamp. The
       measurement uncertainty of the timestamp is not indicated as it
       may not be known.  On a system that has no notion of wallclock
       time but does have some system-specific clock such as "system
       uptime", a sender MAY use that clock as a reference to calculate
       relative NTP timestamps.  It is important to choose a commonly
       used clock so that if separate implementations are used to produce
       the individual streams of a multimedia session, all
       implementations will use the same clock.  Until the year 2036,
       relative and absolute timestamps will differ in the high bit so
       (invalid) comparisons will show a large difference; by then one
       hopes relative timestamps will no longer be needed.  A sender that
       has no notion of wallclock or elapsed time MAY set the NTP
       timestamp to zero.

    RTP timestamp: 32 bits
       Corresponds to the same time as the NTP timestamp (above), but in
       the same units and with the same random offset as the RTP
       timestamps in data packets.  This correspondence may be used for
       intra- and inter-media synchronization for sources whose NTP
       timestamps are synchronized, and may be used by media-independent
       receivers to estimate the nominal RTP clock frequency.  Note that
       in most cases this timestamp will not be equal to the RTP
       timestamp in any adjacent data packet.  Rather, it MUST be
       calculated from the corresponding NTP timestamp using the
       relationship between the RTP timestamp counter and real time as
       maintained by periodically checking the wallclock time at a
       sampling instant.

(It is tempting to infer that this means the RTP timestamp refers 
to the capture time of the media stream - this needs verification.)

Thus, if we know:

- The NTP-to-RTP mapping at the remote end
- The RTP timestamp of the media stream at the moment it is played out

then the sender can transmit its NTP timestamp as part of a data 
packet, and the recipient can calculate the time at which the "same" 
instant is played out in the media flow.

[NOTE: We do NOT know that the NTP timestamp from the remote side 
corresponds to now(). Clocks are often out of sync, and (in really bad 
cases) can have noticeable clock drift.]
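The receiver-side calculation described above can be sketched as the inverse mapping: given the (NTP, RTP) pair from the last sender report and the RTP timestamp of the frame currently being rendered, recover the sender's wallclock time for that frame. Names and parameters are illustrative assumptions:

```typescript
// One (NTP, RTP) reference pair from the last RTCP sender report.
interface SenderReportMapping {
  ntpSeconds: number;
  rtpTimestamp: number;
}

// Map a playout RTP timestamp back to the sender's wallclock.
// NOTE: the result is on the SENDER's clock. Per the caveat above,
// it must only be compared against other sender-clock values (such as
// an NTP timestamp carried in a data packet), never against local now(),
// because the two clocks may be offset and drifting.
function rtpToSenderNtp(
  rtpTimestamp: number,
  ref: SenderReportMapping,
  clockRate: number,
): number {
  // | 0 interprets the 32-bit gap as signed, so timestamps shortly
  // before the reference (or across a wraparound) also map correctly.
  const diff = (rtpTimestamp - ref.rtpTimestamp) | 0;
  return ref.ntpSeconds + diff / clockRate;
}
```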

This requires that:

- The sender has access to the NTP time corresponding to the RTP 
timestamp being put on "the current frame" at recording
- The recipient has access to the RTP timestamp of "the current frame" 
being played out
- The recipient has access to the NTP-to-RTP mapping

The last point can be replaced, without loss of generality, by giving 
access to the calculated NTP timestamp corresponding to "the current frame".

We could also consider exposing the RTP timestamps directly and 
skipping NTP. This would be convenient for a single media stream and 
would loosen the dependency on RTCP sender reports - the downside is 
that it makes relating events across multiple media streams more complex.
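The multi-stream complication can be sketched concretely: each RTP stream has its own clock rate and its own random timestamp offset, so raw RTP values from two streams are not comparable, while converting each through its own sender-report mapping yields times on one shared axis. All names and numbers here are illustrative:

```typescript
// Per-stream mapping from this stream's last RTCP sender report.
interface StreamMapping {
  ntpSeconds: number;
  rtpTimestamp: number;
  clockRate: number; // 90000 for video, 48000 for Opus audio, etc.
}

// Convert a raw RTP timestamp to the shared (sender NTP) time axis.
function toCommonTime(rtp: number, m: StreamMapping): number {
  const diff = (rtp - m.rtpTimestamp) | 0; // wraparound-aware
  return m.ntpSeconds + diff / m.clockRate;
}

// Two frames captured at the same instant map to the same common time
// even though their raw RTP timestamps are unrelated (different rates,
// different random offsets).
const video: StreamMapping = { ntpSeconds: 1000, rtpTimestamp: 123456, clockRate: 90000 };
const audio: StreamMapping = { ntpSeconds: 1000, rtpTimestamp: 7890, clockRate: 48000 };
```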

Proposal: Add an attribute to MediaStreamTrack called "SenderClock". It 
is the NTP timestamp of the "current frame" being passed to the 
consumer of this track.

This attribute can be read directly, and is also returned by the 
GetStats function for a track; this allows JS to compute exactly the 
offset between the SenderClock and the system's clock, if desired.
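A hypothetical usage sketch of the proposed attribute: `senderClock` does not exist in any shipping WebRTC API, and the whole point of the proposal is that both values below are on the sender's clock, so local clock offset cancels out:

```typescript
// Given the sender's NTP time of the frame now being rendered
// (the proposed SenderClock attribute) and an NTP timestamp carried
// in a data-channel message, compute how long to wait before acting
// on the message. Both inputs are sender-clock values, so any offset
// between sender and receiver wallclocks cancels.
function delayUntilEvent(
  senderClockSeconds: number,
  eventNtpSeconds: number,
): number {
  // Events whose instant has already been rendered fire immediately.
  return Math.max(0, eventNtpSeconds - senderClockSeconds);
}
```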

For local MediaStreamTracks, the SenderClock is always now() minus some 
constant time (zero?). We model all the delays done at the sender side 
as being part of the PeerConnection, not part of the media source.

For remote MediaStreamTracks, the SenderClock is the calculated value of 
the NTP time corresponding to the RTP timestamp of the last frame or 
sample rendered. We model all the delays as being part of the 
PeerConnection, not as part of the sink.

Received on Saturday, 9 February 2013 22:14:11 UTC