Re: Timing information: Thoughts

I find this topic interesting, but I would like to understand your 
thinking a bit better.

If we leave the slide-change case aside (the needed precision could 
probably be met by sending some "change slide" info on a data channel) 
and look at the "hit the ball in the picture" one, I wonder how the 
location of the ball is found in the first place.

* If the sender is also a WebRTC browser, I guess you'd do some 
processing using a canvas to detect the ball, and you could do that at 
the receiving end as well.

* If the ball is actually overlaid, I guess you could overlay at the 
receiving end (and would know where the ball is). On the other hand, if 
you want the overlays to be in sync with the video, you'd have to have 
some timing info.

* If the sender is some special device that can analyze video in 
real-time to derive timing and coordinates for the ball, while at the 
same time streaming the video using rtcweb, then I think your proposal 
could add benefit.
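For the canvas approach in the first bullet, a rough sketch of what the 
per-frame processing could look like (the red-blob heuristic and the 
findBall name are illustrative assumptions, not part of any proposal):

```javascript
// Sketch: find the centroid of a saturated red "ball" in one video frame.
// `frame` is an ImageData-like object ({ data, width, height }), e.g. the
// result of ctx.drawImage(video, 0, 0) followed by
// ctx.getImageData(0, 0, width, height) on a canvas.
function findBall(frame) {
  const { data, width, height } = frame;
  let sumX = 0, sumY = 0, count = 0;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const i = (y * width + x) * 4; // RGBA, 4 bytes per pixel
      const r = data[i], g = data[i + 1], b = data[i + 2];
      // Illustrative threshold: "red enough" pixels belong to the ball.
      if (r > 200 && g < 80 && b < 80) {
        sumX += x;
        sumY += y;
        count++;
      }
    }
  }
  if (count === 0) return null; // no ball visible in this frame
  return { x: Math.round(sumX / count), y: Math.round(sumY / count) };
}
```

A hit test is then just a distance check between the user's click 
coordinates and the returned centroid.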

I also think that defining a playout time for "current frame" for a 
track is kind of weird. A track is (a JS representation of) a sequence 
of frames, and "current frame" seems to be undefined. And, aren't 
attributes supposed to be unchanged until a "stable state" is reached? 
This would mean that several reads of the attribute would give the same 
time info (even if several frames were played).

The media element already has some timing attributes. Those, on the 
other hand, are expressed in seconds (which seems to be far too low a 
resolution for this use case), but perhaps that is the right place to do 
this kind of thing.
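To make the resolution point concrete, a back-of-the-envelope sketch 
(plain arithmetic, no real API involved): if the timing attribute were 
only accurate to whole seconds, the uncertainty measured in video frames 
would be substantial.

```javascript
// How many frames fit inside a given timing uncertainty?
// resolutionSec: granularity of the timestamp (e.g. 1 for whole seconds)
// fps: video frame rate
function framesOfUncertainty(resolutionSec, fps) {
  return Math.round(resolutionSec * fps);
}

framesOfUncertainty(1, 30);     // whole seconds: 30 frames of slop
framesOfUncertainty(0.001, 30); // millisecond resolution: 0 frames
```

For "hit the ball" interactions, anything coarser than roughly one frame 
time (about 33 ms at 30 fps) seems unlikely to be good enough.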


On 2013-02-09 23:13, Harald Alvestrand wrote:
> This note inspired by, but not in any way dependent on,
> draft-mandyam-rtcweb-data-synch-00.txt.... it's a long note; those who
> just want what I propose to actually do can skip to the end.
> -------------------------------------------------
> When building interesting applications with data flows, there is
> frequently a need to relate data items to times in the media timeline.
> This can be as simple as advancing a slide show when the speaker pushes
> a button, or as complex as putting an interaction overlay over a video
> and telling the user to "hit the ball in the picture" - you have to know
> where the ball is on the video in order to know whether it's a hit or not.
> The task is made somewhat more complicated by the lack of common clocks
> across the pieces of the application, and the many sources of delay on
> the way.
> A simplified picture:
> Sender is able to easily refer to a common clock while measuring:
> * Media source (camera / microphone) delay (constant)
> * Encoding time (probably roughly constant)
> * Outgoing network buffering (variable)
> - measurement point: RTP timestamp when sending (reported in RTCP)
> * Network delay (variable)
> Receiver is able to easily refer to a common clock on
> * Receipt time (measured in metrics)
> * Jitter buffer time (variable)
> * Decoding time (roughly constant)
> * Playout delay (roughly constant)
> What the receiving application wants is to "know" that data item X
> refers to the point in time when video frame Y was grabbed and audio
> sample Z was recorded, so that when video frame Y is painted on the
> screen or audio sample Z enters the listener's ear, it can do the
> appropriate thing (whatever that is).
> The RTP sender report (RFC 3550 section 6.4.1) relates the RTP clock to
> an NTP timestamp. The description says:
>     NTP timestamp: 64 bits
>        Indicates the wallclock time (see Section 4) when this report was
>        sent so that it may be used in combination with timestamps
>        returned in reception reports from other receivers to measure
>        round-trip propagation to those receivers.  Receivers should
>        expect that the measurement accuracy of the timestamp may be
>        limited to far less than the resolution of the NTP timestamp. The
>        measurement uncertainty of the timestamp is not indicated as it
>        may not be known.  On a system that has no notion of wallclock
>        time but does have some system-specific clock such as "system
>        uptime", a sender MAY use that clock as a reference to calculate
>        relative NTP timestamps.  It is important to choose a commonly
>        used clock so that if separate implementations are used to produce
>        the individual streams of a multimedia session, all
>        implementations will use the same clock.  Until the year 2036,
>        relative and absolute timestamps will differ in the high bit so
>        (invalid) comparisons will show a large difference; by then one
>        hopes relative timestamps will no longer be needed.  A sender that
>        has no notion of wallclock or elapsed time MAY set the NTP
>        timestamp to zero.
>     RTP timestamp: 32 bits
>        Corresponds to the same time as the NTP timestamp (above), but in
>        the same units and with the same random offset as the RTP
>        timestamps in data packets.  This correspondence may be used for
>        intra- and inter-media synchronization for sources whose NTP
>        timestamps are synchronized, and may be used by media-independent
>        receivers to estimate the nominal RTP clock frequency.  Note that
>        in most cases this timestamp will not be equal to the RTP
>        timestamp in any adjacent data packet.  Rather, it MUST be
>        calculated from the corresponding NTP timestamp using the
>        relationship between the RTP timestamp counter and real time as
>        maintained by periodically checking the wallclock time at a
>        sampling instant.
> (It is tempting to infer that this means that the RTP timestamp refers
> to the capture time for the media stream - this needs verification.)
> Thus, if we know:
> - The NTP-to-RTP mapping at the remote end
> - The RTP timestamp of the media stream at the moment it is played out
> it follows that the sender can transmit its NTP timestamp as part of a
> data packet, and the recipient can then calculate the time at which the
> "same" instant is played out in the media flow.
> [NOTE: We do NOT know that the NTP timestamp from the remote side
> corresponds to now(). Clocks are often out of sync, and (in really bad
> cases) can have noticeable clock drift.]
> This requires that:
> - The sender has access to the NTP time corresponding to the RTP
> timestamp being put on "the current frame" at recording
> - The recipient has access to the RTP timestamp of "the current frame"
> being played out
> - The recipient has access to the NTP-to-RTP mapping
> The last point can be replaced, with no loss of generality, with giving
> access to the calculated NTP timestamp corresponding to "the current
> frame".
> We could also think of giving access to the RTP timestamps directly, and
> skipping NTP. This would be convenient for a single media stream, and
> loosen the dependency on RTCP sender reports - the downside is that it
> makes it more complex to relate events on multiple media streams.
> Suggestion
> =======
> Add an attribute to MediaStreamTrack called "SenderClock". It is the NTP
> timestamp of the "current frame" being passed to the consumer of this
> track.
> This attribute can be read directly, and is also returned in the
> GetStats function for a track; this allows JS to compute exactly the
> offset between the SenderClock and the system's clock, if desirable.
> For local MediaStreamTracks, the SenderClock is always now() minus some
> constant time (zero?). We model all the delays done at the sender side
> as being part of the PeerConnection, not part of the media source.
> For remote MediaStreamTracks, the SenderClock is the calculated value of
> the NTP time corresponding to the RTP timestamp of the last frame or
> sample rendered. We model all the delays as being part of the
> PeerConnection, not as part of the sink.
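The playout-time mapping the note relies on can be sketched as follows 
(the function and field names are illustrative assumptions; clockRate 
would come from the negotiated payload type, e.g. 90000 for video):

```javascript
// Map an RTP timestamp to sender-side NTP time, using the (NTP, RTP)
// correspondence carried in the most recent RTCP sender report
// (RFC 3550 section 6.4.1).
// senderReport: { ntpSec: NTP time in seconds, rtpTs: RTP timestamp }
// clockRate:    RTP clock rate in Hz
// rtpTs:        RTP timestamp of the frame being rendered
function rtpToSenderNtp(senderReport, clockRate, rtpTs) {
  // RTP timestamps are unsigned 32-bit and wrap; `| 0` reinterprets the
  // difference as signed 32-bit, which handles wrap-around as long as the
  // two timestamps are reasonably close together.
  const delta = (rtpTs - senderReport.rtpTs) | 0;
  return senderReport.ntpSec + delta / clockRate;
}
```

With this mapping, a sender NTP timestamp attached to a data-channel 
message can be compared against the NTP time computed for the frame 
currently being rendered (the proposed SenderClock), modulo the 
clock-sync caveat above.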

Received on Tuesday, 19 February 2013 13:35:31 UTC