Re: Timing information: Thoughts from Harald Alvestrand on 2013-02-19 (public-webrtc@w3.org from February 2013)

From: Harald Alvestrand <harald@alvestrand.no>
Date: Tue, 19 Feb 2013 20:28:18 +0100
To: public-webrtc@w3.org
Message-ID: <5123D252.1000109@alvestrand.no>
On 02/19/2013 02:35 PM, Stefan Håkansson LK wrote:
> I find this topic interesting, but I would like to get a bit more 
> understanding about your thinking.
>
> If we leave the slide change case aside (the needed precision would 
> probably be met by sending some "change slide" info on a data 
> channel), and look at the "hit the ball in the picture" one, I wonder 
> how the location of the ball is found in the first place?
>
> * If the sender is also a WebRTC browser, I guess that you'd do some 
> processing using a canvas to detect the ball; and you could do that at 
> the receiver end as well.
>
> * If the ball is actually overlaid, I guess you could overlay at the 
> receiving end (and would know where the ball is). On the other hand, 
> if you want the overlays to be in sync with the video, you'd have to 
> have some timing info.
I was thinking of (for example) the case where the ball is synthesized 
into a video frame at the sender, or a real ball being tracked by 
position-sensing hardware at the sender's site - the sender would know 
where the ball is, the receiver would not know (perhaps this is also 
done to avoid too easy cheating).

>
> * If the sender is some special device that can analyze video in 
> real-time to derive timing and coordinates for the ball, while at the 
> same time streaming the video using rtcweb, then I think your proposal 
> could add benefit.
>
> I also think that defining a playout time for "current frame" for a 
> track is kind of weird. A track is (a JS representation of) a sequence 
> of frames, and "current frame" seems to be undefined. And, aren't 
> attributes supposed to be unchanged until a "stable state" is reached? 
> This would mean that several reads of the attribute would give the 
> same time info (even if several frames were played).

The model I had in mind is HTMLMediaElement's currentTime attribute. I was
aiming for something that can give you a match to currentTime without
anything more complex than an addition or subtraction from a known value.
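
A minimal sketch of that arithmetic, in Python for brevity. The function
names and the sample readings are made up for illustration, and SenderClock
is only the attribute proposed in the quoted note, not an existing API. The
idea is to read SenderClock and currentTime once at the same moment, keep
the difference, and then map any sender-side timestamp onto the media
timeline with a single addition.

```python
def clock_offset(current_time_s: float, sender_clock_s: float) -> float:
    """Offset between the element's timeline and the sender's clock,
    taken from one simultaneous reading of both values."""
    return current_time_s - sender_clock_s


def to_media_time(event_sender_time_s: float, offset_s: float) -> float:
    """Map a sender-side timestamp onto the currentTime timeline."""
    return event_sender_time_s + offset_s


# Hypothetical readings: currentTime is 12.5 s when SenderClock reads 1000.0 s.
offset = clock_offset(12.5, 1000.0)

# A data item stamped 1002.25 s on the sender's clock belongs 2.25 s later
# on the media timeline than the reading above.
media_time = to_media_time(1002.25, offset)
print(media_time)
```

This assumes the offset stays constant for the lifetime of the element; if
the timeline is reset or seeked, the offset would have to be sampled again.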

>
> The media element already has some timing info attributes. They on the 
> other hand talk about seconds (which seems to be far too low resolution 
> for this use-case), but perhaps that is the right place to do this 
> kind of thing.

currentTime is defined as a double, which gives you ~53 bits of 
floating-point precision.
That's enough to give you pretty fine-grained control, even over long 
videos.
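
To put a rough number on that, the sketch below prints the gap between
adjacent double values at two playback positions. It assumes Python 3.9 or
later for math.ulp; the one-day and one-year durations are arbitrary
examples, not anything taken from a spec.

```python
import math

one_day_s = 24 * 60 * 60.0               # 86400 seconds of playback
step_at_one_day = math.ulp(one_day_s)    # gap to the next representable double

one_year_s = 365 * one_day_s
step_at_one_year = math.ulp(one_year_s)

print(f"resolution after one day:  {step_at_one_day:.3e} s")
print(f"resolution after one year: {step_at_one_year:.3e} s")
```

Both steps are well under a microsecond, which is finer than one tick of a
90 kHz RTP video clock (about 11 microseconds).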

>
> Stefan
>
>
>
> On 2013-02-09 23:13, Harald Alvestrand wrote:
>> This note inspired by, but not in any way dependent on,
>> draft-mandyam-rtcweb-data-synch-00.txt.... it's a long note, those who
>> just want what I propose to actually do can skip to the end.
>>
>> -------------------------------------------------
>>
>> When building interesting applications with data flows, there frequently
>> will occur the need to relate data items to times in the media timeline.
>> This can be as simple as advancing a slide show when the speaker pushes
>> a button, or as complex as putting an interaction overlay over a video
>> and telling the user to "hit the ball in the picture" - you have to know
>> where the ball is on the video in order to know whether it's a hit or not.
>>
>> The task is made somewhat more complicated by the lack of common clocks
>> across the pieces of the application, and the many sources of delay on
>> the way.
>>
>> A simplified picture:
>>
>> Sender is able to easily refer to a common clock while measuring:
>> * Media source (camera / microphone) delay (constant)
>> * Encoding time (probably roughly constant)
>> * Outgoing network buffering (variable)
>> - measurement point: RTP timestamp when sending (reported in RTCP)
>>
>> * Network delay (variable)
>>
>> Receiver is able to easily refer to a common clock on
>> * Receipt time (measured in metrics)
>> * Jitter buffer time (variable)
>> * Decoding time (roughly constant)
>> * Playout delay (roughly constant)
>>
>> What the receiving application wants is to "know" that data item X
>> refers to the point in time when video frame Y was grabbed and audio
>> sample Z was recorded, so that when video frame Y is painted on the
>> screen or audio sample Z enters the listener's ear, it can do the
>> appropriate thing (whatever that is).
>>
>> The RTP sender report (RFC 3550 section 6.4.1) relates the RTP clock to
>> an NTP timestamp. The description says:
>>
>>     NTP timestamp: 64 bits
>>        Indicates the wallclock time (see Section 4) when this report was
>>        sent so that it may be used in combination with timestamps
>>        returned in reception reports from other receivers to measure
>>        round-trip propagation to those receivers.  Receivers should
>>        expect that the measurement accuracy of the timestamp may be
>>        limited to far less than the resolution of the NTP timestamp. The
>>        measurement uncertainty of the timestamp is not indicated as it
>>        may not be known.  On a system that has no notion of wallclock
>>        time but does have some system-specific clock such as "system
>>        uptime", a sender MAY use that clock as a reference to calculate
>>        relative NTP timestamps.  It is important to choose a commonly
>>        used clock so that if separate implementations are used to produce
>>        the individual streams of a multimedia session, all
>>        implementations will use the same clock.  Until the year 2036,
>>        relative and absolute timestamps will differ in the high bit so
>>        (invalid) comparisons will show a large difference; by then one
>>        hopes relative timestamps will no longer be needed.  A sender that
>>        has no notion of wallclock or elapsed time MAY set the NTP
>>        timestamp to zero.
>>
>>     RTP timestamp: 32 bits
>>        Corresponds to the same time as the NTP timestamp (above), but in
>>        the same units and with the same random offset as the RTP
>>        timestamps in data packets.  This correspondence may be used for
>>        intra- and inter-media synchronization for sources whose NTP
>>        timestamps are synchronized, and may be used by media-independent
>>        receivers to estimate the nominal RTP clock frequency. Note that
>>        in most cases this timestamp will not be equal to the RTP
>>        timestamp in any adjacent data packet.  Rather, it MUST be
>>        calculated from the corresponding NTP timestamp using the
>>        relationship between the RTP timestamp counter and real time as
>>        maintained by periodically checking the wallclock time at a
>>        sampling instant.
>>
>> (It is tempting to infer that this means that the RTP timestamp refers
>> to the capture time for the media stream - this needs verification.)
>>
>> Thus, if we know:
>>
>> - The NTP-to-RTP mapping at the remote end
>> - The RTP timestamp of the media stream at the moment it is played out
>>
>> it follows that the sender can transmit its NTP timestamp as part of a
>> data packet, and the recipient can then calculate the time at which the
>> "same" instant is played out in the media flow.
>>
>> [NOTE: We do NOT know that the NTP timestamp from the remote side
>> corresponds to now(). Clocks are often out of sync, and (in really bad
>> cases) can have noticeable clock drift.]
>>
>> This requires that:
>>
>> - The sender has access to the NTP time corresponding to the RTP
>> timestamp being put on "the current frame" at recording
>> - The recipient has access to the RTP timestamp of "the current frame"
>> being played out
>> - The recipient has access to the NTP-to-RTP mapping
>>
>> The last point can be replaced, with no loss of generality, with giving
>> access to the calculated NTP timestamp corresponding to "the current
>> frame".
>>
>> We could also think of giving access to the RTP timestamps directly, and
>> skipping NTP. This would be convenient for a single media stream, and
>> loosen the dependency on RTCP sender reports - the downside is that it
>> makes it more complex to relate events on multiple media streams.
>>
>> Suggestion
>> =======
>> Add an attribute to MediaStreamTrack called "SenderClock". It is the NTP
>> timestamp of the "current frame" being passed to the consumer of this
>> track.
>>
>> This attribute can be read directly, and is also returned in the
>> GetStats function for a track; this allows JS to compute exactly the
>> offset between the SenderClock and the system's clock, if desirable.
>>
>> For local MediaStreamTracks, the SenderClock is always now() minus some
>> constant time (zero?). We model all the delays done at the sender side
>> as being part of the PeerConnection, not part of the media source.
>>
>> For remote MediaStreamTracks, the SenderClock is the calculated value of
>> the NTP time corresponding to the RTP timestamp of the last frame or
>> sample rendered. We model all the delays as being part of the
>> PeerConnection, not as part of the sink.
>>
>
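
The NTP-to-RTP mapping described in the quoted note can be sketched as
follows. This is only an illustration under stated assumptions: the function
names are hypothetical, the clock rate is assumed to be known from the
payload type (90 kHz is used here as a typical video rate), and clock drift
and NTP era rollover are ignored. It takes the (NTP, RTP) pair from the most
recent sender report and computes the sender-clock time for the frame
currently being played out, which is what the proposed SenderClock would
expose.

```python
NTP_FRACTION = 2 ** 32   # low 32 bits of an NTP timestamp are a binary fraction
RTP_MODULUS = 2 ** 32    # RTP timestamps are 32 bits wide and wrap around


def ntp64_to_seconds(ntp64: int) -> float:
    """Convert a 64-bit NTP timestamp (32.32 fixed point) to seconds."""
    return (ntp64 >> 32) + (ntp64 & 0xFFFFFFFF) / NTP_FRACTION


def rtp_to_sender_seconds(rtp_ts: int, sr_rtp_ts: int,
                          sr_ntp64: int, clock_rate_hz: int) -> float:
    """Sender-clock time in seconds for a frame with RTP timestamp rtp_ts,
    using the (NTP, RTP) pair from the most recent sender report."""
    # Interpret the difference as a signed 32-bit value so that frames
    # shortly before or after the report, including across a wraparound,
    # are handled the same way.
    delta = (rtp_ts - sr_rtp_ts) % RTP_MODULUS
    if delta >= RTP_MODULUS // 2:
        delta -= RTP_MODULUS
    return ntp64_to_seconds(sr_ntp64) + delta / clock_rate_hz


# Made-up sender report: NTP time 1000.5 s corresponds to RTP 4294960000.
sr_rtp = 4294960000
sr_ntp64 = (1000 << 32) | (1 << 31)

# A frame 90000 ticks later (one second at 90 kHz), past the 32-bit wrap.
frame_rtp = (sr_rtp + 90000) % RTP_MODULUS
sender_clock = rtp_to_sender_seconds(frame_rtp, sr_rtp, sr_ntp64, 90000)
print(sender_clock)
```

A data item stamped with the sender's NTP time can then be compared against
this value to decide when the "same" instant is being played out.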
Received on Tuesday, 19 February 2013 19:28:49 UTC