- From: Harald Alvestrand <harald@alvestrand.no>
- Date: Tue, 19 Feb 2013 20:28:18 +0100
- To: public-webrtc@w3.org
On 02/19/2013 02:35 PM, Stefan Håkansson LK wrote:
> I find this topic interesting, but I would like to get a bit more
> understanding about your thinking.
>
> If we leave the slide change case aside (the needed precision would
> probably be met by sending some "change slide" info on a data
> channel), and look at the "hit the ball in the picture" one, I wonder
> how the location of the ball is found in the first place?
>
> * If the sender is also a WebRTC browser, I guess that you'd do some
>   processing using a canvas to detect the ball; and you could do that
>   at the receiver end as well.
>
> * If the ball is actually overlaid, I guess you could overlay at the
>   receiving end (and would know where the ball is). On the other hand,
>   if you want the overlays to be in sync with the video, you'd have to
>   have some timing info.

I was thinking of (for example) the case where the ball is synthesized
into a video frame at the sender, or a real ball being tracked by
position-sensing hardware at the sender's site - the sender would know
where the ball is, and the receiver would not (perhaps deliberately, to
make cheating harder).

> * If the sender is some special device that can analyze video in real
>   time to derive timing and coordinates for the ball, while at the
>   same time streaming the video using rtcweb, then I think your
>   proposal could add benefit.
>
> I also think that defining a playout time for "current frame" for a
> track is kind of weird. A track is (a JS representation of) a sequence
> of frames, and "current frame" seems to be undefined. And, aren't
> attributes supposed to be unchanged until a "stable state" is reached?
> This would mean that several reads of the attribute would give the
> same time info (even if several frames were played).

The same issue exists for HTMLMediaElement's currentTime attribute. I
was aiming for something that can give you a match to currentTime
without anything more complex than an addition or subtraction from a
known value.

> The media element already has some timing info attributes. They, on
> the other hand, talk about seconds (which seems to be far too low a
> resolution for this use case), but perhaps that is the right place to
> do this kind of thing.

currentTime is defined as a double, which gives you 53 bits of
floating-point precision. That's enough for pretty fine-grained
control, even over long videos: a year into a stream, the resolution of
a double timestamp in seconds is still a few nanoseconds.

> Stefan
>
> On 2013-02-09 23:13, Harald Alvestrand wrote:
>> This note is inspired by, but not in any way dependent on,
>> draft-mandyam-rtcweb-data-synch-00.txt.... it's a long note; those
>> who just want what I propose to actually do can skip to the end.
>>
>> -------------------------------------------------
>>
>> When building interesting applications with data flows, the need
>> frequently arises to relate data items to times in the media
>> timeline. This can be as simple as advancing a slide show when the
>> speaker pushes a button, or as complex as putting an interaction
>> overlay over a video and telling the user to "hit the ball in the
>> picture" - you have to know where the ball is on the video in order
>> to know whether it's a hit or not.
>>
>> The task is made somewhat more complicated by the lack of common
>> clocks across the pieces of the application, and the many sources of
>> delay on the way.
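An aside on the "lack of common clocks" point: an application can
already estimate the peer's clock offset itself, NTP-style, over a data
channel. A minimal sketch in TypeScript - the ping/pong message shape
is invented for illustration, and it assumes a symmetric path and an
instant reply from the peer:

    // NTP-style one-shot clock offset estimate over an RTCDataChannel.
    // Assumes the peer replies immediately with {t1: <its clock in ms>};
    // that reply protocol is made up for this sketch.
    function estimateClockOffset(channel: RTCDataChannel): Promise<number> {
      return new Promise((resolve) => {
        const t0 = performance.now();          // local clock at send
        channel.onmessage = (ev: MessageEvent) => {
          const t2 = performance.now();        // local clock at receipt
          const { t1 } = JSON.parse(ev.data);  // peer clock at its reply
          // With a symmetric path, the peer read its clock halfway
          // through the round trip, so offset = peer clock - local clock.
          resolve(t1 - (t0 + t2) / 2);
        };
        channel.send(JSON.stringify({ ping: true }));
      });
    }

Averaging several such probes and keeping only the ones with the
shortest round trip tightens the estimate; drift still has to be
re-measured periodically.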
>> A simplified picture:
>>
>> Sender is able to easily refer to a common clock while measuring:
>> * Media source (camera / microphone) delay (constant)
>> * Encoding time (probably roughly constant)
>> * Outgoing network buffering (variable)
>>   - measurement point: RTP timestamp when sending (reported in RTCP)
>> * Network delay (variable)
>>
>> Receiver is able to easily refer to a common clock when measuring:
>> * Receipt time (measured in metrics)
>> * Jitter buffer time (variable)
>> * Decoding time (roughly constant)
>> * Playout delay (roughly constant)
>>
>> What the receiving application wants is to "know" that data item X
>> refers to the point in time when video frame Y was grabbed and audio
>> sample Z was recorded, so that when video frame Y is painted on the
>> screen or audio sample Z enters the listener's ear, it can do the
>> appropriate thing (whatever that is).
>>
>> The RTP sender report (RFC 3550 section 6.4.1) relates the RTP clock
>> to an NTP timestamp. The description says:
>>
>>    NTP timestamp: 64 bits
>>       Indicates the wallclock time (see Section 4) when this report
>>       was sent so that it may be used in combination with timestamps
>>       returned in reception reports from other receivers to measure
>>       round-trip propagation to those receivers. Receivers should
>>       expect that the measurement accuracy of the timestamp may be
>>       limited to far less than the resolution of the NTP timestamp.
>>       The measurement uncertainty of the timestamp is not indicated
>>       as it may not be known. On a system that has no notion of
>>       wallclock time but does have some system-specific clock such as
>>       "system uptime", a sender MAY use that clock as a reference to
>>       calculate relative NTP timestamps. It is important to choose a
>>       commonly used clock so that if separate implementations are
>>       used to produce the individual streams of a multimedia session,
>>       all implementations will use the same clock. Until the year
>>       2036, relative and absolute timestamps will differ in the high
>>       bit so (invalid) comparisons will show a large difference; by
>>       then one hopes relative timestamps will no longer be needed. A
>>       sender that has no notion of wallclock or elapsed time MAY set
>>       the NTP timestamp to zero.
>>
>>    RTP timestamp: 32 bits
>>       Corresponds to the same time as the NTP timestamp (above), but
>>       in the same units and with the same random offset as the RTP
>>       timestamps in data packets. This correspondence may be used for
>>       intra- and inter-media synchronization for sources whose NTP
>>       timestamps are synchronized, and may be used by
>>       media-independent receivers to estimate the nominal RTP clock
>>       frequency. Note that in most cases this timestamp will not be
>>       equal to the RTP timestamp in any adjacent data packet. Rather,
>>       it MUST be calculated from the corresponding NTP timestamp
>>       using the relationship between the RTP timestamp counter and
>>       real time as maintained by periodically checking the wallclock
>>       time at a sampling instant.
>>
>> (It is tempting to infer that this means that the RTP timestamp
>> refers to the capture time for the media stream - this needs
>> verification.)
>>
>> Thus, if we know:
>>
>> - the NTP-to-RTP mapping at the remote end, and
>> - the RTP timestamp of the media stream at the moment it is played
>>   out,
>>
>> then the sender can transmit its NTP timestamp as part of a data
>> packet, and the recipient can calculate the time at which the "same"
>> instant is played out in the media flow (see the sketch below).
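To make that calculation concrete, here is a sketch of the arithmetic.
The mapping structure and all its field names are invented for
illustration, and it assumes the rendered frame's RTP timestamp is not
earlier than the one in the sender report:

    // Sketch: recover the sender-side capture time of the frame
    // currently being rendered, from the NTP-to-RTP mapping carried in
    // the last RTCP sender report. All names invented for illustration.
    interface SenderReportMapping {
      ntpAtSr: number;   // sender wallclock at the SR, in seconds
      rtpAtSr: number;   // RTP timestamp carried in the same SR
      clockRate: number; // RTP clock rate, e.g. 90000 for video
    }

    function senderTimeOf(rtpNow: number, m: SenderReportMapping): number {
      // RTP timestamps are 32-bit and wrap; ">>> 0" makes the difference
      // unsigned, which handles a wrap between the SR and rtpNow.
      const elapsedTicks = (rtpNow - m.rtpAtSr) >>> 0;
      return m.ntpAtSr + elapsedTicks / m.clockRate;
    }

A data item then just carries the sender's NTP timestamp T; when
senderTimeOf(currentFrameRtp) reaches T, the frame the item refers to
is the one on screen.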
>> [NOTE: We do NOT know that the NTP timestamp from the remote side
>> corresponds to now(). Clocks are often out of sync, and (in really
>> bad cases) can have noticeable clock drift.]
>>
>> This requires that:
>>
>> - the sender has access to the NTP time corresponding to the RTP
>>   timestamp being put on "the current frame" at recording,
>> - the recipient has access to the RTP timestamp of "the current
>>   frame" being played out, and
>> - the recipient has access to the NTP-to-RTP mapping.
>>
>> The last point can be replaced, with no loss of generality, by giving
>> access to the calculated NTP timestamp corresponding to "the current
>> frame".
>>
>> We could also think of giving access to the RTP timestamps directly,
>> and skipping NTP. This would be convenient for a single media stream,
>> and would loosen the dependency on RTCP sender reports - the downside
>> is that it makes it more complex to relate events on multiple media
>> streams.
>>
>> Suggestion
>> ==========
>> Add an attribute to MediaStreamTrack called "SenderClock". It is the
>> NTP timestamp of the "current frame" being passed to the consumer of
>> this track.
>>
>> This attribute can be read directly, and is also returned by the
>> GetStats function for a track; this allows JS to compute exactly the
>> offset between the SenderClock and the system's clock, if desired.
>>
>> For local MediaStreamTracks, the SenderClock is always now() minus
>> some constant time (zero?). We model all the delays incurred at the
>> sender side as being part of the PeerConnection, not part of the
>> media source.
>>
>> For remote MediaStreamTracks, the SenderClock is the calculated value
>> of the NTP time corresponding to the RTP timestamp of the last frame
>> or sample rendered. We model all the delays as being part of the
>> PeerConnection, not as part of the sink.
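To show how I imagine an application using this: a sketch follows.
SenderClock is of course not implemented anywhere, so the "senderClock"
property and the helper name are hypothetical, and it assumes the
attribute reports sender NTP time in seconds:

    // Hypothetical use of the proposed SenderClock attribute: run an
    // action at the moment the frame a data item refers to is rendered.
    // "senderClock" does not exist on MediaStreamTrack today; it stands
    // in for the proposal (sender NTP time, in seconds, of the frame
    // currently being consumed).
    type TrackWithSenderClock = MediaStreamTrack & { senderClock: number };

    function scheduleAtPlayout(track: TrackWithSenderClock,
                               itemSenderTime: number, // NTP seconds in item
                               action: () => void): void {
      // One simultaneous read of the attribute and the local clock gives
      // the offset between our clock and the sender's playout timeline.
      const offsetSec = performance.now() / 1000 - track.senderClock;
      // After that, mapping any sender timestamp to local time is just
      // the addition the note asks for.
      const firesInMs =
        (itemSenderTime + offsetSec) * 1000 - performance.now();
      setTimeout(action, Math.max(0, firesInMs));
    }

Returning the same value from GetStats would let the attribute read and
the local clock read come from one snapshot, so the offset is measured
exactly rather than racily.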
Received on Tuesday, 19 February 2013 19:28:49 UTC