>> * capture time of frame 
> are you thinking SMPTE timecodes (absolute capture time, possibly far in the past for recorded media) or RTP-clock-style "timestamp that can be used for relative positioning in time"?

I have worked with SMPTE timecode quite a bit while moving broadcast studios to IP. It seems like a bad match for interactive media. Drop-frame handling alone should be enough to convince anyone. I realize broadcast TV needs all that, but we don't.
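To give a feel for the drop-frame problem, here is a rough sketch of turning a plain frame count into a 29.97 fps drop-frame timecode. The function name `frames_to_df_timecode` is made up for illustration, and the sketch assumes the usual rule of skipping frame numbers 00 and 01 at the start of every minute except each tenth minute.

```python
def frames_to_df_timecode(frame: int) -> str:
    """Format a frame count as 29.97 fps drop-frame timecode (illustrative sketch)."""
    drop = 2                   # frame numbers skipped per dropped minute
    frames_per_min = 1798      # 30 * 60 - 2, a minute that drops two numbers
    frames_per_10min = 17982   # 9 dropped minutes plus 1 full 1800-frame minute

    tens, rem = divmod(frame, frames_per_10min)
    # Add back the skipped numbers so ordinary 30 fps arithmetic works below.
    frame += drop * 9 * tens
    if rem > drop:
        frame += drop * ((rem - drop) // frames_per_min)

    ff = frame % 30
    ss = (frame // 30) % 60
    mm = (frame // 1800) % 60
    hh = (frame // 108000) % 24
    # The ';' before the frame field is the conventional drop-frame marker.
    return f"{hh:02d}:{mm:02d}:{ss:02d};{ff:02d}"
```

Frame 1799 comes out as `00:00:59;29` and the very next frame as `00:01:00;02`, which is exactly the kind of discontinuity an interactive application should not have to reason about.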

I would prefer something that is a direct or indirect mapping to NTP time with ms resolution. 
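As a sketch of what a direct mapping could look like, the snippet below converts a 64-bit NTP timestamp (32-bit seconds since 1900-01-01, 32-bit binary fraction) to integer milliseconds since the Unix epoch. The name `ntp_to_unix_ms` is an assumption for illustration, not something proposed anywhere, and the sketch ignores NTP era rollover in 2036.

```python
NTP_UNIX_OFFSET_S = 2_208_988_800  # seconds from 1900-01-01 to 1970-01-01


def ntp_to_unix_ms(ntp64: int) -> int:
    """Convert a 64-bit NTP timestamp to milliseconds since the Unix epoch.

    Assumes NTP era 0; the sub-millisecond part of the fraction is truncated.
    """
    seconds = ntp64 >> 32
    fraction = ntp64 & 0xFFFFFFFF
    # The fraction is in units of 1/2**32 s; scale to ms with integer math.
    millis = (fraction * 1000) >> 32
    return (seconds - NTP_UNIX_OFFSET_S) * 1000 + millis
```

For example, an NTP value one and a half seconds past the Unix epoch, `((NTP_UNIX_OFFSET_S + 1) << 32) | 0x80000000`, maps to `1500`.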

It might seem that a timescale with no leap seconds, such as GPS time, would be easier to use, but I think operating systems end up providing such good support for time with leap seconds that a leap-second-aware timescale ends up being the easier choice.

