- From: Steven Robertson <strobe@google.com>
- Date: Tue, 24 Jul 2012 17:07:55 -0700
- To: Aaron Colwell <acolwell@google.com>
- Cc: Mark Watson <watsonm@netflix.com>, "<public-html-media@w3.org>" <public-html-media@w3.org>
- Message-ID: <CAJtuSCtS+OtzDW3jCR=_NcTFtAhnaFtbw8W0n3UxZUFzczhESA@mail.gmail.com>
On Tue, Jul 24, 2012 at 2:13 PM, Aaron Colwell <acolwell@google.com> wrote:
> I think the 90-99% case will be with content that starts at 0 so we should
> optimize for that.

IANA BMFF expert, but here's my experience with this so far:

Many demultiplexed BMFF video streams (all that I've seen, at least) using B-frames will have an initial presentation timestamp of "max_bframe_pyramid_depth*sample_duration", since some muxers avoid signed values in CTS offsets. Multiplexed files may have non-zero initial presentation timestamps for both audio and video, to keep PTS values in sync across tracks.

In non-fragmented BMFF files, the need to remove such an offset is indicated by the edit box, but in DASH/MSE BMFF files, the convention seems to be to specify a sidx earliest_presentation_time on the first media segment which includes the offset, and a tfdt on the first media segment with a decode_time of 0, letting the difference serve as the offset to be used for synchronization. The per-track offset is thus recoverable by comparing the PTS time of the first media sample in a track to its base decode timestamp, or equivalently by using the sidx earliest_presentation_time against the base decode timestamp.

Surprisingly, this doesn't seem to be defined in any spec or even described anywhere, absent the use of sidx and friends to link multiple files to one presentation timeline (although I could be wrong).

If this offset-elimination behavior isn't defined anywhere, I suggest that we add appropriate language to the format-specific part of the spec, specifying that, for each media segment, the presentation timestamps of all samples within a track will be adjusted such that the earliest presentation timestamp of that track is equal to the base media decode time of that track. AFAICT, this will simply "do the right thing", and will make the statement above (that 99% of content starts at t=0) true.
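The adjustment suggested above can be sketched in a few lines. This is only an illustration under stated assumptions, not text from any spec: the `Sample` type and both function names are hypothetical, samples are assumed to be listed in decode order, and all times are assumed to be in the track's timescale units. The example data models a three-frame stream (I, P, B in decode order) with unsigned CTS offsets and a sample duration of 1000.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """Hypothetical per-sample record, in track timescale units."""
    duration: int
    cts_offset: int  # composition time offset (assumed unsigned here)


def presentation_times(base_decode_time: int, samples: List[Sample]) -> List[int]:
    """Raw PTS of each sample, in decode order (PTS = DTS + CTS offset)."""
    pts = []
    dts = base_decode_time
    for s in samples:
        pts.append(dts + s.cts_offset)
        dts += s.duration
    return pts


def adjusted_presentation_times(base_decode_time: int, samples: List[Sample]) -> List[int]:
    """Shift all PTS values in the segment so that the earliest PTS
    equals the track's base media decode time."""
    raw = presentation_times(base_decode_time, samples)
    if not raw:
        return []
    offset = min(raw) - base_decode_time
    return [t - offset for t in raw]


# Decode order I, P, B; display order I, B, P.
# Unsigned CTS offsets push every PTS later by one sample duration.
samples = [
    Sample(duration=1000, cts_offset=1000),  # I
    Sample(duration=1000, cts_offset=2000),  # P
    Sample(duration=1000, cts_offset=0),     # B
]

raw = presentation_times(0, samples)
adjusted = adjusted_presentation_times(0, samples)
```

With a base decode time of 0, the raw presentation times start at one sample duration and the adjusted ones start at 0, which is the "content starts at t=0" behavior discussed above.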
If this timestamp correction algorithm is defined somewhere, please let me know where ;)

On a related note, I have not seen an edit box present in any DASH/MSE BMFF media, since its most common use (compensating for this offset) is met by the simple algorithm above. If this is true, should references to an edit box be removed from the MSE spec?

Thanks,
Steve
Received on Wednesday, 25 July 2012 00:09:04 UTC