- From: Mark Watson <watsonm@netflix.com>
- Date: Wed, 25 Jul 2012 04:38:00 +0000
- To: Steven Robertson <strobe@google.com>
- CC: Aaron Colwell <acolwell@google.com>, "<public-html-media@w3.org>" <public-html-media@w3.org>
- Message-ID: <44937797-82F8-4FA9-9075-8CE77786727E@netflix.com>
On Jul 24, 2012, at 5:07 PM, Steven Robertson wrote:

> On Tue, Jul 24, 2012 at 2:13 PM, Aaron Colwell <acolwell@google.com> wrote:
>
> > I think the 90-99% case will be with content that starts at 0 so we should optimize for that.
>
> IANA BMFF expert, but here's my experience with this so far:
>
> Many demultiplexed BMFF video streams (all that I've seen, at least) using B-frames will have an initial presentation timestamp of "max_bframe_pyramid_depth*sample_duration", since some muxers avoid signed values in CTS offsets. Multiplexed files may have non-zero initial presentation timestamps for both audio and video, to keep PTS values in sync across tracks.
>
> In non-fragmented BMFF files, the need to remove such an offset is indicated by the edit box, but in DASH/MSE BMFF files, the convention seems to be to specify a sidx earliest_presentation_time on the first media segment which includes the offset, and a tfdt on the first media segment with a decode_time of 0, letting the difference serve as the offset to be used for synchronization.

This has been addressed in MPEG: you are supposed to include an Edit List for fragmented files as well. DASH requires that all media timelines across different bitrates etc. are aligned and begin at zero.

> The per-track offset is thus recoverable by comparing the PTS time of the first media sample in a track to its base decode timestamp, or equivalently by using the sidx earliest_presentation_time against the base decode timestamp.

I'm not sure that's quite correct. Decode Time is a somewhat arbitrary ISO file format concept which is only relevant insofar as it is the starting point for calculating the presentation time: what's important is the Presentation Time of each sample which is obtained from the Composition Time (= Decode Time + Composition Offsets) by applying the Edit List.

> Surprisingly, this doesn't seem to be defined in any spec or even described anywhere, absent the use of sidx and friends to link multiple files to one presentation timeline (although I could be wrong).

DASH does require all the media of a presentation to start at zero. Edit Lists are the only way to achieve that in ISO BMFF (btw, when I say 'start from zero' that doesn't mean the first sample is at time zero. It may be that the audio begins at zero, for example, but the first video frame is at 33 ms, say.)

> If this offset-elimination behavior isn't defined anywhere, I suggest that we add appropriate language to the format-specific part of the spec, specifying that, for each media segment, the presentation timestamps of all samples within a track will be adjusted such that the earliest presentation timestamp of that track is equal to the base media decode time of that track.

That would contradict the ISO spec, which says the Presentation Time of each sample is obtained from the Composition Time (= Decode Time + Composition Offsets) by applying the Edit List.

> AFAICT, this will simply "do the right thing", and will make the statement above (that 99% of content starts at t=0) true. If this timestamp correction algorithm is defined somewhere, please let me know where ;)
>
> On a related note, I have not seen an edit box present in any DASH/MSE BMFF media, since its most common use (compensating for this offset) is met by the simple algorithm above. If this is true, should references to an edit box be removed from the MSE spec?

Which DASH BMFF media are you referring to?

> Thanks,
> Steve
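For illustration only, the two timestamp calculations discussed in this thread can be sketched as below. This is a minimal sketch under stated assumptions, not text from any spec or an existing library: the names `Sample`, `composition_times`, `presentation_times_with_edit`, and `presentation_times_proposed` are hypothetical, the edit list is reduced to a single entry whose `media_time` maps to presentation time 0, and all values are in one track timescale (no movie/media timescale conversion, no empty edits).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    decode_time: int         # decode time, in track timescale units
    composition_offset: int  # unsigned composition (CTS) offset


def composition_times(samples: List[Sample]) -> List[int]:
    # Composition Time = Decode Time + Composition Offset
    return [s.decode_time + s.composition_offset for s in samples]


def presentation_times_with_edit(samples: List[Sample], media_time: int) -> List[int]:
    # Simplified single-entry edit list (an assumption of this sketch):
    # the composition time `media_time` is mapped to presentation time 0.
    return [ct - media_time for ct in composition_times(samples)]


def presentation_times_proposed(samples: List[Sample],
                                base_media_decode_time: int) -> List[int]:
    # The adjustment proposed in the quoted text: shift the track so that
    # its earliest presentation timestamp equals the base media decode time.
    cts = composition_times(samples)
    shift = min(cts) - base_media_decode_time
    return [ct - shift for ct in cts]


# Decode order I, P, B with sample_duration = 1000 and one level of
# B-frame reordering, so the first composition time is 1 * 1000.
samples = [Sample(0, 1000), Sample(1000, 2000), Sample(2000, 0)]

print(composition_times(samples))                   # [1000, 3000, 2000]
print(presentation_times_with_edit(samples, 1000))  # [0, 2000, 1000]
print(presentation_times_proposed(samples, 0))      # [0, 2000, 1000]
print(presentation_times_with_edit(samples, 0))     # [1000, 3000, 2000]
```

In the first case (an edit that removes the reordering delay) both calculations agree. In the last line, the edit list deliberately leaves the first frame at 1000; the proposed adjustment would still move it to 0, which is the kind of disagreement with the edit-list-based definition raised in the replies above.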
Received on Wednesday, 25 July 2012 04:38:29 UTC