[MSE] Establishing the Presentation Start Timestamp

Hi,

While doing some testing with demultiplexed content that uses separate
SourceBuffers for the audio & video streams, we ran into some issues around
establishing the presentation start timestamp that I don't think are
covered well in the existing spec text.

Section 6.1.3 <http://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html#webm-start-timestamp>
for WebM states:
The timestamp in the first block of the first media segment appended
establishes the starting timestamp for the presentation timeline. All media
segments appended after this first segment are expected to have timestamps
greater than or equal to this timestamp.

Section 6.2.3 <http://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html#iso-start-timestamp>
has similar text for ISO.
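For a single SourceBuffer, the quoted rule can be modeled roughly as follows. This is a minimal TypeScript sketch, not spec text: the `StartTimestampTracker` class is hypothetical, timestamps are assumed to be in milliseconds, and each media segment is represented only by the timestamp of its first block.

```typescript
// Hypothetical model of the quoted rule for ONE SourceBuffer.
// Timestamps are in milliseconds; names are illustrative, not spec API.
class StartTimestampTracker {
  private start: number | null = null;

  // Called with the first block's timestamp of each appended media segment.
  // Returns false when a segment violates the ">= start timestamp"
  // expectation for segments appended after the first one.
  onMediaSegmentAppended(firstTimestampMs: number): boolean {
    if (this.start === null) {
      // The first appended segment establishes the presentation start.
      this.start = firstTimestampMs;
      return true;
    }
    return firstTimestampMs >= this.start;
  }

  get presentationStartMs(): number | null {
    return this.start;
  }
}

const tracker = new StartTimestampTracker();
console.log(tracker.onMediaSegmentAppended(0));    // true (establishes start = 0)
console.log(tracker.onMediaSegmentAppended(2000)); // true (2000 >= 0)
console.log(tracker.presentationStartMs);          // 0
```

With one SourceBuffer there is no ambiguity: whatever is appended first defines the start, and anything earlier appended later is simply out of spec.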

This language is pretty straightforward if we are only dealing with a
single SourceBuffer. When more than one SourceBuffer is involved, things get
trickier if the first media segments appended to each SourceBuffer don't
start with the same timestamp.

Say I have an audio stream that starts at timestamp 0, and the video stream
starts at 30 milliseconds. If I follow the existing language very strictly,
then whichever stream has a media segment appended first establishes the
presentation start time. This means the start time could be either 0 or
30 milliseconds, depending only on append order. This raises several
questions that I think need to be discussed.
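A minimal sketch of that strict reading shows how the result depends only on append order. The names `SegmentAppend`, `strictPresentationStart`, and `appendsBeforeStart` are hypothetical, and timestamps are assumed to be in milliseconds.

```typescript
// Hypothetical sketch: under a strict reading, the presentation start is the
// first timestamp of whichever media segment is appended first, no matter
// which SourceBuffer receives it. Times in milliseconds.
interface SegmentAppend {
  sourceBuffer: "audio" | "video"; // illustrative labels only
  firstTimestampMs: number;
}

function strictPresentationStart(appends: SegmentAppend[]): number | undefined {
  // Only the very first append matters under this reading.
  return appends.length > 0 ? appends[0].firstTimestampMs : undefined;
}

// Segments that start before the established start, i.e. segments that
// break the ">= start timestamp" expectation.
function appendsBeforeStart(appends: SegmentAppend[]): SegmentAppend[] {
  const start = strictPresentationStart(appends);
  if (start === undefined) return [];
  return appends.filter((a) => a.firstTimestampMs < start);
}

const audio: SegmentAppend = { sourceBuffer: "audio", firstTimestampMs: 0 };
const video: SegmentAppend = { sourceBuffer: "video", firstTimestampMs: 30 };

console.log(strictPresentationStart([audio, video])); // 0
console.log(strictPresentationStart([video, audio])); // 30
console.log(appendsBeforeStart([video, audio]).map((a) => a.sourceBuffer)); // [ 'audio' ]
```

When video happens to be appended first, the audio segment at 0 ms lands before the established start, which is exactly the case question 3 below asks about.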

1. Should we expect the web application to be aware of this situation and
always ensure that the earliest segment gets appended first?

2. Should we wait until the first media segments are appended to all
SourceBuffers in MediaSource.activeSourceBuffers before determining the
start time and then simply take the earliest timestamp?

3. If a media segment is appended that starts before the established
presentation start time and continues past it, how should we handle that?

  - Should this trigger an error?
  - Should it be treated like an end overlap
    <http://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html#source-buffer-overlap-end>,
    where the presentation start time acts like the end of a range already
    in the buffer? This would essentially keep everything starting at the
    first random access point that has a timestamp >= the presentation
    start timestamp.

4. How close do the starting timestamps on the first media segments from
each SourceBuffer need to be?
  - In this example I've shown them to be only 30 milliseconds apart, but
    would 0.5 seconds be acceptable? Would 2 seconds?
  - How large a gap do we allow before we consider data to be missing and
    playback unable to start?
  - What happens if the gap is too large?
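The end-overlap style handling from question 3 could look roughly like this. It is a sketch under stated assumptions: the `CodedFrame` shape and the `trimToPresentationStart` name are hypothetical, and frames are assumed to be sorted by presentation timestamp.

```typescript
// Hypothetical sketch of option 3: treat the established presentation start
// like the end of an already-buffered range and drop everything before the
// first random access point at or after it. Frames are ASSUMED to be sorted
// by presentation timestamp (milliseconds).
interface CodedFrame {
  timestampMs: number;
  isRandomAccessPoint: boolean; // e.g. a video keyframe; every audio frame
}

function trimToPresentationStart(frames: CodedFrame[], startMs: number): CodedFrame[] {
  const firstRap = frames.findIndex(
    (f) => f.isRandomAccessPoint && f.timestampMs >= startMs,
  );
  // No usable random access point: nothing in this segment can be kept.
  return firstRap === -1 ? [] : frames.slice(firstRap);
}

// A video segment starting at 0 ms with keyframes at 0 and 40 ms,
// appended after a presentation start of 30 ms was established.
const segment: CodedFrame[] = [
  { timestampMs: 0, isRandomAccessPoint: true },
  { timestampMs: 20, isRandomAccessPoint: false },
  { timestampMs: 40, isRandomAccessPoint: true },
  { timestampMs: 60, isRandomAccessPoint: false },
];
console.log(trimToPresentationStart(segment, 30).map((f) => f.timestampMs)); // [ 40, 60 ]
```

Note the cost of this choice for video: the 20 ms frame cannot be decoded without the keyframe at 0, so the kept data starts 10 ms after the presentation start.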
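Option 2 combined with a gap limit for question 4 might be sketched as below. The function name, the `StartDecision` result type, and the `maxStartGapMs` threshold are all hypothetical; the 500 ms value in the usage lines is only an example, not a proposed limit.

```typescript
// Hypothetical sketch of option 2 (wait until every active SourceBuffer has
// its first media segment, then take the earliest timestamp) plus an
// illustrative answer to question 4 (a configurable maximum start gap).
// Names and the threshold are assumptions, not spec text. Times in ms.
type StartDecision =
  | { kind: "waiting"; missing: string[] }
  | { kind: "established"; startMs: number; gapMs: number }
  | { kind: "gapTooLarge"; startMs: number; gapMs: number };

function decidePresentationStart(
  activeSourceBuffers: string[],
  firstTimestamps: Map<string, number>, // first segment timestamp per buffer
  maxStartGapMs: number,
): StartDecision {
  const missing = activeSourceBuffers.filter((id) => !firstTimestamps.has(id));
  if (activeSourceBuffers.length === 0 || missing.length > 0) {
    // Option 2: don't decide until every active buffer has a first segment.
    return { kind: "waiting", missing };
  }
  const starts = activeSourceBuffers.map((id) => firstTimestamps.get(id)!);
  const startMs = Math.min(...starts); // earliest first timestamp wins
  const gapMs = Math.max(...starts) - startMs;
  // Question 4: treat a spread larger than the threshold as missing data.
  return gapMs <= maxStartGapMs
    ? { kind: "established", startMs, gapMs }
    : { kind: "gapTooLarge", startMs, gapMs };
}

const active = ["audio", "video"];
const seen = new Map<string, number>([["audio", 0]]);
console.log(decidePresentationStart(active, seen, 500).kind); // waiting
seen.set("video", 30);
console.log(decidePresentationStart(active, seen, 500).kind); // established
seen.set("video", 2000);
console.log(decidePresentationStart(active, seen, 500).kind); // gapTooLarge
```

This makes the start independent of append order, at the price of delaying the decision until every active SourceBuffer has received data.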

Any insights or suggestions would be greatly appreciated.

Aaron

Received on Thursday, 12 July 2012 17:28:08 UTC