
RE: [MSE] Establishing the Presentation Start Timestamp

From: Kevin Streeter <kstreete@adobe.com>
Date: Thu, 12 Jul 2012 11:06:49 -0700
To: Aaron Colwell <acolwell@google.com>, "<public-html-media@w3.org>" <public-html-media@w3.org>
Message-ID: <0FEA137C08A9DF4781EEF745038C96943BF569CD2C@nambx03.corp.adobe.com>

  This is a tricky one; we encountered a similar situation dealing with unmixed content in Flash.  The problem is that at startup time, before anything is in the playback buffer, pushing the first segment (for the first track) leaves the UA in an ambiguous position.  Will there be another track?  More segments for that single track?  There is no way for the UA to know unless the application tells it.

  One possible way to deal with this is to have a pre-roll state where you can push segments without playback starting.  You would resolve the time issues only after playback actually started.  This would allow the application to initialize all tracks.
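A minimal sketch of what such a pre-roll state might look like, written as a small TypeScript model rather than as real MSE API. The class PrerollTimeline, its methods, and the millisecond units are all assumptions made for illustration; nothing like this exists in the current draft.

```typescript
// Hypothetical model of a "pre-roll" state. None of these names are part of MSE.
type PlaybackState = "preroll" | "playing";

class PrerollTimeline {
  private state: PlaybackState = "preroll";
  // Earliest segment start (ms) seen so far for each track.
  private earliestByTrack: { [trackId: string]: number } = {};
  private startMs: number | null = null;

  // Record a segment. During pre-roll nothing is resolved yet,
  // so the order in which tracks are appended does not matter.
  append(trackId: string, segmentStartMs: number): void {
    if (!(trackId in this.earliestByTrack) ||
        segmentStartMs < this.earliestByTrack[trackId]) {
      this.earliestByTrack[trackId] = segmentStartMs;
    }
  }

  // The application signals that every track has been initialized.
  // Only now is the presentation start timestamp fixed.
  startPlayback(): number {
    if (this.state === "playing" && this.startMs !== null) {
      return this.startMs;
    }
    const starts: number[] = [];
    for (const id of Object.keys(this.earliestByTrack)) {
      starts.push(this.earliestByTrack[id]);
    }
    if (starts.length === 0) {
      throw new Error("no segments appended during pre-roll");
    }
    this.startMs = Math.min(...starts);
    this.state = "playing";
    return this.startMs;
  }
}
```

In this model, appending video at 30 ms and then audio at 0 ms before calling startPlayback() gives the same start time as appending them in the opposite order.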

  Another way would be to assume that all *initialized* tracks have at least some data.  So if you create a track programmatically, or push in an initialization segment, the UA will wait to start playback until at least some content for that track arrives.
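The implicit rule could be sketched as a single predicate. This is a hypothetical TypeScript model: the TrackStatus shape and the canStartPlayback name are assumptions for illustration, not part of any spec.

```typescript
// Hypothetical bookkeeping a UA might keep per track.
interface TrackStatus {
  id: string;
  // True once the track was created programmatically or an
  // initialization segment was pushed for it.
  initialized: boolean;
  // Number of media segments received so far.
  mediaSegmentCount: number;
}

// Playback may start only when every initialized track has some data.
function canStartPlayback(tracks: TrackStatus[]): boolean {
  const initialized = tracks.filter((t) => t.initialized);
  return (
    initialized.length > 0 &&
    initialized.every((t) => t.mediaSegmentCount > 0)
  );
}
```

With this rule, an application that pushes initialization segments for both audio and video before any media data gets the "wait for both" behavior without an explicit state change.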

  The first mechanism is more explicit (which IMO is generally better), but would also probably mean changes in the playback state machine.


From: Aaron Colwell [mailto:acolwell@google.com]
Sent: Thursday, July 12, 2012 10:28 AM
To: <public-html-media@w3.org>
Subject: [MSE] Establishing the Presentation Start Timestamp


While doing some testing with demultiplexed content that uses separate SourceBuffers for the audio & video streams, we ran into some issues around establishing the presentation start timestamp that I don't think are covered well in the existing spec text.

Section 6.1.3<http://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html#webm-start-timestamp> for WebM states:
The timestamp in the first block of the first media segment appended establishes the starting timestamp for the presentation timeline. All media segments appended after this first segment are expected to have timestamps greater than or equal to this timestamp.

Section 6.2.3<http://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html#iso-start-timestamp> has similar text for ISO.

This language is pretty straightforward if we are only dealing with a single SourceBuffer. When more than one SourceBuffer is involved, things get trickier if the first media segments for the SourceBuffers don't all start with the same timestamp.

Say I have an audio stream that starts at timestamp 0 and a video stream that starts at 30 milliseconds. If I follow the existing language very strictly, then whichever stream appends a media segment first establishes the presentation start time. This means that the start time can be either 0 or 30 milliseconds, depending only on append order. This raises several questions that I think need to be discussed.
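The order dependence can be shown with a tiny TypeScript model of the strict reading. The Append type and startTimeStrict function are hypothetical names used only for this illustration.

```typescript
// One appended media segment, identified by the buffer it went to.
interface Append {
  buffer: string;
  firstTimestampMs: number;
}

// Strict reading of the spec text: the first media segment appended,
// to any SourceBuffer, establishes the presentation start time.
function startTimeStrict(appendsInOrder: Append[]): number | null {
  return appendsInOrder.length > 0
    ? appendsInOrder[0].firstTimestampMs
    : null;
}

const audioAppend: Append = { buffer: "audio", firstTimestampMs: 0 };
const videoAppend: Append = { buffer: "video", firstTimestampMs: 30 };

const audioFirst = startTimeStrict([audioAppend, videoAppend]); // 0
const videoFirst = startTimeStrict([videoAppend, audioAppend]); // 30
```

The same two segments yield two different presentation start times, which is the ambiguity the questions below try to resolve.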

1. Should we expect the web application to be aware of this situation and always ensure that the earliest segment gets appended first?

2. Should we wait until the first media segments are appended to all SourceBuffers in MediaSource.activeSourceBuffers before determining the start time and then simply take the earliest timestamp?

3. If a media segment is appended that starts before the established presentation start time and continues past it, how should we handle that?
  - Should this trigger an error?
  - Should it be treated like an end overlap<http://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html#source-buffer-overlap-end> where the presentation start time acts like the end of a range already in the buffer? This would essentially keep everything after the first random access point that has a timestamp >= the presentation start timestamp.

4. How close do the starting timestamps on the first media segments from each SourceBuffer need to be?
  - In this example I've shown them to be only 30 milliseconds apart, but would 0.5 seconds be acceptable? Would 2 seconds?
  - How much time do we allow here before we consider there to be missing data and playback can't start?
  - What happens if the gap is too large?
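To make questions 2 through 4 concrete, here is a rough TypeScript sketch of one possible policy: wait for every active buffer, take the earliest timestamp, reject gaps above a threshold, and trim early data the way an end overlap would. All names here (resolveStart, trimToStart, maxGapMs, the Frame shape) are assumptions for illustration, and the threshold value is deliberately left to the caller since the right number is exactly what question 4 asks.

```typescript
type StartResult =
  | { kind: "waiting" }
  | { kind: "resolved"; startMs: number }
  | { kind: "gap-too-large"; gapMs: number };

// firstTimestamps holds, for each active buffer, the timestamp of its
// first media segment, or null if nothing has been appended yet.
function resolveStart(
  firstTimestamps: Array<number | null>,
  maxGapMs: number
): StartResult {
  const known: number[] = [];
  for (const ts of firstTimestamps) {
    if (ts === null) {
      return { kind: "waiting" }; // question 2: wait for all buffers
    }
    known.push(ts);
  }
  if (known.length === 0) {
    return { kind: "waiting" };
  }
  const earliest = Math.min(...known);
  const gapMs = Math.max(...known) - earliest;
  if (gapMs > maxGapMs) {
    return { kind: "gap-too-large", gapMs }; // question 4
  }
  return { kind: "resolved", startMs: earliest };
}

interface Frame {
  timestampMs: number;
  isRandomAccessPoint: boolean;
}

// Question 3, end-overlap reading: keep everything from the first random
// access point whose timestamp is >= the presentation start timestamp.
function trimToStart(frames: Frame[], startMs: number): Frame[] {
  for (let i = 0; i < frames.length; i++) {
    if (frames[i].isRandomAccessPoint && frames[i].timestampMs >= startMs) {
      return frames.slice(i);
    }
  }
  return [];
}
```

For the example above, resolveStart([0, 30], 100) resolves to a start time of 0, while resolveStart([0, null], 100) keeps waiting for the second buffer.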

Any insights or suggestions would be greatly appreciated.

Received on Thursday, 12 July 2012 18:08:13 UTC
