Re: [MSE] Establishing the Presentation Start Timestamp from Aaron Colwell on 2012-07-18 (public-html-media@w3.org from July 2012)

From: Aaron Colwell <acolwell@google.com>
Date: Wed, 18 Jul 2012 11:03:58 -0700
To: Mark Watson <watsonm@netflix.com>
Cc: "<public-html-media@w3.org>" <public-html-media@w3.org>
Message-ID: <CAA0c1bAqD+cFWmxw4bT=6mWvAZrfqi-5qXqALuBW0qgi-ZgvWw@mail.gmail.com>
Hi Mark,

Comments inline...


On Thu, Jul 12, 2012 at 2:32 PM, Mark Watson <watsonm@netflix.com> wrote:

>
>  On Jul 12, 2012, at 10:27 AM, Aaron Colwell wrote:
>
> 1. Should we expect the web application to be aware of this situation and
> always ensure that the earliest segment gets appended first?
>
>
>  No. We should require that all tracks share the same global timeline. In
> this case the above means that the audio should start and there should be
> 30ms of blank screen before the first video frame is displayed. Metadata
> available to the JS app probably says both segments start at zero (both
> contain all the media from time zero onwards). So the JS is unlikely to
> know which to append first.
>

[acolwell] Agreed.


> 2. Should we wait until the first media segments are appended to all
> SourceBuffers in MediaSource.activeSourceBuffers before determining the
> start time and then simply take the earliest timestamp?
>
>
>  I forget how 'activeSourceBuffers' works exactly. Is it possible the app
> wants to set up separate SourceBuffers for the English and French audio
> tracks but only the French is enabled and only media for the French is
> being appended?
>

[acolwell] activeSourceBuffers contains the currently selected/enabled
tracks. SourceBuffers are placed in this list if they are the first track
of that type (i.e., first audio, first video) or if a specific track is
selected. The former can only be determined once an initialization segment
has been appended. The idea is to force the init segments to get appended
for all SourceBuffers we want to be considered in the start time
computation before appending any media segments. This would implicitly
indicate which SourceBuffers we needed to wait on for media data before
making a start time determination. I believe this roughly maps to the
second option Kevin proposed earlier in this thread.

I believe this would still work if you had a separate SourceBuffer for
French & English. The French SourceBuffer would be the only one in
activeSourceBuffers since it is the only one that is enabled.
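
A minimal sketch of the start time rule described above, not spec text. The `BufferState` type and `presentationStartTime` function are hypothetical names made up for illustration; they model only the two facts the rule needs (whether a buffer is in activeSourceBuffers, and the timestamp of its first media segment) and do not touch the real MSE objects.

```typescript
// Hypothetical model of one SourceBuffer; this is NOT the real MSE interface.
interface BufferState {
  label: string;
  // True once the buffer is listed in activeSourceBuffers, which (per the
  // discussion above) requires its initialization segment to be appended.
  active: boolean;
  // Timestamp in seconds of the first appended media segment, or null if
  // no media segment has been appended yet.
  firstMediaTimestamp: number | null;
}

// Returns the presentation start time in seconds, or null while the
// decision still has to wait on media data for some active buffer.
function presentationStartTime(buffers: BufferState[]): number | null {
  const considered = buffers.filter((b) => b.active);
  if (considered.length === 0) {
    return null; // nothing is enabled yet
  }
  let earliest = Infinity;
  for (const b of considered) {
    if (b.firstMediaTimestamp === null) {
      return null; // still waiting on this buffer's first media segment
    }
    earliest = Math.min(earliest, b.firstMediaTimestamp);
  }
  return earliest;
}
```

With audio starting at 0 and video at 0.03, this yields 0. With a French buffer that is active and an English buffer that is not, the English buffer is ignored and never blocks the decision.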


> 3. If a media segment is appended that starts before the established
> presentation start time and continues past it, how should we handle that?
>
>   - Should this trigger an error?
>   - Should it be treated like an end overlap<http://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html#source-buffer-overlap-end> where
> the presentation start time acts like the end of a range already in the
> buffer? This would essentially keep everything after the first random
> access point that has a timestamp >= the presentation start timestamp.
>
>
>  It seems to me like this should be an error, because I can't think of a
> use-case where this wouldn't be a mistake on the part of the application.
>

[acolwell] Ok sounds reasonable to me. In your French & English
SourceBuffer example above, I'm assuming that both would have the same
timestamp for their first media segment. If not, then switching to English
and seeking back to the start could result in an error.
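
To make the error case concrete, here is a rough sketch of the check, assuming times in seconds. `checkAppendAgainstStart` and its result labels are hypothetical names for illustration. The thread only covers a segment that starts before the presentation start time and continues past it, so a segment that lies entirely before the start time is reported as "not-covered" rather than guessed at.

```typescript
type AppendCheck = "ok" | "error" | "not-covered";

// Compares one media segment against an already established
// presentation start time. All values are in seconds.
function checkAppendAgainstStart(
  segmentStart: number,
  segmentEnd: number,
  presentationStart: number,
): AppendCheck {
  if (segmentStart >= presentationStart) {
    return "ok"; // starts at or after the established start time
  }
  if (segmentEnd > presentationStart) {
    return "error"; // starts before the start time and continues past it
  }
  return "not-covered"; // entirely before the start time; not discussed here
}
```

In the French & English case: if the start time was established as 0.5 from the French buffer and the first English segment spans 0 to 2, the English append falls into the "error" branch.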


> 4. How close do the starting timestamps on the first media segments from
> each SourceBuffer need to be?
>   - In this example I've shown them to be only 30 milliseconds apart, but
> would 0.5 seconds be acceptable? Would 2 seconds?
>   - How much time do we allow here before we consider there to be missing
> data and playback can't start?
>   - What happens if the gap is too large?
>
>
>  I think this is roughly the same question as 'what happens if I append a
> video segment which starts X ms after the end of the last video segment' ?
>
>  If X <= one frame interval, this is definitely not a 'gap' and playback
> continues smoothly. If X > 1 second this is definitely a gap and playback
> should stall (in the same way as it does today on a network outage).
>
>  For X values in between, I am not sure: implementations have to draw a
> line somewhere. A gap of multiple frame intervals could occur when
> switching frame rate. You might also get a couple of frame intervals gap
> when switching if you do wacky things with frame reordering around segment
> boundaries.
>
>  When looking at differences between audio and video, we need to be
> tolerant of differences as large as the larger of the audio frame size and
> the video frame interval.
>
>  If the gap is too large, the element just stays in the same state.
> Perhaps I append video from 0s and audio from 2s and this is because my
> network requests got re-ordered and any millisecond now I am going to
> append the 0-2s audio. Playback should start when that 0-2s is appended.
>
>
[acolwell] I agree. We need to come up with some spec text for this and
then we can debate the merits of these various magic numbers. Care to
volunteer for this? :)
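
As a starting point for that debate, the thresholds Mark describes can be sketched as below. This is only an illustration of the numbers quoted above, not proposed spec text; `classifyGap`, `avStartTolerance`, and the verdict labels are hypothetical names, and everything is in seconds.

```typescript
type GapVerdict = "continuous" | "gap" | "implementation-defined";

// Classifies a gap of `gapSeconds` between the end of one segment and the
// start of the next, given the video frame interval in seconds.
function classifyGap(gapSeconds: number, frameIntervalSeconds: number): GapVerdict {
  if (gapSeconds <= frameIntervalSeconds) {
    return "continuous"; // within one frame interval: not a gap
  }
  if (gapSeconds > 1.0) {
    return "gap"; // more than 1 second: playback should stall
  }
  return "implementation-defined"; // in between: the line is not yet drawn
}

// Tolerance for differing audio and video start times: the larger of the
// audio frame size and the video frame interval, both in seconds.
function avStartTolerance(
  audioFrameSeconds: number,
  videoFrameIntervalSeconds: number,
): number {
  return Math.max(audioFrameSeconds, videoFrameIntervalSeconds);
}
```

For example, assuming 30 fps video (a frame interval of 1/30 s, chosen here only for illustration), the 30 ms difference from the original example counts as "continuous", 0.5 s lands in the undecided middle range, and 2 s is a "gap".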

>
>  Any insights or suggestions would be greatly appreciated.
>
>
>  We have the same problem with push/popTimeOffset. Suppose I want your
> media above to appear at offset 200s in both audio and video source
> buffers. What I really want is for the audio to start at 200s and the video
> at 200.030s.
>
>  In this case the application knows better than the media what the
> internal media times are. I know that the video segment has all the video
> from time 0s, even though the first frame is at 30ms. I really want to
> provide the actual offset to be applied to the internal timestamps, rather
> than providing the source buffer time that the next segment should start at.
>

[acolwell] One way I think we could get around this is to mandate that the
media segments actually have a start time of 0. In WebM there is a Cluster
timestamp and then all blocks are relative to this timestamp. If the
Cluster timestamp is 0 and the first frame in the cluster is at 30ms then
there is enough information for the UA to "do the right thing". I'm not
sure if a similar mechanism exists in ISO. The application that creates the
demuxed files just needs to make sure the separate files both have the same
segment start time.
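
A simplified sketch of the WebM case, assuming all values are already in milliseconds. `ClusterModel`, `segmentStartMs`, and `firstFrameMs` are hypothetical names for illustration; this is a model of the timestamps only and does not parse real WebM data.

```typescript
// Simplified model of a WebM Cluster: one Cluster timestamp plus block
// timestamps stored relative to it. All values assumed to be milliseconds.
interface ClusterModel {
  clusterTimestampMs: number;
  blockOffsetsMs: number[];
}

// The segment start time is taken from the Cluster timestamp itself.
function segmentStartMs(cluster: ClusterModel): number {
  return cluster.clusterTimestampMs;
}

// The first frame time is the Cluster timestamp plus the smallest block
// offset, or null if the Cluster has no blocks.
function firstFrameMs(cluster: ClusterModel): number | null {
  if (cluster.blockOffsetsMs.length === 0) {
    return null;
  }
  return cluster.clusterTimestampMs + Math.min(...cluster.blockOffsetsMs);
}
```

For a video Cluster with timestamp 0 whose first block is at offset 30, the segment start is 0 and the first frame is at 30, so both pieces of information are available without help from the application.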

Applications could also just append the segments to a "scratch"
SourceBuffer to see what the initial timestamp is and then use that
information to compute the proper offset to apply. It's not the greatest
solution, but it does provide a way for people to handle this if they
aren't as careful about how they create their demuxed content.
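
The arithmetic behind the "scratch" approach might look like the sketch below. `nextSegmentStartMs` and its parameters are hypothetical names for illustration. It assumes, as Mark describes, that the application knows the nominal start of the media (0 here) and that the API takes the buffer time at which the next segment should start. Integer milliseconds are used to avoid floating-point rounding.

```typescript
// Computes the buffer time (ms) to request for the next segment so that
// the media's internal timeline is shifted by exactly `desiredOffsetMs`.
//
// desiredOffsetMs:        where nominal media time should land (e.g. 200000)
// nominalStartMs:         the media's nominal start, known to the app (e.g. 0)
// probedFirstTimestampMs: first timestamp observed in the scratch buffer
function nextSegmentStartMs(
  desiredOffsetMs: number,
  nominalStartMs: number,
  probedFirstTimestampMs: number,
): number {
  // How far the first frame sits after the nominal start of the media.
  const leadInMs = probedFirstTimestampMs - nominalStartMs;
  return desiredOffsetMs + leadInMs;
}
```

For the example in this thread, the audio (first timestamp 0) would be placed at 200000 ms and the video (first timestamp 30) at 200030 ms.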


Aaron


>  Hmm - no clear answer here - I'll think about this some more.
>
>  …Mark
>
>
>  Aaron
>
>
>
Received on Wednesday, 18 July 2012 18:04:29 UTC