Re: [MSE] Establishing the Presentation Start Timestamp

Using reply all this time... :)

---------- Forwarded message ----------
From: Aaron Colwell <acolwell@google.com>
Date: Tue, Jul 24, 2012 at 4:40 PM
Subject: Re: [MSE] Establishing the Presentation Start Timestamp
To: Kevin Streeter <kstreete@adobe.com>


Hi Kevin,

Yes. The app can either set an offset or just append and then set
'videoTag.currentTime = videoTag.buffered.start(0)'; buffered.start(0) should
be the start time of the first appended segment.
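
For illustration, the second option might look roughly like the sketch below.
The sourceBuffer and firstSegmentBytes variables are assumed, not from this
thread, and the appendBuffer/updateend names are from later MSE drafts rather
than this one:

  // After the first append completes, jump to wherever that segment landed
  // on the presentation timeline (e.g. the "live" edge of a live stream).
  sourceBuffer.addEventListener('updateend', function onFirstAppend() {
    sourceBuffer.removeEventListener('updateend', onFirstAppend);
    if (videoTag.buffered.length > 0) {
      videoTag.currentTime = videoTag.buffered.start(0);
    }
  });
  sourceBuffer.appendBuffer(firstSegmentBytes); // init segment + first media segment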

Aaron


On Tue, Jul 24, 2012 at 4:37 PM, Kevin Streeter <kstreete@adobe.com> wrote:

> Aaron, Mark,
>
>   I’m a little unclear on how things will work for a live stream.  The
> user will typically start playback at some non-zero time, which represents
> the “live” end of the stream.  How does the timestamp offset account for
> this?  Does it require setting a negative offset that re-bases the stream
> to 0 so that playback begins immediately?
>
> -K
>
> *From:* Mark Watson [mailto:watsonm@netflix.com]
> *Sent:* Tuesday, July 24, 2012 2:19 PM
> *To:* Aaron Colwell
> *Cc:* <public-html-media@w3.org>
> *Subject:* Re: [MSE] Establishing the Presentation Start Timestamp
>
> On Jul 24, 2012, at 2:13 PM, Aaron Colwell wrote:
>
> Hi Mark,
>
> Thanks for your comments. I too am starting to believe that we should just
> have the SourceBuffer timelines start at 0 and NOT derive the presentation
> start timestamp from the first segment appended. I agree that the timestamp
> offset mechanism should be used to handle any content that doesn't already
> start at 0. This might make things a little annoying for live streams, but
> it is a one time operation at the beginning of playback to figure out what
> the appropriate timestamp offset needs to be. I think the 90-99% case will
> be with content that starts at 0 so we should optimize for that. If offsets
> need to be applied the app needs to know about them or "discover" them by
> appending a segment to a "scratch" SourceBuffer and seeing what
> SourceBuffer.buffered reports.
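>
> (As an illustration of the offset mechanism, here is a minimal sketch using
> the SourceBuffer.timestampOffset attribute from later MSE drafts rather than
> the push/popTimeOffset proposal discussed further down; the mime type and
> the segment variable are assumptions:)
>
>   // Media whose internal timestamps start at 3600s, to be presented from 0.
>   const sb = mediaSource.addSourceBuffer('video/webm; codecs="vp8, vorbis"');
>   sb.timestampOffset = -3600;            // maps internal time 3600s to presentation time 0
>   sb.appendBuffer(initPlusFirstSegment); // hypothetical Uint8Array: init + first media segment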
>
> Right - and if we need to provide a "cleaner" way for the app to peek at the
> media timestamp, we can do that later when the need is clearer.
>
> I do have a question about your first append at 10 minutes example. Are
> you saying that you want the HTMLMediaElement to implicitly seek to the
> start of the first media segment appended? I think it might be less
> surprising if the default playback start position<http://dev.w3.org/html5/spec/media-elements.html#default-playback-start-position> stays
> at 0 and the app manually sets HTMLMediaElement.currentTime to
> HTMLMediaElement.buffered.start(0) or the desired seek time if you know it.
>
> Agreed - that's what I intended (even if that's not what I wrote ;-)
>
> Otherwise, I'm not sure how to make the behavior you desire fit into the
> descriptions specified in the offsets into the media resource<http://dev.w3.org/html5/spec/media-elements.html#offsets-into-the-media-resource> section
> of the HTML spec.
>
> Aaron
>
> On Wed, Jul 18, 2012 at 11:55 AM, Mark Watson <watsonm@netflix.com> wrote:
>
> On Jul 18, 2012, at 11:03 AM, Aaron Colwell wrote:
>
> Hi Mark,
>
> Comments inline...
>
> On Thu, Jul 12, 2012 at 2:32 PM, Mark Watson <watsonm@netflix.com> wrote:
>
> 4. How close do the starting timestamps on the first media segments from
> each SourceBuffer need to be?
>
>   - In this example I've shown them to be only 30 milliseconds apart, but
> would 0.5 seconds be acceptable? Would 2 seconds?
>
>   - How much time do we allow here before we consider there to be missing
> data and playback can't start?
>
>   - What happens if the gap is too large?
>
> I think this is roughly the same question as 'what happens if I append a
> video segment which starts X ms after the end of the last video segment'?
>
> If X <= one frame interval, this is definitely not a 'gap' and playback
> continues smoothly. If X > 1 second this is definitely a gap and playback
> should stall (in the same way as it does today on a network outage).
>
> For X values in between, I am not sure: implementations have to draw a
> line somewhere. A gap of multiple frame intervals could occur when
> switching frame rate. You might also get a gap of a couple of frame
> intervals when switching if you do wacky things with frame reordering
> around segment boundaries.
>
> When looking at differences between audio and video, we need to be
> tolerant of differences up to the larger of the audio frame size and the
> video frame interval.
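>
> (Purely as a sketch of that tolerance rule, not proposed spec text; the
> function and parameter names are made up:)
>
>   // Are the first audio and video segments close enough to start playback?
>   // All values are in seconds.
>   function startGapIsTolerable(audioStart, videoStart,
>                                audioFrameDuration, videoFrameInterval) {
>     const tolerance = Math.max(audioFrameDuration, videoFrameInterval);
>     return Math.abs(audioStart - videoStart) <= tolerance;
>   }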
>
> If the gap is too large, this element just stays in the same state.
> Perhaps I append video from 0s and audio from 2s and this is because my
> network requests got re-ordered and any millisecond now I am going to
> append the 0-2s audio. Playback should start when that 0-2s audio is
> appended.
>
> [acolwell] I agree. We need to come up with some spec text for this and
> then we can debate the merits of these various magic numbers. Care to
> volunteer for this? :)
>
> Ok, assign me a bug.
>
> Any insights or suggestions would be greatly appreciated.
>
> We have the same problem with push/popTimeOffset. Suppose I want your
> media above to appear at offset 200s in both audio and video source
> buffers. What I really want is for the audio to start at 200s and the video
> at 200.030s.
>
> In this case the application knows better than the media what the internal
> media times are. I know that the video segment has all the video from time
> 0s, even though the first frame is at 30ms. I really want to provide the
> actual offset to be applied to the internal timestamps, rather than
> providing the source buffer time that the next segment should start at.
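>
> (A sketch of that shape of API, expressed with SourceBuffer.timestampOffset
> carrying the "actual offset"; the 200s figure comes from the example above,
> the variable names do not:)
>
>   // The app knows the internal timeline starts at 0, so offset = 200 - 0.
>   // Applying the same offset to both buffers puts audio at 200s and video
>   // (whose first frame is internally at 30ms) at 200.030s.
>   const offset = 200;
>   audioSourceBuffer.timestampOffset = offset;
>   videoSourceBuffer.timestampOffset = offset;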
>
> [acolwell] One way I think we could get around this is to mandate that the
> media segments actually have a start time of 0. In WebM there is a Cluster
> timestamp and then all blocks are relative to this timestamp. If the
> Cluster timestamp is 0 and the first frame in the cluster is at 30ms then
> there is enough information for the UA to "do the right thing". I'm not
> sure if a similar mechanism exists in ISO.
>
> Not really - the rather complex combination of decode times, composition
> offsets and edit lists results in a presentation timestamp for each sample
> on a global timeline (shared across all bitrates etc.). But if the
> timestamp of the first sample is X there is nothing to say, for example,
> "there are no other samples between time Y (< X) and X".****
>
> The application that creates the demuxed files just needs to make sure the
> separate files both have the same segment start time.
>
> That's not always possible because of skew caused by audio frame durations
> being different from video frame intervals.
>
> I think we need to say that all source buffers share a common global
> timeline and that timestamps in the media segments must be mapped to that
> in a way that is common across source buffers. This means any offset
> applied to media internal timestamps needs to be the same across source
> buffers. It means that establishing such offsets needs to be done
> explicitly by the application or, if they are derived from timestamps in
> the media, it needs to be done in a consistent way (in terms of which out of
> audio and video the time offset is taken from).
>
> I think this has implications for the push/pop time offset as well. They
> should be global methods which establish a global offset based on the
> next-appended segment(s).
>
> We do also need a way to handle the user starting content in the middle.
> If I have a 30 min content item and the user wants to start at minute 10
> (because of a bookmark, say) then I should be able to start appending data
> at position 10min in the source buffer timeline. The seek bar needs to show
> playback starting at minute 10, and if the user seeks backwards this
> should be ok.
>
> pushOffset isn't right for this case because the media internal timestamps
> are correct: the first segment appended really does start at timestamp
> 10min.
>
> I wonder whether we should just say that the source buffer timeline
> starts at zero and not derive a start point from the appended media. If the
> media internal timestamp corresponding to the start of the content is not
> zero, you need to explicitly handle this with a pushOffset call?
>
> Applications could also just append the segments to a "scratch"
> SourceBuffer to see what the initial timestamp is and then use that
> information to compute the proper offset to apply. It's not the greatest
> solution, but it does provide a way for people to handle this if they
> aren't as careful about how they create their demuxed content.
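>
> (A rough sketch of that discovery trick; the variable names and the idea of
> removing the scratch buffer afterwards are illustrative, not anything
> specified here:)
>
>   // Append the first segment to a throwaway SourceBuffer, read back where
>   // it landed, and derive the offset to use for real playback.
>   const scratch = mediaSource.addSourceBuffer('video/webm; codecs="vp8, vorbis"');
>   scratch.addEventListener('updateend', function onScratchAppend() {
>     scratch.removeEventListener('updateend', onScratchAppend);
>     const firstTimestamp = scratch.buffered.length ? scratch.buffered.start(0) : 0;
>     const offset = desiredStartTime - firstTimestamp; // desiredStartTime: hypothetical
>     mediaSource.removeSourceBuffer(scratch);
>     // ...then add a real SourceBuffer, set its timestampOffset to `offset`,
>     // and append the same data again for playback.
>   });
>   scratch.appendBuffer(initPlusFirstSegment);          // hypothetical Uint8Array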
>
> Aaron
>
> Hmm - no clear answer here - I'll think about this some more.
>
> …Mark
>
> Aaron

Received on Tuesday, 24 July 2012 23:42:32 UTC