Re: [MSE] Establishing the Presentation Start Timestamp from Aaron Colwell on 2012-08-08 (public-html-media@w3.org from August 2012)

From: Aaron Colwell <acolwell@google.com>
Date: Wed, 8 Aug 2012 14:07:17 -0700
To: Kevin Streeter <kstreete@adobe.com>
Cc: "<public-html-media@w3.org>" <public-html-media@w3.org>
Message-ID: <CAA0c1bCwFoUO1i23jDi0nhid=jWqMdsBct76r2naqVxz3Vwuuw@mail.gmail.com>

Hi Kevin,

I just wanted to circle back with you to make sure my answer was
satisfactory. If so then I'll start working on updating the spec so I can
resolve Bug 18389 <https://www.w3.org/Bugs/Public/show_bug.cgi?id=18389>.

Aaron


On Tue, Jul 24, 2012 at 4:42 PM, Aaron Colwell <acolwell@google.com> wrote:

> Using reply all this time... :)
>
> ---------- Forwarded message ----------
> From: Aaron Colwell <acolwell@google.com>
> Date: Tue, Jul 24, 2012 at 4:40 PM
> Subject: Re: [MSE] Establishing the Presentation Start Timestamp
> To: Kevin Streeter <kstreete@adobe.com>
>
>
> Hi Kevin,
>
> Yes. The app can either set an offset or just append and then
> 'videoTag.currentTime = videoTag.buffered.start(0)' which should contain
> the start time of the first segment.
>
> Aaron
>
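A minimal sketch of the "append, then seek to buffered.start(0)" approach described above. The interface and helper names (BufferedRangesLike, initialSeekTarget, rangesFrom) are hypothetical and only mirror the length/start/end shape of the buffered ranges so the example can run outside a browser.

```typescript
// Structural stand-in for the parts of the buffered ranges used here.
// (Hypothetical name; not from the thread or the spec.)
interface BufferedRangesLike {
  readonly length: number;
  start(index: number): number;
  end(index: number): number;
}

// Returns the time to assign to currentTime after the first append,
// or null if nothing has been buffered yet.
function initialSeekTarget(buffered: BufferedRangesLike): number | null {
  return buffered.length > 0 ? buffered.start(0) : null;
}

// Helper for the example: builds ranges from [start, end] pairs in seconds.
function rangesFrom(
  pairs: ReadonlyArray<readonly [number, number]>
): BufferedRangesLike {
  return {
    length: pairs.length,
    start: (i: number) => pairs[i][0],
    end: (i: number) => pairs[i][1],
  };
}

// A live stream whose first appended segment covers 3600s..3604s.
const liveTarget = initialSeekTarget(rangesFrom([[3600, 3604]]));
console.log(liveTarget); // prints 3600
```

In a page, videoTag.buffered has this shape, so the app would assign the returned value to videoTag.currentTime once it is non-null.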
>
> On Tue, Jul 24, 2012 at 4:37 PM, Kevin Streeter <kstreete@adobe.com> wrote:
>
>> Aaron, Mark,
>>
>>   I’m a little unclear on how things will work for a live stream.  The
>> user will typically start playback at some non-zero time, which represents
>> the “live” end of the stream.  How does the timestamp offset account for
>> this?  Does it require setting a negative offset that re-bases the stream
>> to 0 so that playback begins immediately?
>>
>> -K
>>
>>
>> *From:* Mark Watson [mailto:watsonm@netflix.com]
>> *Sent:* Tuesday, July 24, 2012 2:19 PM
>> *To:* Aaron Colwell
>> *Cc:* <public-html-media@w3.org>
>> *Subject:* Re: [MSE] Establishing the Presentation Start Timestamp
>>
>> On Jul 24, 2012, at 2:13 PM, Aaron Colwell wrote:
>>
>> Hi Mark,
>>
>> Thanks for your comments. I too am starting to believe that we should
>> just have the SourceBuffer timelines start at 0 and NOT derive the
>> presentation start timestamp from the first segment appended. I agree that
>> the timestamp offset mechanism should be used to handle any content that
>> doesn't already start at 0. This might make things a little annoying for
>> live streams, but it is a one-time operation at the beginning of playback
>> to figure out what the appropriate timestamp offset needs to be. I think
>> the 90-99% case will be with content that starts at 0 so we should optimize
>> for that. If offsets need to be applied the app needs to know about them or
>> "discover" them by appending a segment to a "scratch" SourceBuffer and seeing
>> what SourceBuffer.buffered reports.
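A rough sketch of the one-time offset computation described above, assuming the app has already read the first buffered start time (for example from a "scratch" SourceBuffer). The function names timestampOffsetFor and applyTimestampOffset are hypothetical, and the sketch only shows the arithmetic, not how the offset is handed to the user agent.

```typescript
// Offset that maps the media's first timestamp onto the desired position on
// a SourceBuffer timeline that starts at 0. For a live stream re-based to 0
// this is negative. (Hypothetical helper; all times are in seconds.)
function timestampOffsetFor(
  firstBufferedStart: number,
  desiredStart: number = 0
): number {
  return desiredStart - firstBufferedStart;
}

// Where a media-internal timestamp lands once the offset is applied.
function applyTimestampOffset(mediaTime: number, offset: number): number {
  return mediaTime + offset;
}

// A live stream whose first segment starts at 3600s, re-based to 0.
const liveOffset = timestampOffsetFor(3600);
console.log(liveOffset); // prints -3600
console.log(applyTimestampOffset(3604.5, liveOffset)); // prints 4.5
```

Content that already starts at 0 yields an offset of 0, which matches the common case the paragraph above wants to optimize for.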
>>
>> Right - and if we need to provide a "cleaner" way for the app to peek at the
>> media timestamp we can do that later when the need is clearer.
>>
>> I do have a question about your first append at 10 minutes example. Are
>> you saying that you want the HTMLMediaElement to implicitly seek to the
>> start of the first media segment appended? I think it might be less
>> surprising if default playback start position <http://dev.w3.org/html5/spec/media-elements.html#default-playback-start-position> stays
>> 0 and the app manually sets HTMLMediaElement.currentTime to
>> HTMLMediaElement.buffered.start(0) or the desired seek time if you know it.
>>
>> Agreed - that's what I intended (even if that's not what I wrote ;-)
>>
>> Otherwise, I'm not sure how to make the behavior you desire fit into the
>> descriptions specified in the offsets into the media resource <http://dev.w3.org/html5/spec/media-elements.html#offsets-into-the-media-resource> section
>> of the HTML spec.
>>
>> Aaron
>>
>> On Wed, Jul 18, 2012 at 11:55 AM, Mark Watson <watsonm@netflix.com>
>> wrote:
>>
>> On Jul 18, 2012, at 11:03 AM, Aaron Colwell wrote:
>>
>> Hi Mark,
>>
>> Comments inline...
>>
>> On Thu, Jul 12, 2012 at 2:32 PM, Mark Watson <watsonm@netflix.com> wrote:
>>
>> 4. How close do the starting timestamps on the first media segments from
>> each SourceBuffer need to be?
>>
>>   - In this example I've shown them to be only 30 milliseconds apart, but
>> would 0.5 seconds be acceptable? Would 2 seconds?
>>
>>   - How much time do we allow here before we consider there to be missing
>> data and playback can't start?
>>
>>   - What happens if the gap is too large?
>>
>> I think this is roughly the same question as 'what happens if I append a
>> video segment which starts X ms after the end of the last video segment'?
>>
>> If X <= one frame interval, this is definitely not a 'gap' and playback
>> continues smoothly. If X > 1 second this is definitely a gap and playback
>> should stall (in the same way as it does today on a network outage).
>>
>> For X values in between, I am not sure: implementations have to draw a
>> line somewhere. A gap of multiple frame intervals could occur when
>> switching frame rate. You might also get a gap of a couple of frame
>> intervals when switching if you do wacky things with frame reordering
>> around segment boundaries.
>>
>> When looking at differences between audio and video, we need to be
>> tolerant of differences as large as the larger of the audio frame size and
>> the video frame interval.
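The bounds discussed above can be sketched as a small classifier. This is only an illustration: the one-frame and 1-second thresholds are the thread's tentative numbers rather than spec values, and the names GapVerdict, classifyGap and avStartTolerance are hypothetical.

```typescript
// Possible outcomes for a gap of X seconds between two appended segments.
type GapVerdict = "continuous" | "implementation-defined" | "gap";

// Classifies a gap using the two bounds from the discussion:
// at most one frame interval is not a gap, more than 1 second is a gap,
// and anything in between is left to the implementation.
function classifyGap(
  gapSeconds: number,
  frameIntervalSeconds: number
): GapVerdict {
  if (gapSeconds <= frameIntervalSeconds) return "continuous";
  if (gapSeconds > 1) return "gap";
  return "implementation-defined";
}

// Tolerance for audio/video start differences: the larger of the audio
// frame duration and the video frame interval (both in seconds).
function avStartTolerance(
  audioFrameSeconds: number,
  videoFrameIntervalSeconds: number
): number {
  return Math.max(audioFrameSeconds, videoFrameIntervalSeconds);
}

// 30 fps video: the 30 ms difference from the example is within one frame.
console.log(classifyGap(0.03, 1 / 30)); // prints "continuous"
console.log(classifyGap(2, 1 / 30)); // prints "gap"
```

Returning a separate "implementation-defined" verdict keeps the undecided middle range explicit instead of picking a cutoff the thread has not agreed on.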
>>
>> If the gap is too large, this element just stays in the same state.
>> Perhaps I append video from 0s and audio from 2s and this is because my
>> network requests got re-ordered and any millisecond now I am going to
>> append the 0-2s audio. Playback should start when that 0-2s is appended.
>>
>> [acolwell] I agree. We need to come up with some spec text for this and
>> then we can debate the merits of these various magic numbers. Care to
>> volunteer for this? :)
>>
>> Ok, assign me a bug.
>>
>> Any insights or suggestions would be greatly appreciated.
>>
>> We have the same problem with push/popTimeOffset. Suppose I want your
>> media above to appear at offset 200s in both audio and video source
>> buffers. What I really want is for the audio to start at 200s and the video
>> at 200.030s.
>>
>> In this case the application knows better than the media what the
>> internal media times are. I know that the video segment has all the video
>> from time 0s, even though the first frame is at 30ms. I really want to
>> provide the actual offset to be applied to the internal timestamps, rather
>> than providing the source buffer time that the next segment should start at.
>>
>> [acolwell] One way I think we could get around this is to mandate that the
>> media segments actually have a start time of 0. In WebM there is a Cluster
>> timestamp and then all blocks are relative to this timestamp. If the
>> Cluster timestamp is 0 and the first frame in the cluster is at 30ms then
>> there is enough information for the UA to "do the right thing". I'm not
>> sure if a similar mechanism exists in ISO.
>>
>> Not really - the rather complex combination of decode times, composition
>> offsets and edit lists results in a presentation timestamp for each sample
>> on a global timeline (shared across all bitrates etc.). But if the
>> timestamp of the first sample is X there is nothing to say, for example,
>> "there are no other samples between time Y (< X) and X".
>>
>> The application that creates the demuxed files just needs to make sure the
>> separate files both have the same segment start time.
>>
>> That's not always possible because of skew caused by audio frame
>> durations being different from video frame intervals.
>>
>> I think we need to say that all source buffers share a common global
>> timeline and that timestamps in the media segments must be mapped to that
>> in a way that is common across source buffers. This means any offset
>> applied to media internal timestamps needs to be the same across source
>> buffers. It means that establishing such offsets needs to be done
>> explicitly by the application or, if they are derived from timestamps in
>> the media, it needs to be done in a consistent way (in terms of which out of
>> audio and video the time offset is taken from).
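To illustrate the shared-offset idea above with the 200s example from earlier in the thread: applying one offset to every source buffer preserves the 30 ms audio/video skew, whereas forcing each buffer's first timestamp to 200s would erase it. The type and function names below (TrackStartMs, mapWithSharedOffset, mapWithPerBufferStart) are hypothetical, and times are kept in integer milliseconds to avoid floating-point noise.

```typescript
// First media-internal timestamp of each track, in milliseconds.
interface TrackStartMs {
  readonly name: string;
  readonly firstTimestampMs: number;
}

// One offset for all source buffers: internal timestamps keep their
// relative positions on the common global timeline.
function mapWithSharedOffset(
  tracks: ReadonlyArray<TrackStartMs>,
  sharedOffsetMs: number
): number[] {
  return tracks.map((t) => t.firstTimestampMs + sharedOffsetMs);
}

// Per-buffer "next segment starts at T": every track is forced to T,
// which loses any skew between tracks.
function mapWithPerBufferStart(
  tracks: ReadonlyArray<TrackStartMs>,
  startMs: number
): number[] {
  return tracks.map(() => startMs);
}

const demuxed: TrackStartMs[] = [
  { name: "audio", firstTimestampMs: 0 },
  { name: "video", firstTimestampMs: 30 },
];

console.log(mapWithSharedOffset(demuxed, 200000)); // prints [ 200000, 200030 ]
console.log(mapWithPerBufferStart(demuxed, 200000)); // prints [ 200000, 200000 ]
```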
>>
>> I think this has implications for the push/pop time offset as well. They
>> should be global methods which establish a global offset based on the
>> next-appended segment(s).
>>
>> We do also need a way to handle the user starting content in the middle.
>> If I have a 30 min content item and the user wants to start at minute 10
>> (because of a bookmark, say) then I should be able to start appending data
>> at position 10min in the source buffer timeline. The seek bar needs to show
>> the playback starting at minute 10 and if the user seeks backwards this
>> should be ok.
>>
>> pushOffset isn't right for this case because the media internal
>> timestamps are correct: the first segment appended really does start at
>> timestamp 10min.
>>
>> I wonder whether we should just say that the source buffer timestamp
>> starts at zero and not derive a start point from the appended media. If the
>> media internal timestamp corresponding to the start of the content is not
>> zero you need to explicitly handle this with a pushOffset call?
>>
>> Applications could also just append the segments to a "scratch"
>> SourceBuffer to see what the initial timestamp is and then use that
>> information to compute the proper offset to apply. It's not the greatest
>> solution, but it does provide a way for people to handle this if they
>> aren't as careful about how they create their demuxed content.
>>
>> Aaron
>>
>> Hmm - no clear answer here - I'll think about this some more.
>>
>> …Mark
>>
>> Aaron
>>
>
>
Received on Wednesday, 8 August 2012 21:07:46 UTC