[Bug 18400] Define and document timestamp heuristics

https://www.w3.org/Bugs/Public/show_bug.cgi?id=18400

--- Comment #5 from Mark Watson <watsonm@netflix.com> 2012-08-14 21:15:07 UTC ---
(In reply to comment #4)
> comments inline..
> 
> (In reply to comment #2)
> > Proposal inline below:
> > 
> > (In reply to comment #0)
> > > There are several situations where heuristics are needed to resolve issues with
> > > the timestamps in media segments. The following list indicates issues the
> > > Chrome team has encountered so far:
> > > 
> > > 1. How close does the end of one media segment need to be to the beginning of
> > > another to be considered a gapless splice? Media segments can't always align
> > > exactly, especially in adaptive content, and they may be close but don't
> > > overlap.
> > 
> > More generally, if there is a gap in the media data in a Source Buffer, the
> > media element should play continuously across the gap if the duration of the
> > gap is less than 2 (?) video frame intervals or less than 2 (?) audio frame
> > durations. Otherwise the media element should pause and wait for receipt of
> > data.
> 
> [acolwell] Sounds like a reasonable start. How are the "video frame interval"
> and "audio frame duration" determined? Media segments could have different
> frame rates, and codecs like Vorbis have variable audio frame durations (ie
> long & short overlap windows).

I guess it would be fine to say that this is the interval of the immediately
preceding video frame or the duration of the immediately preceding audio
frame. It's just a heuristic, after all.
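
Something along these lines, as a minimal sketch (TypeScript just for
illustration; the names are invented, and the 2-interval threshold is the
value proposed above):

const GAP_TOLERANCE_FRAMES = 2; // proposed threshold: 2 frame intervals

// Decide whether the media element may play continuously across a gap in
// the buffered data, or should instead pause and wait for more data.
function canPlayAcrossGap(
  prevFrameEnd: number,      // end time of the last buffered frame (s)
  nextFrameStart: number,    // start time of the next buffered frame (s)
  prevFrameInterval: number, // interval of the immediately preceding frame (s)
): boolean {
  const gap = nextFrameStart - prevFrameEnd;
  // Adjacent or overlapping frames are trivially continuous.
  if (gap <= 0) return true;
  // Play through gaps shorter than ~2 previous frame intervals;
  // otherwise stall until more data is appended.
  return gap < GAP_TOLERANCE_FRAMES * prevFrameInterval;
}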

> > 
> > > 
> > > 4. How should the UA estimate the duration of a media segment if the last frame
> > > in the segment doesn't have duration information? (ie WebM clusters aren't
> > > required to have an explicit cluster duration. It's possible, but not required
> > > currently)
> > 
> > The rules above enable the UA to determine whether there is a real gap between
> > segments. This obviates the need to know segment duration except for
> > determination of the content duration. The content duration should just be set
> > to the timestamp of the last video frame or the end of the last audio frame,
> > whichever is later.
> 
> [acolwell] This becomes more complicated when overlaps are involved. Without
> knowing the actual duration of segments it becomes tricky to resolve certain
> kinds of overlaps. I'll try to provide an example to illustrate the problem.
> 
> 
> Initial source buffer state.
> +-----------+--+--+----------+
> :A          |A |A |A         |  
> +-----------+--+--+----------+
> 
> A new segment gets appended and we don't know its duration.
> +--------+-???
> :B       |B     
> +--------+-???  
> 
> Resolve the overlap and assume the end of the segment goes until the next
> frame.
> +--------+--+--+--+----------+
> :B       |B |A |A |A         | 
> +--------+--+--+--+----------+ 
> 
> Append the segment that is supposed to be right after B.
>                +------+------+
>                :C     |C     | 
>                +------+------+ 
> 
> Resolve the overlap.
> +--------+--+--+------+------+
> :B       |B |A :C     |C     | 
> +--------+--+--+------+------+ 
> 
> If B & C had been appended on a clear source buffer you would have gotten
> this, which is likely what the application intended.
> +--------+-----+------+------+
> :B       |B    :C     |C     |
> +--------+-----+------+------+
> 
> This is not a hypothetical example. We actually ran into this problem while
> trying to overlap Vorbis data.
> 
> Note that a "wait until the next segment is appended" rule won't help here
> because segments are not required to be appended in order and discontinuous
> appends are not explicitly signalled. 
> 
> Assuming a duration of 1-2 frame intervals can also get you into trouble
> because it may cause a keyframe to get dropped, which could result in the
> loss of a whole GOP.

I see your point. In DASH there are detailed rules that streams must conform
to in order to avoid this problem. I don't see any way to avoid it other than
to have such rules governing the content itself.

If the example above were video, and the first A that follows B is an I-frame,
then assuming a later stop time for B would mean that the append of B stomps
that I-frame, and the GOP it starts could not be played back. If the first
frame of one block of data (A ...) strictly follows the last frame of another
(B ... B), then we can't really do anything other than put all those frames in
the buffer, even if we end up with a very short frame interval at the splice
point.

So, yes, you end up with different outcomes depending on what you do. For
video, provided all the frames are really from the same source material, it
should not be a problem.
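
As a rough illustration of that rule (a sketch only; the frame
representation and names here are invented for the example, and the
appended list is assumed non-empty):

interface CodedFrame {
  start: number;    // presentation timestamp (s)
  duration: number; // may be unknown for the last frame of a segment
}

// Treat frames at or before the last appended timestamp as overwritten by
// the append, and keep every existing frame whose timestamp strictly
// follows it, even if that leaves a very short interval at the splice;
// dropping one could drop an I-frame and lose the whole GOP behind it.
function resolveOverlap(
  existing: CodedFrame[], // frames already buffered (A ...)
  appended: CodedFrame[], // newly appended frames (B ... B)
): CodedFrame[] {
  const lastAppended = appended[appended.length - 1];
  const kept = existing.filter((f) => f.start > lastAppended.start);
  return [...appended, ...kept];
}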

> 
> > 
> > > 
> > > 5. How should SourceBuffer.buffered values be merged into a single
> > > HTMLMediaElement.buffered? Simple range intersection? Should heuristic values
> > > like estimated duration (#4) or "close enough" values (#2) be applied before
> > > computing the intersection?
> > 
> > The heuristics of (1) should be used to determine SourceBuffer.buffered.
> > i.e. gaps of less than 2 frame intervals do not result in disjoint
> > intervals in the SourceBuffer.buffered array.
> > 
> > Then the intersection of the SourceBuffer.buffered arrays for the active
> > source buffers appears as the HTMLMediaElement.buffered.
> 
> [acolwell] Ok. Does this also apply after endOfStream() is called? Currently
> Chrome returns the intersection of all ranges when in "open", but uses the
> intersection plus the union of the end ranges, if they overlap, in "ended".
> The main reason was to handle the case where the streams are slightly
> different lengths. Taking the union over the last overlapping range at least
> allows buffered to reflect playing out to the duration when the streams'
> lengths differ by more than 2 frame intervals.

What you describe sounds right for endOfStream().
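
Something like the following sketch, assuming a simple [start, end]
representation of buffered ranges (names invented; not the normative
algorithm, each buffer is assumed non-empty in "ended", and the end-range
overlap check is elided for brevity):

type Range = [start: number, end: number];

// Intersect two sorted, disjoint range lists with a simple sweep.
function intersectRanges(a: Range[], b: Range[]): Range[] {
  const out: Range[] = [];
  let i = 0, j = 0;
  while (i < a.length && j < b.length) {
    const start = Math.max(a[i][0], b[j][0]);
    const end = Math.min(a[i][1], b[j][1]);
    if (start < end) out.push([start, end]);
    // Advance whichever range ends first.
    if (a[i][1] < b[j][1]) i++; else j++;
  }
  return out;
}

// In "open", HTMLMediaElement.buffered is the pure intersection of the
// active SourceBuffer.buffered lists. In "ended", the end of the last
// intersected range is extended to the union of the buffers' final ranges,
// so buffered can reflect playing out to the duration when the streams
// end at slightly different times.
function mergeBuffered(
  buffers: Range[][], // one Range[] per active SourceBuffer
  readyState: "open" | "ended",
): Range[] {
  if (buffers.length === 0) return [];
  let merged = buffers[0].map(([s, e]): Range => [s, e]);
  for (const b of buffers.slice(1)) merged = intersectRanges(merged, b);
  if (readyState === "ended" && merged.length > 0) {
    const unionEnd = Math.max(...buffers.map((b) => b[b.length - 1][1]));
    merged[merged.length - 1][1] = unionEnd;
  }
  return merged;
}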
