W3C home > Mailing lists > Public > public-html-media@w3.org > January 2013

Re: [MSE] Bug 18615 - HTMLMediaElement.buffered behavior in "ended" state

From: Aaron Colwell <acolwell@google.com>
Date: Wed, 16 Jan 2013 08:52:08 -0800
Message-ID: <CAA0c1bDGns6h65-HU5apYnK_z+Q3-vcdjp89GhOZJ_pdv4yFqw@mail.gmail.com>
To: Mark Watson <watsonm@netflix.com>
Cc: "<public-html-media@w3.org>" <public-html-media@w3.org>
Hi Mark,

Thanks for your response. I agree with most of what you've said here. As I
was reading it though I realized that I actually strayed from what Bug
18615<https://www.w3.org/Bugs/Public/show_bug.cgi?id=18615> is
actually about. I think moving endOfStream() from MediaSource to
SourceBuffer should probably be filed as a different bug and addressed
separately. If we go down that path then we'd have to figure out how to
deal w/ the error reporting use cases of endOfStream(). If you think this
is worth pursuing, could you file a bug & proposed solution for this please?

Back to Bug 18615 <https://www.w3.org/Bugs/Public/show_bug.cgi?id=18615>.
Sorry for leading the discussion away from the problem in the bug. The
issue is how to collapse the ranges in multiple SourceBuffer.buffered
attributes into a single set of ranges for the HTMLMediaElement.buffered
attribute. I'll try to describe the current problem that Philip's example
exposes by stepping through the algorithm in the current spec text. Suppose:

mediaSource.activeSourceBuffers[0].buffered returns [0,10] and [20,30]
mediaSource.activeSourceBuffers[1].buffered returns [0, 15]

Step 1. *active ranges* equals [0,10], [20,30], [0, 15]
Step 2. *intersection range* equals [0, 10]
Step 3.1 *highest end time* equals 30
Step 3.2 *highest intersection end time* equals 10
Step 3.3 *intersection range* equals [0, 30]
Step 4 return *intersection range*
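For concreteness, here is a Python sketch of the current algorithm (the function names and the modeling of each TimeRanges as a sorted list of (start, end) tuples are mine, not spec text). It reproduces the [0, 30] result above when readyState is "ended":

```python
def intersect_ranges(a, b):
    """Intersect two sorted lists of (start, end) ranges."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever range ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

def current_buffered(active_source_buffers, ended):
    # Step 2: intersect the ranges of all active SourceBuffers.
    inter = list(active_source_buffers[0])
    for ranges in active_source_buffers[1:]:
        inter = intersect_ranges(inter, ranges)
    if ended and inter:
        # Step 3: in "ended", extend the last intersection range out
        # to the highest end time across all SourceBuffers.
        highest_end = max(e for rs in active_source_buffers for (_, e) in rs)
        inter[-1] = (inter[-1][0], highest_end)
    return inter

# Philip's example: video [0,10],[20,30]; audio [0,15].
print(current_buffered([[(0, 10), (20, 30)], [(0, 15)]], ended=True))
# -> [(0, 30)], which hides the unbuffered gap (10, 20)
```

Note how step 3 papers over the gap: it only looks at the last intersection range's end time, not at which intermediate regions are actually buffered.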

I believe it makes more sense for the attribute to return [0,10], [20,30]
since this reflects what is actually playable in this state. Do people agree?

If so then I think the following updated algorithm will fix this bug.

   1. If activeSourceBuffers<http://www.corp.google.com/~acolwell/no_crawl/html-media/media-source/media-source-respec.html#widl-MediaSource-activeSourceBuffers>.length
   equals 0 then return an empty TimeRanges object
   and abort these steps.
   2. Let active ranges be the ranges returned by the buffered attribute of
   each SourceBuffer<http://www.corp.google.com/~acolwell/no_crawl/html-media/media-source/media-source-respec.html#idl-def-SourceBuffer>
   in activeSourceBuffers<http://www.corp.google.com/~acolwell/no_crawl/html-media/media-source/media-source-respec.html#widl-MediaSource-activeSourceBuffers>.
   3. Let highest end time be the largest range end time in the active ranges.
   4. Let intersection ranges equal a TimeRanges object containing a single
   range from 0 to highest end time.
   5. For each SourceBuffer<http://www.corp.google.com/~acolwell/no_crawl/html-media/media-source/media-source-respec.html#idl-def-SourceBuffer>
   in activeSourceBuffers<http://www.corp.google.com/~acolwell/no_crawl/html-media/media-source/media-source-respec.html#widl-MediaSource-activeSourceBuffers> run
   the following steps:
      1. Let source ranges equal the ranges returned by the buffered attribute
      on the current SourceBuffer.
      2. If readyState<http://www.corp.google.com/~acolwell/no_crawl/html-media/media-source/media-source-respec.html#widl-MediaSource-readyState>
       is "ended"<http://www.corp.google.com/~acolwell/no_crawl/html-media/media-source/media-source-respec.html#idl-def-ReadyState>,
      then set the end time on the last range in source ranges to highest
      end time.
      3. Let new intersection ranges equal the intersection between the
      intersection ranges and the source ranges.
      4. Replace the ranges in intersection ranges with the new
      intersection ranges.
   6. Return the intersection ranges.

Here is the run through for the new algorithm.
Step 1. Continue since we have 2 active SourceBuffers
Step 2. *active ranges* equals [0,10], [20,30], [0, 15]
Step 3. *highest end time* equals 30.
Step 4. *intersection ranges* equals [0,30]
Step 5.1 *source ranges* equals [0,10], [20,30]
Step 5.2 *source ranges* equals [0,10], [20,30]
Step 5.3 *new intersection ranges* equals [0,10], [20,30]
Step 5.4 *intersection ranges* equals [0,10], [20,30]
Step 5.1 *source ranges* equals [0,15]
Step 5.2 *source ranges* equals [0,30]
Step 5.3 *new intersection ranges* equals [0,10], [20,30]
Step 5.4 *intersection ranges* equals [0,10], [20,30]
Step 6 Return [0,10], [20,30].
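The updated steps can be sketched the same way (again a sketch with my own names, modeling each TimeRanges as a sorted list of (start, end) tuples):

```python
def intersect_ranges(a, b):
    """Intersect two sorted lists of (start, end) ranges."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever range ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

def updated_buffered(active_source_buffers, ended):
    # Step 1: no active SourceBuffers -> empty TimeRanges.
    if not active_source_buffers:
        return []
    # Step 3: highest end time across all active ranges.
    highest_end = max(e for rs in active_source_buffers for (_, e) in rs)
    # Step 4: start from the single range [0, highest end time].
    intersection = [(0, highest_end)]
    # Step 5: intersect with each SourceBuffer's ranges.
    for ranges in active_source_buffers:
        source = list(ranges)
        if ended and source:
            # Step 5.2: in "ended", extend the last source range
            # to highest end time.
            source[-1] = (source[-1][0], highest_end)
        intersection = intersect_ranges(intersection, source)
    # Step 6: return the accumulated intersection.
    return intersection

print(updated_buffered([[(0, 10), (20, 30)], [(0, 15)]], ended=True))
# -> [(0, 10), (20, 30)]: the gap (10, 20) is preserved
```

Because each SourceBuffer's ranges are intersected individually, a region only survives if every active SourceBuffer has it buffered (or is past its own end of stream), which is exactly the "actually playable" set.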

As you can see this appears to fix the case Philip raised. Can anyone else
think of a situation where this algorithm won't work properly?

Unless there is an objection, I'll update the spec with this algorithm for
now and if we find another problem we can always reopen the bug.


On Tue, Jan 15, 2013 at 8:59 AM, Mark Watson <watsonm@netflix.com> wrote:

>  As mentioned on the call, if we think of endOfStream() as placing a
> marker in each source buffer, indicating the time at which each stream
> ends, then I think we get consistent answers to the questions below.
>  With a global endOfStream() call, the marker is placed at the end of the
> frame with the latest timestamp in each source buffer.
>  Calling endOfStream() would not mean that no more data will be appended,
> it means that no more data with a later timestamp than the latest existing
> timestamp will be appended.
>  On Jan 10, 2013, at 10:02 AM, Aaron Colwell wrote:
>  Hi,
>  I'd like to restart the discussion around Bug 18615<https://www.w3.org/Bugs/Public/show_bug.cgi?id=18615> so
> we can get closer to closing it out. The gist of the problem is how should
> the buffered property and playback behavior change when the MediaSource
> enters the "ended" state.
>  For well behaved applications that append multiplexed data the behavior
> is relatively straight forward and I think the existing spec text works
> fine. The problems appear when audio & video are appended with different
> SourceBuffers and buffered regions of each SourceBuffer may be disjoint
> in several regions when endOfStream() is called.
>  Here are some questions to start the discussion:
>  1. What should the behavior be if there are gaps between the current
> playback position and the end of the buffered data when endOfStream() is
> called? For example, the current position is 10 and SourceBuffer.buffered
> reports [0, 15) and [20, 30).
>  1a. Should playback continue through this gap?
>  No, playback should stall at the gap.
>  1b. Should the duration be truncated to the range that contains the
> current position?
>  No. endOfStream() should be considered an indication of where the media
> stream ends, not an indication that nothing more will be appended - the
> missing data might get appended (we allow out-of-order
> appends).
>  1c. Should an exception be thrown by endOfStream() if the current
> playback position isn't in the last range?
>  No.
>  1d. Should a "network error" be signalled when the current position
> reaches the end of the range? This is roughly equivalent to an
> HTMLMediaElement being unable to fetch data it needs.
>  No, just a normal playback stall, waiting for that missing data.
>  2. If multiple SourceBuffers are being used and they don't contain
> roughly the same buffered ranges when endOfStream() is called, then how
> should playback proceed? Philip gives an example in the bug where we have a
> video SourceBuffer with the following ranges [0, 10], [20,30]  and an
> audio SourceBuffer with the range [0,15]
>  Playback of only one of the two media types can continue past the
> endOfStream() point of one of them.
>  So, playback could continue with only video for [15,30] if endOfStream()
> was called in the state above, provided the video media for [15,30] is
> (becomes) available.
>  2a. Should playback continue to the highest buffered data
>  Yes.
>  and play through the gaps?
>  No.
>  2b. Should the buffered data be truncated to the intersection of the
> ranges?
>  No.
>  This is just the questions I can think of off the top of my head. These
> situations don't really happen in traditional HTMLMediaElement playback
> so I think we are blazing new trails here.
>  The corner case which is not addressed by a global endOfStream()
> (compared to per source buffer endOfStream()) is the following: suppose the
> audio stream ends at 15s and the video ends at 20s. Suppose I have audio
> for [0,15] and video for [0,18]. I can't yet call a global endOfStream()
> because I have not appended the latest bit of video data. Suppose now
> playback gets to 15s. Playback will stall, whereas it should continue
> through [15,18] with only the video.
>  If I had been able to call an audio endOfStream() the player would know
> there is no audio data after 15s and could continue playing through to
> 18s, by which time the remaining 2s of video might have arrived.
>  …Mark
>  Aaron
Received on Wednesday, 16 January 2013 16:52:37 GMT
